Is strip_tags() vulnerable to scripting attacks?

Is there a known XSS or other attack that makes it past the following?

$content = "some HTML code";
$content = strip_tags($content);

echo $content;

The manual has a warning:

This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.

But that applies only when the allowable_tags parameter is used.
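To make the manual's warning concrete, here is a minimal sketch comparing both call forms on the same input. The input string is made up for illustration.

```php
<?php
// Hypothetical untrusted input carrying an event-handler attribute.
$input = '<b onmouseover="alert(1)">hover me</b>';

// No allowlist: every tag, and with it every attribute, is removed.
$noAllowlist = strip_tags($input);

// '<b>' allowed: the tag is copied through verbatim, attributes included,
// which is exactly what the manual warns about.
$withAllowlist = strip_tags($input, '<b>');

var_dump($noAllowlist);   // string(8) "hover me"
var_dump($withAllowlist); // the original string, unchanged
```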

With no allowed tags set, is strip_tags() vulnerable to any attack?

Chris Shiflett seems to say it's safe:

Use Mature Solutions

When possible, use mature, existing solutions instead of trying to create your own. Functions like strip_tags() and htmlentities() are good choices.

Is this correct? If possible, please quote sources.

I know about HTML Purifier, htmlspecialchars(), etc.; I am not looking for the best method to sanitize HTML. I just want to know about this specific issue. This is a theoretical question that came up here.

Reference: strip_tags() implementation in the PHP source code

Robyn answered 26/4, 2011 at 9:40 Comment(11)
Well, with no arguments given, it strips out all tags, so I don't see how there could be any exploit. The worst that could happen is someone feeding you invalid markup (no closing tags), but in that case strip_tags will simply strip out a lot more text. – Pinckney
@Gordon thanks, but I mean only a whole chunk of HTML data, so no injected file names etc. (which, if I understand it right, is what the forum entry is discussing). As far as I can see, the thread doesn't prove a vulnerability in strip_tags(), but goes on to recommend htmlspecialchars(), which is what I usually do as well; I just want to know whether it's really necessary. – Robyn
You can check the way strip_tags works by looking at the implementation. – Erleneerlewine
@Erleneerlewine thanks, I'll add that to the question. – Robyn
htmlpurifier.org/comparison#striptags is a bit more dismissive. It probably goes without saying, but it needs repeating for newcomers: whether strip_tags is safe depends on the context. If the output ends up in attributes, then no. Only if the stripped content goes into a page body is it okay (and for that it's indeed sufficient). – Pyrargyrite
Possible duplicate of PHP: Prevent XSS with strip_tags()? – Ewold
@Ewold thanks for the dupe, but the question isn't answered in that one either :) – Robyn
@Robyn well, what do you expect in an answer? You basically asked a yes/no question, to which I at least could only reply: "To my knowledge strip_tags is secure." But who am I to claim that I know each and every XSS attack out there? @Erleneerlewine probably answered it best by telling you to look at the implementation and draw your own conclusions. – Ewold
@Ewold fair enough. I guess if there were known vulnerabilities, they would be fixed; at least there was one in 2004 that seems to have been fixed right away. – Robyn
@Robyn packetstormsecurity.org/search/?q=strip_tags – Ewold
Updated link to the implementation of strip_tags(): github.com/php/php-src/blob/master/ext/standard/string.c#L4729 – Erinn

As its name suggests, strip_tags should remove all HTML tags. The only way to prove that it does is by analyzing the source code. The following analysis applies to a strip_tags('...') call without a second argument for whitelisted tags.

First of all, some theory about HTML tags: a tag starts with a < followed by non-whitespace characters. If this string starts with a ?, it should not be parsed. If it starts with !--, it is considered a comment and the following text should not be parsed either. A comment is terminated by -->; inside such a comment, characters like < and > are allowed. Attributes can occur in tags, and their values may optionally be surrounded by a quote character (' or "). If such a quote exists, it must be closed; a > encountered before the closing quote does not close the tag.

The code <a href="example>xxx</a><a href="second">text</a> is interpreted in Firefox as:

<a href="http://example.com%3Exxx%3C/a%3E%3Ca%20href=" second"="">text</a>

The PHP function strip_tags is defined on line 4036 of ext/standard/string.c. It calls the internal function php_strip_tags_ex.

Two buffers exist: one for the output and one for the text "inside HTML tags". A counter named depth holds the number of open angle brackets (<).
The variable in_q contains the quote character (' or ") if any, and 0 otherwise. The last character is stored in the variable lc.

The function tracks five states, three of which are mentioned in the description above the function. Based on this information and the function body, the following states can be derived:

  • State 0 is the output state (not in any tag)
  • State 1 means we are inside a normal HTML tag (the tag buffer contains <)
  • State 2 means we are inside a php tag
  • State 3: we came from the output state and encountered the < and ! characters (the tag buffer contains <!)
  • State 4: inside HTML comment

We only need to make sure that no tag can be inserted, that is, that no < followed by a non-whitespace character reaches the output. Line 4326 handles the < character as follows:

  • If inside quotes (e.g. <a href="inside quotes">), the < character is ignored (removed from the output).
  • If the next character is a whitespace character, < is added to the output buffer.
  • If outside an HTML tag, the state becomes 1 ("inside HTML tag") and the last character lc is set to <.
  • Otherwise, if already inside an HTML tag, the depth counter is incremented and the character is ignored.

If > is met while the tag is open (state == 1), in_q becomes 0 ("not in a quote") and state becomes 0 ("not in a tag"). The tag buffer is discarded.

Attribute checks (for characters like ' and ") are done on the tag buffer, which is discarded. So the conclusion is:

strip_tags without a tag whitelist is safe for inclusion outside tags, no tag will be allowed.
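The rules for < and > above can be modelled in a few lines of PHP. This is a heavily simplified sketch, not the actual C implementation: it only models states 0 and 1 (no PHP-tag or comment states, no allowlist), the interplay of quotes and depth is simplified, and simplifiedStripTags() is a hypothetical name.

```php
<?php
// Simplified model of php_strip_tags_ex() with no allowlist.
// Only state 0 (output) and state 1 (inside an HTML tag) are modelled.
function simplifiedStripTags(string $input): string
{
    $out = '';
    $state = 0;      // 0 = outside any tag, 1 = inside a tag
    $depth = 0;      // extra '<' seen while already inside a tag
    $inQuote = '';   // quote character that opened an attribute value, '' if none
    $len = strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $c = $input[$i];
        $next = $i + 1 < $len ? $input[$i + 1] : '';

        if ($state === 0) {
            $nextIsSpace = $next !== '' && strpos(" \t\n\r\v\f", $next) !== false;
            if ($c === '<' && !$nextIsSpace) {
                $state = 1;  // a tag starts: '<' is not emitted
            } else {
                $out .= $c;  // ordinary text, including '>' and '< '
            }
            continue;
        }

        // State 1: characters belong to the (discarded) tag buffer, never to $out.
        if ($inQuote !== '') {
            if ($c === $inQuote) {
                $inQuote = '';   // closing quote
            }
        } elseif ($c === '"' || $c === "'") {
            $inQuote = $c;       // opening quote: '<' and '>' are now ignored
        } elseif ($c === '<') {
            $depth++;            // nested '<' inside a tag
        } elseif ($c === '>') {
            if ($depth > 0) {
                $depth--;        // closes a nested '<'
            } else {
                $state = 0;      // tag closed: back to the output state
            }
        }
    }
    return $out;
}

var_dump(simplifiedStripTags('<b>bold</b> text')); // string(9) "bold text"
```

Because nothing in state 1 is ever appended to the output, the only way a < reaches the output is when it is followed by whitespace, which is the core of the argument above.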

By "outside tags" I mean not inside a tag, as in <a href="in tag">outside tag</a>. The text may still contain < and >, as in >< a>>, so the result is not necessarily valid HTML: <, > and & still need to be escaped, especially &. That can be done with htmlspecialchars().
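A minimal sketch of combining the two for the element-body case; textForHtmlBody() is a hypothetical helper name, and the input reuses the >< a>> example plus an ampersand and a real tag.

```php
<?php
// Hypothetical helper: strip tags, then escape what is left so that stray
// '<', '>', '&' and quotes are rendered as text inside an element body.
function textForHtmlBody(string $untrusted): string
{
    return htmlspecialchars(strip_tags($untrusted), ENT_QUOTES, 'UTF-8');
}

echo '<p>' . textForHtmlBody('>< a>> & <b>x</b>') . "</p>\n";
```

This is only meant for text placed between tags; attribute values need their own handling.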

The description for strip_tags without a whitelist argument would be:

Makes sure that no HTML tag exists in the returned string.

Erleneerlewine answered 26/4, 2011 at 16:35 Comment(2)
So... TL;DR: yes, strip_tags() is safe? – Ahriman
@Ahriman Yes, strip_tags is safe when called with only one argument. – Arrington

I cannot predict future exploits, especially since I haven't looked at the PHP source code for this. However, there have been exploits in the past due to browsers accepting seemingly invalid tags (like <s\0cript>). So it's possible that in the future someone might be able to exploit odd browser behavior.

That aside, sending the output directly to the browser as a full block of HTML should never be insecure:

echo '<div>'.strip_tags($foo).'</div>';

However, this is not safe:

echo '<input value="'.strip_tags($foo).'" />';

because one could easily end the attribute value with " and insert an event handler.

I think it's much safer to always convert stray < into &lt; (and the same with quotes).
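A minimal sketch of the attribute problem described in this answer; the payload is made up for illustration. It contains no <, so strip_tags() has nothing to remove, while htmlspecialchars() with ENT_QUOTES neutralizes the quote.

```php
<?php
// Hypothetical attacker input aimed at an attribute value.
$foo = '" autofocus onfocus="alert(1)';

// Unsafe: the double quote ends value="..." and adds an event handler.
$unsafe = '<input value="' . strip_tags($foo) . '" />';

// Safer: ENT_QUOTES turns both quote characters into entities.
$safe = '<input value="' . htmlspecialchars($foo, ENT_QUOTES, 'UTF-8') . '" />';

echo $unsafe, "\n", $safe, "\n";
```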

Lipson answered 26/4, 2011 at 16:54 Comment(0)

According to this online tool, this string will be "perfectly" escaped, but the result is another malicious one!

<<a>script>alert('ciao');<</a>/script>

In the string the "real" tags are <a> and </a>, since < and script> alone aren't tags.

I hope I'm wrong or that it's just because of an old version of PHP, but it's better to check in your environment.
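Following the advice to check your own environment, a minimal script that runs the payload through the real strip_tags(). The comments below report "alert('ciao');" for PHP 7.1.2 and PHP 8; the exact result on other versions is not verified here.

```php
<?php
// The payload from this answer: the only real tags are <a> and </a>.
$payload = "<<a>script>alert('ciao');<</a>/script>";
$result = strip_tags($payload);

var_dump($result);
// Sanity check for your own PHP version: no '<' should survive.
var_dump(strpos($result, '<') === false);
```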

Champion answered 2/11, 2017 at 20:53 Comment(4)
var_dump(strip_tags("<<a>script>alert('ciao');<</a>/script>")); => "alert('ciao');" in PHP 7.1.2. It seems to remove everything after a < until a > is encountered. – Edenedens
Ooh, that's nasty. It seems to pass all the tests here, though: 3v4l.org/BBapp#output – Weltpolitik
It's worth noting that in PHP 8 this just results in alert('ciao'); – Nealson
The referenced tool uses JavaScript to parse the string and is therefore not a good representation of the issue. – Serg

YES, strip_tags() is vulnerable to scripting attacks, right through to (at least) PHP 8. Do not use it to prevent XSS. Instead, you should use filter_input().

strip_tags() is vulnerable because it does not run recursively. That is, it does not check whether valid tags remain after the valid tags have been stripped. For example, for the string <<a>script>alert(XSS);<</a>/script> it will strip the <a> tag successfully, yet fail to notice that this leaves <script>alert(XSS);</script>.

This can be seen (in a safe environment) here.
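If you want stripping applied repeatedly, as this answer suggests, a minimal sketch is to loop until the output stops changing. The comments report that a single pass already yields alert('XSS'); on PHP 8, so treat this as a defensive extra. Both helper names are hypothetical; the second follows the idea in the first comment of rejecting input that strip_tags() changes.

```php
<?php
// Hypothetical helper: apply strip_tags() until the string stops changing.
// Without an allowlist strip_tags() only removes characters, so the loop ends.
function stripTagsUntilStable(string $s): string
{
    do {
        $previous = $s;
        $s = strip_tags($s);
    } while ($s !== $previous);
    return $s;
}

// Hypothetical helper: reject input outright instead of cleaning it.
function containsMarkup(string $s): bool
{
    return strip_tags($s) !== $s;
}

var_dump(stripTagsUntilStable("<<a>script>alert('XSS');<</a>/script>"));
```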

Eaglestone answered 27/7, 2021 at 4:43 Comment(2)
But I guess one could still use it to reject user input completely if it changes after passing through strip_tags, right? – Ewan
It's worth noting that in PHP 8 this just results in alert('XSS'); – Nealson

strip_tags is perfectly safe if all you are doing is outputting the text into the HTML body.

It is not necessarily safe to put it into MySQL or into URL attributes.
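For the URL-attribute case, strip_tags() is simply the wrong tool, because a value like javascript:alert(1) contains no tags. A minimal sketch of an allowlist check, assuming only absolute http(s) links should be accepted; isSafeHref() and renderLink() are hypothetical helpers, not PHP functions.

```php
<?php
// Hypothetical helper: allow only absolute http(s) URLs in href attributes.
function isSafeHref(string $url): bool
{
    // Anchored, case-insensitive prefix check on the raw string, so a
    // "javascript:" scheme or leading whitespace is rejected.
    return preg_match('~\Ahttps?://~i', $url) === 1;
}

// Hypothetical helper: build a link, falling back to '#' for unsafe URLs.
function renderLink(string $url, string $text): string
{
    if (!isSafeHref($url)) {
        $url = '#';
    }
    // The attribute value and the link text still need escaping.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars(strip_tags($text), ENT_QUOTES, 'UTF-8') . '</a>';
}

echo renderLink('javascript:alert(1)', 'Click me!'), "\n";
```

The check deliberately rejects relative URLs; loosening it means handling the ways browsers normalize schemes (case, leading whitespace, embedded tabs), so treat it as a starting point only.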

Neutralize answered 26/4, 2011 at 10:8 Comment(1)
Although this answer is 10 years old, it is worth mentioning for anyone who stumbles across it in 2021 like I did: this answer is completely untrue and outright dangerous information. – Eaglestone

I've just been able to inject a script on PHP 8 through strip_tags(), inside an href:

Test using:

<a href="javascript:alert(1)">Click me!</a>

Obviously this requires user interaction, but passes through this function.

Similar to another answer to "Is strip_tags() vulnerable to scripting attacks?", but without the additional angle brackets.
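The answer doesn't show the exact call, so here is a minimal sketch comparing both call forms on this payload. With one argument the tag and its href are dropped, which matches the accepted answer's analysis; the href only survives when <a> is on the allowlist.

```php
<?php
$payload = '<a href="javascript:alert(1)">Click me!</a>';

// One argument: the <a> tag and its href are removed.
var_dump(strip_tags($payload));        // string(9) "Click me!"

// With <a> on the allowlist, the tag survives with its attributes intact.
var_dump(strip_tags($payload, '<a>'));
```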

Trix answered 4/5, 2023 at 17:18 Comment(1)
Already mentioned here. – Mccreery

© 2022 - 2024 — McMap. All rights reserved.