Do htmlspecialchars and mysql_real_escape_string keep my PHP code safe from injection?
Asked Answered
A

6

116

Earlier today a question was asked regarding input validation strategies in web apps.

The top answer, at time of writing, suggests in PHP just using htmlspecialchars and mysql_real_escape_string.

My question is: Is this always enough? Is there more we should know? Where do these functions break down?

Amerce answered 21/9, 2008 at 8:58 Comment(0)
A
243

When it comes to database queries, always try and use prepared parameterised queries. The mysqli and PDO libraries support this. This is infinitely safer than using escaping functions such as mysql_real_escape_string.

Yes, mysql_real_escape_string is effectively just a string escaping function. It is not a magic bullet. All it will do is escape dangerous characters in order that they can be safe to use in a single query string. However, if you do not sanitise your inputs beforehand, then you will be vulnerable to certain attack vectors.

Imagine the following SQL:

$result = "SELECT fields FROM table WHERE id = ".mysql_real_escape_string($_POST['id']);

You should be able to see that this is vulnerable to exploit.
Imagine the id parameter contained the common attack vector:

1 OR 1=1

There's no risky chars in there to encode, so it will pass straight through the escaping filter. Leaving us:

SELECT fields FROM table WHERE id= 1 OR 1=1

Which is a lovely SQL injection vector and would allow the attacker to return all the rows. Or

1 or is_admin=1 order by id limit 1

which produces

SELECT fields FROM table WHERE id=1 or is_admin=1 order by id limit 1

Which allows the attacker to return the first administrator's details in this completely fictional example.

Whilst these functions are useful, they must be used with care. You need to ensure that all web inputs are validated to some degree. In this case, we see that we can be exploited because we didn't check that a variable we were using as a number, was actually numeric. In PHP you should widely use a set of functions to check that inputs are integers, floats, alphanumeric etc. But when it comes to SQL, heed most the value of the prepared statement. The above code would have been secure if it was a prepared statement as the database functions would have known that 1 OR 1=1 is not a valid literal.

As for htmlspecialchars(). That's a minefield of its own.

There's a real problem in PHP in that it has a whole selection of different html-related escaping functions, and no clear guidance on exactly which functions do what.

Firstly, if you are inside an HTML tag, you are in real trouble. Look at

echo '<img src= "' . htmlspecialchars($_GET['imagesrc']) . '" />';

We're already inside an HTML tag, so we don't need < or > to do anything dangerous. Our attack vector could just be javascript:alert(document.cookie)

Now resultant HTML looks like

<img src= "javascript:alert(document.cookie)" />

The attack gets straight through.

It gets worse. Why? because htmlspecialchars (when called this way) only encodes double quotes and not single. So if we had

echo "<img src= '" . htmlspecialchars($_GET['imagesrc']) . ". />";

Our evil attacker can now inject whole new parameters

pic.png' onclick='location.href=xxx' onmouseover='...

gives us

<img src='pic.png' onclick='location.href=xxx' onmouseover='...' />

In these cases, there is no magic bullet, you just have to santise the input yourself. If you try and filter out bad characters you will surely fail. Take a whitelist approach and only let through the chars which are good. Look at the XSS cheat sheet for examples on how diverse vectors can be

Even if you use htmlspecialchars($string) outside of HTML tags, you are still vulnerable to multi-byte charset attack vectors.

The most effective you can be is to use the a combination of mb_convert_encoding and htmlentities as follows.

$str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
$str = htmlentities($str, ENT_QUOTES, 'UTF-8');

Even this leaves IE6 vulnerable, because of the way it handles UTF. However, you could fall back to a more limited encoding, such as ISO-8859-1, until IE6 usage drops off.

For a more in-depth study to the multibyte problems, see https://mcmap.net/q/40844/-sql-injection-that-gets-around-mysql_real_escape_string

Amerce answered 21/9, 2008 at 8:58 Comment(17)
The only thing missed here, is that the first example for the DB query ... a simple intval() would solve the injection. Always use intval() in place of mysqlescape...() when needing a number and not a string.Stonecutter
and remember that using parameterized queries will allow you to always have data treated as data and not code. Use a library such as PDO and use parameterised queries whenever possible.Amerce
@The Wicked Flea I would argue type casting with (int) is better. It's faster, and one less parenthesis you need to close on the other side.Ruprecht
+1 : I just had to vote up this 1-year old answer. This even thought me a thing or 2 about XSS.Underneath
Two remarks: 1. In the first example, you'd be safe if you also put quotes around the parameter, like $result = "SELECT fields FROM table WHERE id = '".mysql_real_escape_string($_POST['id'])."'"; 2. In the second case (attribute containing URL), there's no use for htmlspecialchars at all; in these cases, you should encode input using a URL encoding scheme, e.g., using rawurlencode. That way, a user can't insert javascript: et al.Sundstrom
@Marcel 1: kinda true, but adding quotes wouldn't be right, as then you'd have the database comparing a string to a number; much better to restrict the variable to only being an integer. 2: Yes absolutely! It is important to understand the context of the injection location and encode accordingly - Very much the point of this post: Validate as appropriate and encode as appropriate; don't just blindly use some particular function thinking it will keep you safe.Amerce
“htmlspecialchars only encodes double quotes and not single”: that's not true, it depends on flags being set, see its parameters.Sundstrom
Why do you use mb_convert_encoding in combination of htmlentities? Is not sure htmlentities() alone?Tion
@Tion because htmlentities on it's own is vulnerable to multibyte character attacks. For a good ref on avoiding XSS see owasp.org/index.php/…Amerce
Uhm, can you give a simple example? Not sure about what do you mean as multibyte chars :)Tion
@Amerce I would like an example of multibyte charset attack as wellShame
This should be bolded: Take a whitelist approach and only let through the chars which are good. A blacklist will always miss something. +1Passivism
Would you recommend using htmlentities and mb_convert_encoding in conjunction with prepared statements? php.net/manual/en/mysqli.quickstart.prepared-statements.phpLoad
@MartínMolina No! do not attempt to write a one-size-fits-all function. Always deal use a mechanism appropriate for what you are doing. When sending to a database, use bound parameters; when protecting against XSS, use the appropriate output-encoding technique for the context into which you are injecting.Amerce
"Even if you use htmlspecialchars($string) outside of HTML tags, you are still vulnerable to multi-byte charset attack vectors." Could you please explain? this this php piece unsafe? echo "<something>" .htmlspecialchars($untrusted_string). "</something>"Tempest
@HelloWorld For an understanding of the theory behind why this was written, see stackoverflow.com/questions/1412239 However, in the five years since this was written, htmlspecialchars and htmlentities now attempt to detect invalid multi-byte character sequences (although it may be safest to mb_convert_encoding than to rely on the native error); and since php5.4, the system default encoding is UTF-8; and IE6 is deprecated and unsupported. This should mean that in your particular injection context, you should be safe (but may get an unexpected warning if the user sends malformed utf)Amerce
If we pass htmlspecialchars through the entire string though, all those examples will not work. As the comma's will be converted to their special characters. I'd recommend htmlspecialchars($str, ENT_QUOTES, 'UTF-8');Disentwine
D
11

In addition to Cheekysoft's excellent answer:

  • Yes, they will keep you safe, but only if they're used absolutely correctly. Use them incorrectly and you will still be vulnerable, and may have other problems (for example data corruption)
  • Please use parameterised queries instead (as stated above). You can use them through e.g. PDO or via a wrapper like PEAR DB
  • Make sure that magic_quotes_gpc and magic_quotes_runtime are off at all times, and never get accidentally turned on, not even briefly. These are an early and deeply misguided attempt by PHP's developers to prevent security problems (which destroys data)

There isn't really a silver bullet for preventing HTML injection (e.g. cross site scripting), but you may be able to achieve it more easily if you're using a library or templating system for outputting HTML. Read the documentation for that for how to escape things appropriately.

In HTML, things need to be escaped differently depending on context. This is especially true of strings being placed into Javascript.

Diarmid answered 21/9, 2008 at 9:10 Comment(0)
R
4

I would definitely agree with the above posts, but I have one small thing to add in reply to Cheekysoft's answer, specifically:

When it comes to database queries, always try and use prepared parameterised queries. The mysqli and PDO libraries support this. This is infinitely safer than using escaping functions such as mysql_real_escape_string.

Yes, mysql_real_escape_string is effectively just a string escaping function. It is not a magic bullet. All it will do is escape dangerous characters in order that they can be safe to use in a single query string. However, if you do not sanitise your inputs beforehand, then you will be vulnerable to certain attack vectors.

Imagine the following SQL:

$result = "SELECT fields FROM table WHERE id = ".mysql_real_escape_string($_POST['id']);

You should be able to see that this is vulnerable to exploit. Imagine the id parameter contained the common attack vector:

1 OR 1=1

There's no risky chars in there to encode, so it will pass straight through the escaping filter. Leaving us:

SELECT fields FROM table WHERE id = 1 OR 1=1

I coded up a quick little function that I put in my database class that will strip out anything that isnt a number. It uses preg_replace, so there is prob a bit more optimized function, but it works in a pinch...

function Numbers($input) {
  $input = preg_replace("/[^0-9]/","", $input);
  if($input == '') $input = 0;
  return $input;
}

So instead of using

$result = "SELECT fields FROM table WHERE id = ".mysqlrealescapestring("1 OR 1=1");

I would use

$result = "SELECT fields FROM table WHERE id = ".Numbers("1 OR 1=1");

and it would safely run the query

SELECT fields FROM table WHERE id = 111

Sure, that just stopped it from displaying the correct row, but I dont think that is a big issue for whoever is trying to inject sql into your site ;)

Revocable answered 22/9, 2008 at 17:26 Comment(5)
Perfect! This is the exactly kind of sanitisation you need. The initial code failed because it didn't validate that a number was numeric. Your code does this. you should call Numbers() on all integer-use vars whose values originate from outside the codebase.Amerce
It's worth mentioning that intval() will work perfectly fine for this, since PHP automatically coerces integers to strings for you.Retrogress
I prefer intval. It turns 1abc2 to 1, not 12.Oleary
intval is better, expecially on ID. Most of the time, if its been corrupted, its just as is above, 1 or 1=1. You really shouldn't leak other people's ID. So intval will return the correct ID. After that, you should check if the original and cleaned values are the same. Its a great way of not only stopping attacks, but finding the attackers.Vitriform
The incorrect row would be disastrous if you are showing personal data, you would see another user's information! instead it would be better to check return preg_match('/^[0-9]+$/',$input) ? $input : 0;Alternant
G
2

An important piece of this puzzle is contexts. Someone sending "1 OR 1=1" as the ID is not a problem if you quote every argument in your query:

SELECT fields FROM table WHERE id='".mysql_real_escape_string($_GET['id'])."'"

Which results in:

SELECT fields FROM table WHERE id='1 OR 1=1'

which is ineffectual. Since you're escaping the string, the input cannot break out of the string context. I've tested this as far as version 5.0.45 of MySQL, and using a string context for an integer column does not cause any problems.

Gwen answered 22/9, 2008 at 17:38 Comment(4)
and then i'll start my attack vector with the multi-byte char 0xbf27 which in your latin1 database will be converted by the filter fuction as 0xbf5c27 - which is a single multibyte character followed by a single quote.Amerce
Try not to safeguard against a single known attack-vector. You will end up chasing your tail until the end of time applying patch after patch to your code. Standing back and looking at the general cases will leaed to safer code and a better security-focussed mindset.Amerce
I agree; ideally, OP will use prepared statements.Gwen
While the quoting of arguments suggested by this post isn't foolproof it will mitigate many of the common 1 OR 1=1 type attacks so it's worthy of mention.Buatti
L
2
$result = "SELECT fields FROM table WHERE id = ".(INT) $_GET['id'];

Works well, even better on 64 bit systems. Beware of your systems limitations on addressing large numbers though, but for database ids this works great 99% of the time.

You should be using a single function/method for cleaning your values as well. Even if this function is just a wrapper for mysql_real_escape_string(). Why? Because one day when an exploit to your preferred method of cleaning data is found you only have to update it one place, rather than a system-wide find and replace.

Lev answered 4/10, 2011 at 21:32 Comment(0)
W
-3

why, oh WHY, would you not include quotes around user input in your sql statement? seems quite silly not to! including quotes in your sql statement would render "1 or 1=1" a fruitless attempt, no?

so now, you'll say, "what if the user includes a quote (or double quotes) in the input?"

well, easy fix for that: just remove user input'd quotes. eg: input =~ s/'//g;. now, it seems to me anyway, that user input would be secured...

Wershba answered 17/3, 2017 at 19:2 Comment(8)
"why, oh WHY, would you not include quotes around user input in your sql statement?" — The question says nothing about not quoting user input.Suspicion
"well, easy fix for that" — Terrible fix for that. That throws away data. The solution mentioned in the question itself is a better approach.Suspicion
while i agree the question does not address quoting user input, it still seems sill not to quote the input. and, i would rather toss data than input bad data. generally, in an injection attack, you do NOT want that data anyway....right?Wershba
"while i agree the question does not address quoting user input, it still seems sill not to quote the input." — No, it doesn't. The question doesn't demonstrate it one way or the other.Suspicion
can you show an example of how one might be "tossing away data" when one doesn't want the data "tossed away"? I'm just a little confused, and really would like clarification....Wershba
"generally, in an injection attack, you do NOT want that data anyway" — It should be assumed that sometimes people will submit genuine data that is not an attack. That data might include ' characters. A form asking for the visitor's name might be used by Mr. O'Reilly to take a very basic example.Suspicion
so, instead of deleting the quote, couldn't i just convert it to an html element (i don't think element is the word i want). for instance, %27 instead, then? (obviously i'm new to this, and wish to avoid plastering my code with "sql injection checks"Wershba
@JarettL Either get used to using prepared statements or get used to Bobby Tables wrecking your data every Tuesday. Parameterized SQL is the single best way to protect yourself against SQL injection. You don't need to do "SQL injection checks" if you are using a prepared statement. They're extremely easy to implement (and in my opinion, make the code MUCH easier to read), protect from various idiosyncrasies of string concatenation and sql injection, and best of all, you don't have to reinvent the wheel to implement it.Daveen

© 2022 - 2024 — McMap. All rights reserved.