Is mysql_real_escape_string vulnerable to invalid UTF-8 exploitation, e.g. overlong or ill-formed UTF-8 sequences?

Assuming I have my database connection set up as follows to use UTF-8 (the full 4-byte version in MySQL, utf8mb4):

mysql_query("SET CHARACTER SET utf8mb4");
mysql_query("SET NAMES utf8mb4");

I am using mysql_real_escape_string to escape unwanted characters before putting a string into sql (note - I am not looking for advice to switch to PDO, I want to establish whether mysql_real_escape_string is safe against overlong utf8 etc).

$input = mysql_real_escape_string($_POST['field']);
$sql = "SELECT * FROM `table` WHERE `header`='$input'";

Is there any validation I need to do to $_POST['field'] (eg. to check if the string is valid UTF-8 and is not overlong and does not contain invalid sequences etc) before doing my mysql_real_escape_string or is that sufficient?

Teth answered 13/1, 2014 at 12:40 Comment(4)
Definitely see https://mcmap.net/q/948099/-shortcomings-of-mysql_real_escape_string - Kosiur
Thanks deceze, I have upvoted that answer. What do you think about this link: sirdarckcat.blogspot.co.uk/2009/10/… (I'm aware that mysql_real_escape_string isn't used there, but that is where I learned about the potential dangers of overlong and ill-formed UTF-8 sequences). Is there a resource online which confirms that mysql_real_escape_string handles such UTF-8 cases correctly? - Teth
I would think that the escaping happens regardless of what the character represents. E.g. you could insert any random binary data into your query if escaped, so it shouldn't matter exactly which characters are in your string literal. I can't give you a definite thumbs up or thumbs down on this topic though. - Kosiur
@Kosiur I agree. I would also think the only character you need to escape is the apostrophe, because it's the only character that could possibly break the SQL syntax, right? Or are there other "apostrophes" in the UTF charsets with a different code? In that case MySQL shouldn't accept them anyway and should accept only the ASCII apostrophe, right? I cannot really imagine what could break the SQL syntax here other than the ASCII apostrophe (and the double quote in other cases too, but not in the OP's usage). - Sanitarian

A public service announcement before my answer: you're still using mysql_query. You will eventually have to move to mysqli at the very least, even if you don't want to go to PDO. All the mysql_ functions are deprecated as of PHP 5.5 (see the big red scary box in the previous link) and were removed in PHP 7.0. This is important because the main reason to suggest PDO in your case is prepared statements, which mysqli can also do. Prepared statements are far less prone to injection than manual escaping, at the cost of an extra round trip to the server (a small performance hit).
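
For comparison, here is a minimal sketch of what the query from the question might look like as a mysqli prepared statement. The helper name find_by_header is hypothetical and only there for illustration; the table and column names come from the question, and the error handling is an assumption about how you might want to fail.

```php
<?php
// Hypothetical helper (not part of mysqli): runs the question's query
// with the user-supplied value bound as a parameter instead of escaped.
function find_by_header(mysqli $db, string $header): array
{
    // The "?" placeholder keeps the value out of the SQL text entirely.
    $stmt = $db->prepare('SELECT * FROM `table` WHERE `header` = ?');
    if ($stmt === false) {
        throw new RuntimeException('Prepare failed: ' . $db->error);
    }

    // "s" marks the bound value as a string.
    $stmt->bind_param('s', $header);
    $stmt->execute();

    // get_result() needs the mysqlnd driver.
    $rows = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
    $stmt->close();

    return $rows;
}
```

You would call it on a connection whose charset was already set with $db->set_charset('utf8mb4'), passing $_POST['field'] as the second argument.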

As for UTF-8, I would recommend using mb_check_encoding to ensure the string is at least valid UTF-8 before attempting to insert it.
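
A minimal sketch of that check, assuming the mbstring extension is loaded. The wrapper name is_valid_utf8 and the sample byte strings are my own, chosen to illustrate the overlong and truncated cases the question asks about.

```php
<?php
// Hypothetical wrapper around mb_check_encoding(); the name is illustrative.
function is_valid_utf8(string $value): bool
{
    // Returns false for ill-formed input such as overlong or truncated sequences.
    return mb_check_encoding($value, 'UTF-8');
}

// Sample inputs written as raw bytes.
$samples = [
    'plain ASCII'         => "header",
    '4-byte character'    => "\xF0\x9F\x98\x80", // U+1F600
    'overlong apostrophe' => "\xC0\xA7",         // 2-byte encoding of U+0027
    'truncated sequence'  => "abc\xE2\x82",      // 3-byte sequence missing its last byte
];

foreach ($samples as $label => $bytes) {
    echo $label, ': ', var_export(is_valid_utf8($bytes), true), PHP_EOL;
}
```

The first two samples should pass and the last two should be rejected, so a request carrying such bytes can be refused before it gets anywhere near the query.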

Finally, there's this answer, which offers these words of wisdom

Another way to get yourself into hot water using mysql_real_escape_string is when you set the database connection encoding using the wrong method. You should do this:

mysql_set_charset('utf8', $link);

You can also do this though:

mysql_query("SET NAMES 'utf8'", $link);

The problem is that the latter bypasses the mysql_ API, which still thinks you're talking to the database using latin1 (or something else). When using mysql_real_escape_string now, it will assume the wrong character encoding and escape strings differently than the database will interpret them later. By running the SET NAMES query, you have created a rift between how the mysql_ client API is treating strings and how the database will interpret these strings. This can be used for injection attacks in certain multibyte string situations.

Colly answered 21/1, 2014 at 1:59 Comment(0)

Input validation and SQL injection prevention have been subject to so many misconceptions. In fact, it all boils down to one single thing:

Ensure correct SQL query syntax

If you are able to ensure correct SQL syntax for any input data, you are safe and you don't need to read or study anything about validation or SQL injection at all, because all these vulnerabilities are only possible in situations where you allow incorrect SQL syntax.

To ensure correct SQL query syntax in your case, you have to make sure your $input is escaped correctly in the query. Look at the PHP docs: http://php.net/mysql_real_escape_string:

caution Security: the default character set

The character set must be set either at the server level, or with the API function mysql_set_charset() for it to affect mysql_real_escape_string(). See the concepts section on character sets for more information.

So, mysql_real_escape_string has to be informed correctly on your character set to be able to escape properly. So, instead of your mysql_query("SET NAMES utf8mb4");, you should do:

mysql_set_charset("utf8mb4");
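
If you are on mysqli instead, the same idea could be sketched roughly like this. The helper name use_charset is hypothetical; set_charset() and character_set_name() are the mysqli methods I am assuming you would use to set the charset through the API and then confirm what the client library believes is in effect.

```php
<?php
// Hypothetical helper; call it once right after connecting.
function use_charset(mysqli $db, string $charset = 'utf8mb4'): void
{
    // set_charset() goes through the client API, so the escaping routine
    // learns about the charset too, unlike a raw "SET NAMES" query.
    if (!$db->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }

    // Double-check what the client library now reports.
    if ($db->character_set_name() !== $charset) {
        throw new RuntimeException(
            'Unexpected client charset: ' . $db->character_set_name()
        );
    }
}
```

After that call, $db->real_escape_string($_POST['field']) escapes with the same charset the server will use to interpret the query.
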
Sanitarian answered 15/1, 2014 at 14:32 Comment(2)
Well, it's known that if the charset for MySQL is only utf8, then invalid or tricky UTF-8 sequences can truncate SQL queries even when mysql_real_escape_string is used. So I am asking whether this behaviour still persists if the utf8mb4 charset is used, and in general whether mysql_real_escape_string is foolproof for utf8mb4, because for the utf8 charset it is not, unless you first validate the data to be valid UTF-8 of at most 3 bytes per character. - Teth
@Hardworker "Well it's known that if the charset for mysql is only utf8 then invalid or tricky utf-8 sequences can truncate sql queries even when mysql_real_escape_string is used." - are you really sure about what you say here? Can you cite a source? If this were true, it would be a big issue that would get a lot of attention in the PHP world, and it would almost surely be mentioned in the PHP docs. They only warn you to set the correct charset, so I would guess the function does escape the special UTF-8 sequences you speak about. - Sanitarian
