Whether to use "SET NAMES"
Asked Answered
F

3

65

In reading "High performance MySQL" from O'Reilly I've stumbled upon the following

Another common garbage query is SET NAMES UTF8, which is the wrong way to do things anyway (it does not change the client library's character set; it affects only the server).

I'm a bit confused, because I used to put "SET NAMES utf8" on the top of every script to let the db know that my queries are utf8 encoded.

Can anyone comment the above quote, or, to put it more formally, what are your suggestions / best practices to ensure that my database workflow is unicode-aware.

My target languages are php and python if this is relevant.

Flea answered 30/10, 2009 at 15:41 Comment(1)
what technique did you end up implementing?Incorporate
D
33

mysql_set_charset() would be an option - but an option limited to the ext/mysql. For ext/mysqli it is mysqli_set_charset and for PDO::mysql you need to specify a connection parameter.

As using this function results in a MySQL API call, it should be considered much faster than issuing a query.

In respect of performance the fastest way to ensure a UTF-8-based communiction between your script and the MySQL server is setting up the MySQL server correctly. As SET NAMES x is equivalent to

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

whereas SET character_set_connection = x internally also executes SET collation_connection = <<default_collation_of_character_set_x>> you can also set these server variables statically in your my.ini/cnf.

Please be aware of possible problems with other applications running on the same MySQL server instance and requiring some other character set.

Deputation answered 30/10, 2009 at 16:18 Comment(3)
I mentioned mysql_set_charset() - that's a function included in the old ext/mysql. As stated above, neither PDO nor ext/mysqli provide any support for this operation directly.Deputation
Seems the link I posted isn't reliable. Here's a better one: php.net/manual/en/mysqli.set-charset.php Not sure how you mean mysqli doesn't support this operation.Serinaserine
Ah OK - sorry... Didn't realize that there is a mysqli_set_charset function available. Thanks for the clarification.Deputation
D
29

TLDR

// The key is the "charset=utf8" part.
$dsn = 'mysql:host=localhost;dbname=testdb;charset=utf8';
$dbh = new PDO($dsn, 'user', 'pass');

This answer has an emphasis on php's pdo library because it's so ubiquitous.

A brief reminder - mysql is a client-server architecture. This is significant because there's not only the mysql server where the actual database is, but there's also the seperate mysql client driver, which is the thing that talks to the mysql server(they're separate entities). You could kinda sorta say the mysql client and pdo are mixed together.

When you use set names utf8, you issue a standard sql query to mysql. While the sql query does pass through pdo, and then through the mysql client library, and then finally it reaches the mysql server, ONLY the mysql server parses and interprets that sql query. This is significant because the mysql server doesn't send any message back to pdo or the mysql client letting it know the character set and encoding has changed, and so the mysql client and pdo are both totally ignorant to fact that it happened.

It's important not to do this because the client library cannot properly handle strings if it isn't aware of the current character set. Most common operations will work correctly without the client knowing the correct character set, but one that won't is string escaping, such as PDO::quote. You may think you don't need to worry about such manual primitive string escaping because you use prepared statements, but the truth is the vast majority of pdo:mysql users unknowingly use emulated prepared statements because it's been the default setting for the pdo:mysql driver for a very long time now. An emulated prepared statement doesn't use real native mysql prepared statements as provided by the mysql api; instead, php does the equivalent of calling PDO::quote() on all your values, and str_replacing'ing all your placeholders with the quoted values for you.

Since you can't properly escape a string unless you know the character set you're using, these emulated prepared statements are vulnerable to sql injection if you've changed to certain character sets via set names. Regardless of the possibility of sql injection, you can still break your strings if you use an escaping scheme intended for a different character set.

For the pdo mysql driver, you can specify the character set when you connect, by specifying it in the DSN. The client library and the server will both be aware of the character set if you do this, and so things will work like they should.

// The key is the "charset=utf8" part.
$dsn = 'mysql:host=localhost;dbname=testdb;charset=utf8';
$dbh = new PDO($dsn, 'user', 'pass');

But improper string escaping isn't the only problem. For example, you can also have problems with using PDO::bindColumn because column names are specified as strings, and so again the encoding matters. An example could be a column name named ütube(note the umlaut), and you switch from latin to utf8 via set names, and then you try to $stmt->bindColumn('ütube', $var); with ütube being a utf8 encoded string because your php file is utf8 encoded. It won't work, you would need to encode the string as a latin1 variant... and now you have all kinds of crazy going on.

Dependency answered 3/1, 2013 at 1:49 Comment(1)
As nowadays (September 2014) PDO is the newest and most robust of the ways to connect PHP with a database, I think this answer is the one should be taken as accepted.Oscine
C
9

Not sure about py, but php has mysql_set_charset now, which states that this is the "preferred way to change the charset [and] using mysql_query() to execute SET NAMES is not recommended." Note, that this function was introduced for MySQL 5.0.7, so it won't work with earlier versions.

mysql_set_charset('utf8', $link);

Where $link is a connection created with mysql_connect

Chloroprene answered 30/10, 2009 at 15:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.