"set names" vs mysqli_set_charset — besides affecting mysqli_escape_string, are they identical?
Asked Answered
A

4

12

It seems to be common knowledge to use mysql_set_charset / mysqli::set_charset instead of the direct MySQL query set names.

The reason often cited is that set names is insecure because the encoding used for mysql_real_escape_string / mysqli::real_escape_string will only be set by a call to mysql_set_charset / mysqli::set_charset. (Another reason cited is that the PHP docs says it's "not recommended" §.)

However, is it safe to use the direct MySQL query set names if we use prepared statements andor other means of escaping besides mysql_real_escape_string / mysqli::real_escape_string / mysqli_escape_string?

Besides affecting the encoding of mysql_real_escape_string / mysqli::real_escape_string / mysqli_escape_string, Is there any difference between set names vs mysql_set_charset/mysqli::set_charset?

Aglaia answered 27/10, 2014 at 20:30 Comment(0)
A
6

Calling SET NAMES on the connection is equivalent to calling set_charset, provided you call neither get_charset nor mysql_real_escape_string (and friends).


When you call set_charset, PHP does two things. First, it calls SET NAMES on the connection. Second, it remembers what charset you set. That state information is later used only in the get_charset and mysql_real_escape_string (and friends) functions. Therefore, if you don't use these functions, then you may consider the two equivalent.

Let's walk the source:

  1. Userland functions mysql_set_charset and mysqli_set_charset call...
  2. Engine function mysql_set_character_set calls...
  3. Engine macro mysqlnd_set_character_set, which is defined as:

    #define mysqlnd_set_character_set(conn, cs) \ ((conn)->data)->m->set_charset((conn)->data, (cs)))

    and expands to...

  4. MYSQLND_METHOD(mysqlnd_conn_data, set_charset) which contains the following code (numbered for discussion, these are not actual source line numbers):

 1   if (PASS == conn->m->local_tx_start(conn, this_func)) {
 2      char * query;
 3      size_t query_len = mnd_sprintf(&query, 0, "SET NAMES %s", csname);
 4 
 5      if (FAIL == (ret = conn->m->query(conn, query, query_len))) {
 6          php_error_docref(NULL, E_WARNING, "Error executing query");
 7      } else if (conn->error_info->error_no) {
 8          ret = FAIL;
 9      } else {
10           conn->charset = charset;
11      }
12      mnd_sprintf_free(query);
13 
14      conn->m->local_tx_end(conn, this_func, ret);
15   }

As you can see, PHP calls SET NAMES on the connection itself (line 3). PHP also tracks the charset just set (line 10). The comments further discuss what happens with conn->charset, but suffice to say it winds up only being in get_charset and mysql_real_escape_string (and friends).

So, if you don't care about this state, and you agree to use neither get_charset nor mysql_real_escape_string, then you may call SET NAMES on the connection itself with no ill effect.

As an aside, and I've never done this, but it looks like compiling PHP with -DPHP_DEBUG=1 will enable substantial debugging through various DBG macros. That may be useful in seeing how your code is passing through this block.

Acquittal answered 21/6, 2016 at 14:51 Comment(10)
OK, now that PHP knows (conn->charset = charset) that the connection is in a given "charset", what will it later do with that info?Spang
@RickJames Only two uses of the stored charset state: mysqli_get_charset uses it for obvious parity and mysqli_real_escape_string uses it to know how many bytes per character for escaping purposes. Since the OP is managing safe quoting himself, and presumably knows the charset, the state value PHP tracks are irrelevant.Acquittal
Thanks. I think that confirms my assertion that there are two independent things going on -- escaping for building the SQL string, and transcoding the bytes from the client encoding to whatever the column/table's CHARACTER SET for storing.Spang
But I am still puzzled as to why mysqli_real_escape_string would care how many "bytes per character" there are. And what does that mean for variable-length encodings, such as utf8?Spang
@RickJames There are two escapers: one for quotes and another for slashes. The code tells all, but in a nutshell: checking for unicode allows for a copy optimization.Acquittal
This answer on how PDO uses bindParam and friends may also be interesting.Acquittal
It seems to depend on mb_valid and mb_charlen; do they look at the charset?Spang
@RickJames mb_valid and mb_charlen are function pointers to character set specific implementations that validate and return character length, respectively. So these implementations are de facto character set aware.Acquittal
And they are 'aware' only if you run set_charset? And not if you simply do SET NAMES? How will things break of one does only SET NAMES?Spang
@RickJames Q1: Yes. Q2: Yes. Q3: If one does only SET NAMES, then one forfeits the ability to use get_charset and mysql_real_escape_string (and friends). Since the OP stipulated he was doing neither get_charset nor mysql_real_escape_string (and friends), then - in that case and that case alone - calling SET NAMES is equivalent to calling set_charset.Acquittal
S
3

Two things must be done (in this area):

  • Escape quotation marks (and other characters) before putting them inside quotes. Otherwise the quotes would give you syntax errors.
  • Establish the encoding of the bytes in the client. This is so that INSERTs/SELECTs will know how to change the bytes during the write/read.

The first needs to escape apostrophe and double-quote, since both of those are acceptable quote marks for strings in MySQL syntax. Then, the escape character, itself, needs escaping. Those 3 characters are sufficient for must applications. However if you are trying to escape a BLOB (such as a .jpg), various control characters may cause trouble. You are probably better off converting to hex, then using UNHEX(), to avoid issues. Note: Nothing is mentioned here about character sets. If you aren't dealing with BLOBs, you can get away with PHP's addslashes().

The second item's purpose is to say "this stream of bytes is encoded this way (utf8/latin1/etc)". It's only use is for converting between the CHARACTER SET of the column being stored/fetched and the desired encoding in your client (PHP, etc). It is handled in a variety of ways by various languages. For PHP:

  • mysql_* -- Do not use this interface; it is deprecated and will soon be removed.
  • mysqli_* -- mysqli::set_charset(...)
  • PDO -- new PDO('...;charset=UTF8', ...)

Does set_charset() do something with real_escape_string? I do not know. But it should not matter. SET NAMES obviously cannot since it is a MySQL command, and knows nothing about PHP.

htmlentities() is another PHP function in this area. It turns 8-bit codes into & entities. This should not be used going into MySQL. It would only mask other problems. Use it only in certain situations involving HTML, not PHP or MySQL.

The only reasonable CHARACTER SETsto use today are ascii, latin1, utf8, and utf8mb4. Those have no "characters" in the "control" area. Sjis and a few other character sets do. This confusion over control characters may be a reason for real_escape_string existing.

Conclusion:

As I see it, you need two mechanisms: One for escaping, and one for establishing the encoding in the client. They are separate.

If they are tied together, the PHP manual has failed to provide any compelling reason for picking one method over another.

Spang answered 18/6, 2016 at 17:58 Comment(1)
bishop's answer is possibly more complete than mine. His is certainly more 'researched'.Spang
A
1

mysql: the whole interface is deprecated, so don't use any of it at all (PHP 7 removes the interface).

mysqli (and PDO) has prepared statements that make the use of real_escape_string not needed (nor wanted). -> So if you use mysqli and prepared statements only: no worries how you set the charset.

Since you care about security: I see little point in not using prepared statements.

Once you use mysqli's prepared statements the only way forward is to use $mysqli->set_charset() as you can't simply concatenate multiple sql statements in one string anymore.

Hence the question to know the difference is at most academic and not relevant in real life.

In summary:

  • mysql: don't use at all.

  • mysqli: use prepared statements and hence the set_charset() method
    Also: you won't need real_escape_string anymore once you use prepared statements.

  • or -of course- use PDO and it's methods.

Affective answered 20/6, 2016 at 23:57 Comment(6)
It does not address the original question.Husbandman
Prepared statements do not allow you to issue multiple sql statements in one string ... - so the $mysqli->set_charset() is the way to go no matter what.Affective
The question was "Is there any difference between". Not sure how your answer and comment are relevant, sorry. I do not say what you said is wrong (since it is not), I'm just saying it's not what OP asked.Husbandman
Tried to explain in the answer why the difference is irrelevant.Affective
Well, I don't agree that the difference is irrelevant. Irregardless of what one should do, the question why still makes sense.Husbandman
SET NAMES and set_charset tell the server what encoding the bytes are in. "Binding" does not know or care how the strings are encoded. So it is not sufficient.Spang
T
1

SET NAMES ... is a convenience alias:

A SET NAMES 'charset_name' statement is equivalent to these three statements:

SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;

Setting character_set_connection to charset_name also implicitly sets collation_connection to the default collation for charset_name.

... that provides MySQL Server with all the text-encoding information required for current connection. So far so good.

But PHP is also involved and it will not learn anything from here because it's basically a random user query. There are two things that PHP will not do for obvious performance reasons:

  • Scan all user queries sent to the server to detect calls to SET NAMES.
  • Ask MySQL for current values of the involved directives every time something needs to be done.

In short: this method notifies the server but not the client. However, dedicated PHP functions do both things.

Thaler answered 21/6, 2016 at 14:52 Comment(1)
Then comes the question of why PHP needs to know or care about the encoding.Spang

© 2022 - 2024 — McMap. All rights reserved.