I have old java project with second generation database and emoji was working fine, without using anything else in the connection string. Just two things:
- to set character_set_server flag to utf8mb4,
- and to create the database with utf8mb4.
(Skip to Finally if you don't want to read it all.) Now I have this problem on python and nothing is working. I have to solve this so I will write what I have found.
I have tried to (this below is not working, is just what I have tried):
1 remove the flag , to restart the instance, to add the flag , to restart again
2 I have set ?charset=utf8 in the connection string and the library returned error: Invalid utf8 character string: 'F09F98'
3 I have set ?charset=utf8mb4 and the library wrote the value to the database, but instead of emoji there was ??? . So if the library recognizes utf8mb4, and writes it, then the problem is not in the connection from the library, but in the database.
4 I have run
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'utf8mb4' -> this is set from the Google Console
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
Invalid utf8 character string: '\xF0\x9F\x98\x8E' 0,045 sec
SET NAMES utf8mb4;
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8mb4'
'character_set_connection', 'utf8mb4'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8mb4'
'character_set_server', 'utf8mb4'
'character_set_system', 'utf8'
'collation_connection', 'utf8mb4_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'
UPDATE comment set body="😎" where id=1;
SUCCESS
So the problem is in one of those flags.
5 I closed the current connection and reopened again my client so that I have these variables set again to utf8. First I changed the character_set_results and the character_set_client so that I can see the correct result in my client (MysqlWorkbench). I have run the update statement again without success and still ??? in the field. After changing the character_set_connection to utf8mb4 and updating the field again , this time I had emoji in the table. But why character_set_connection. As the tests above show , the connection from the library is already utf8mb4. So at this point I don't understand where to set my connection charset to be utf8mb4 so that the things can start to work.
6 I have tried to create new Cloud SQL instance with the charset flag, and created database with utf8mb4, and table with utf8mb4 (although the tables are created with the default database charset), and the insert statement didn't work again. So the only thing that I can think of is, that the charset=utf8mb4 is not working in the connection string. But it wasn't that. I have tried to remove the charset in the connection string and again the same error as before, when using only utf8 charset in the connectio string
So what is left, I don't know.
7 I have tried to use instance with HDD , not SSD.
8 Tried to connect via Google Cloud shell and to insert row via their console.
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'body' at row 1
Interesting that the cloud shell shows even in the 'show create table' that the default encoding for this table is utf8mb4. So the cloud shell, (Light Bulb) just like mysqlworkbench is connecting with utf8 as default
Finally
The things worked using db.session.execute("SET NAMES 'utf8mb4'") before inserting to the database (in python), (and using ?charset=utf8mb4 only locally). The real problem when testing something like this could be what method you use to check the result in the database. MySQL Workbench was connecting always with the utf8 encoding as default (you can check this using the 'SHOW ...' command above). So first thing to do is to switch the connection in MySQL Workbench (or your client) using SET NAMES 'utf8mb4'. The tests above show ,that Google cloud shell was connected with utf8 by default as well. I searched the internet and found that they cannot use utf8mb4 as default because they wait the utf8mb4 to be the new standard connection in mysql, and becoming such would be named 'utf8'. Also there is no way to make MySQL Workbench to run with utf8mb4 automatically after connection. You should do this by yourself.
Whether or not the problem can occur when reading from the database? I'm about to test this now.