Google Cloud Sql Second Generation Utf8Mb4 Encoding
Asked Answered
T

6

11

We are using Google Cloud Sql Second Generation with our AppEngine application. However today we've discovered some problem, we cannot insert emoji characters to our database because we cannot change some server flags to utf8mb4 character encoding.

We've changed character_set_server flag to utf8mb4 but it wasnt enough

We have to change: character_set_system character_set_client collaction_connection

flags to utf8mb4 also, but second generation db is not allowing root user to change those flags. What can we do to solve this problem

Does anyone have any idea about that?

Thanks

Thereby answered 10/8, 2016 at 14:54 Comment(6)
actually I can add emoji characters through mysql client but I cannot add it with Google App Engine java sdk my connection string is jdbc:google:mysql://xxxx:europe-west1:livyappdb/xxxxxxThereby
Can you add a small Java repro for the problem? It sounds like the charset conversion is getting messed up somewhere along the way. Where is the text with emoji coming from, from user input?Welborn
You should also check what character set you are using for the column containing the emoji: #1050228Welborn
the column which contains emoji characters in its data, has utf8mb4 encoding. If I write to that column by using a mysql client it works fine, but when I try to write it through app engine sql api then it doesnt accept emoji charactersThereby
Can you add a small repro case to your post that shows what you are interesting and how you're inserting it?Welborn
We may have the same problem. Could you explain why you think character_set_system, character_set_client and collation_connection are necessary to fix it?Despiteful
G
3

You have to set character_set_server to utf8mb4, change the columns you need to utf8mb4 and create a new Cloud SQL 2nd gen instance with the new flag (!!). Basically, setting the flag on an existing instance and just restarting (tested with 5.7) will not be enough (is this a bug? I did not find it in the docs). Any encoding related connection parameters are not needed and should be removed. The collation will be the standard collation for utf8mb4 which is perfect for me (and probably most cases), even without setting anything.

Goldfilled answered 8/12, 2016 at 8:56 Comment(1)
Setting character_set_server to utf8mb4 on 5.7 and restarting the instance worked for me.Tuinenga
D
1

We had the exact same problem. Setting character_set_server to utf8mb4 wasn't enough. We could insert emojis through MySQL Workbench, but not through our application.

In our case, this problem went away after we started a new instance running MySQL 5.7 instead of 5.6. So my hypothesis is that in 5.7, but not in 5.6, changing the character_set_server flag lets Google Cloud SQL change those other flags you mention, or some other relevant setting.

Of course if you are already running 5.7, this does not apply to you.

Despiteful answered 29/8, 2016 at 8:59 Comment(0)
R
0

SHOW CREATE TABLE -- that will probably say that the column(s) are CHARACTER SET utf8. That need to be fixed with

ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_520_ci;
Rouleau answered 13/8, 2016 at 20:41 Comment(0)
B
0

For me, I've found that using the AppEngine Console->SQL and edit the character_set_server to utf8mb4 and restart the DB does work!

Boger answered 19/3, 2017 at 7:36 Comment(1)
Wrong comment sorry about thatEveleen
E
0

I have old java project with second generation database and emoji was working fine, without using anything else in the connection string. Just two things:

  • to set character_set_server flag to utf8mb4,
  • and to create the database with utf8mb4.

(Skip to Finally if you don't want to read it all.) Now I have this problem on python and nothing is working. I have to solve this so I will write what I have found. I have tried to (this below is not working, is just what I have tried):

1 remove the flag , to restart the instance, to add the flag , to restart again

2 I have set ?charset=utf8 in the connection string and the library returned error: Invalid utf8 character string: 'F09F98'

3 I have set ?charset=utf8mb4 and the library wrote the value to the database, but instead of emoji there was ??? . So if the library recognizes utf8mb4, and writes it, then the problem is not in the connection from the library, but in the database.

4 I have run

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'utf8mb4'  -> this is set from the Google Console
'character_set_system', 'utf8'
'collation_connection', 'utf8_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'

UPDATE comment set body="😎" where id=1;
Invalid utf8 character string: '\xF0\x9F\x98\x8E'   0,045 sec

SET NAMES utf8mb4;
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
'character_set_client', 'utf8mb4'
'character_set_connection', 'utf8mb4'
'character_set_database', 'utf8mb4'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8mb4'
'character_set_server', 'utf8mb4'
'character_set_system', 'utf8'
'collation_connection', 'utf8mb4_general_ci'
'collation_database', 'utf8mb4_general_ci'
'collation_server', 'utf8mb4_general_ci'

UPDATE comment set body="😎" where id=1;

SUCCESS

So the problem is in one of those flags.

5 I closed the current connection and reopened again my client so that I have these variables set again to utf8. First I changed the character_set_results and the character_set_client so that I can see the correct result in my client (MysqlWorkbench). I have run the update statement again without success and still ??? in the field. After changing the character_set_connection to utf8mb4 and updating the field again , this time I had emoji in the table. But why character_set_connection. As the tests above show , the connection from the library is already utf8mb4. So at this point I don't understand where to set my connection charset to be utf8mb4 so that the things can start to work.

6 I have tried to create new Cloud SQL instance with the charset flag, and created database with utf8mb4, and table with utf8mb4 (although the tables are created with the default database charset), and the insert statement didn't work again. So the only thing that I can think of is, that the charset=utf8mb4 is not working in the connection string. But it wasn't that. I have tried to remove the charset in the connection string and again the same error as before, when using only utf8 charset in the connectio string

So what is left, I don't know.

7 I have tried to use instance with HDD , not SSD.

8 Tried to connect via Google Cloud shell and to insert row via their console.

ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x8E' for column 'body' at row 1

Interesting that the cloud shell shows even in the 'show create table' that the default encoding for this table is utf8mb4. So the cloud shell, (Light Bulb) just like mysqlworkbench is connecting with utf8 as default

Finally

The things worked using db.session.execute("SET NAMES 'utf8mb4'") before inserting to the database (in python), (and using ?charset=utf8mb4 only locally). The real problem when testing something like this could be what method you use to check the result in the database. MySQL Workbench was connecting always with the utf8 encoding as default (you can check this using the 'SHOW ...' command above). So first thing to do is to switch the connection in MySQL Workbench (or your client) using SET NAMES 'utf8mb4'. The tests above show ,that Google cloud shell was connected with utf8 by default as well. I searched the internet and found that they cannot use utf8mb4 as default because they wait the utf8mb4 to be the new standard connection in mysql, and becoming such would be named 'utf8'. Also there is no way to make MySQL Workbench to run with utf8mb4 automatically after connection. You should do this by yourself. Whether or not the problem can occur when reading from the database? I'm about to test this now.

Eveleen answered 10/8, 2017 at 8:39 Comment(1)
Yes, the project works with this db.session.execute("SET NAMES 'utf8mb4'") before inserting any problematic character. Again, be careful how you check what is written inside the database. All clients that I used, even those working in the terminal, used utf8 by default.Eveleen
T
0

Add this in settings.py,

'OPTIONS': {'charset': 'utf8mb4'}

Very thanks to : Unable to use utf8mb4 character set with CloudSQL on AppEngine Python

Trichloroethylene answered 12/8, 2019 at 6:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.