MySQL utf8mb4, Errors when saving Emojis
I am trying to save user names from a service in my MySQL database. Those names can contain emojis such as 🙈😂😱🍰 (just as examples).

After searching a little, I found a Stack Overflow post linking to this tutorial. I followed the steps, and it looks like everything is configured properly.

I have a Database (charset and collation set to utf8mb4 (_unicode_ci)), a Table called TestTable, also configured this way, as well as a "Text" column, configured this way (VARCHAR(191) utf8mb4_unicode_ci).

When I try to save emojis I get an error:

Example of error for shortcake (🍰):
    Warning: #1300 Invalid utf8 character string: 'F09F8D'
    Warning: #1366 Incorrect string value: '\xF0\x9F\x8D\xB0' for column 'Text' at row 1

The only emoji that I was able to save properly was the sun ☀️

Though I didn't try all of them to be honest.

Is there something I'm missing in the configuration?

Please note: All tests of saving didn't involve a client side. I use phpmyadmin to manually change the values and save the data. So the proper configuration of the client side is something that I will take care of after the server properly saves emojis.

Another side note: Currently, when saving emojis I either get the error shown above, or get no error and Username 🍰 is stored as Username ????. Which of the two happens depends on how I save: creating/saving via an SQL statement or editing inline stores question marks, while editing via the edit button produces the error.

thank you

EDIT 1: Alright so I think I found out the problem, but not the solution. It looks like the Database specific variables didn't change properly.

When I'm logged in as root on my server and read out the variables (global):
Query used: SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

For my Database (in phpmyadmin, the same query) it looks like the following:

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8               |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8               |
| character_set_server     | utf8               |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

How can I adjust these settings on the specific database? Also even though I have the first shown settings as default, when creating a new database I get the second one as settings.
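To make the mismatch explicit, here is a small Python sketch that compares the two listings above. The values are copied from the tables; the names root_session, pma_session and diff_variables are made up for this illustration.

```python
# Values copied from the first SHOW VARIABLES listing (logged in as root).
root_session = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8mb4",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8",
    "collation_connection": "utf8mb4_unicode_ci",
    "collation_database": "utf8mb4_unicode_ci",
    "collation_server": "utf8mb4_unicode_ci",
}

# The phpMyAdmin listing differs from the first one only in these entries.
pma_session = dict(
    root_session,
    character_set_client="utf8",
    character_set_results="utf8",
    character_set_server="utf8",
)


def diff_variables(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    return {name: (a[name], b.get(name)) for name in a if a[name] != b.get(name)}


for name, (root_value, pma_value) in sorted(diff_variables(root_session, pma_session).items()):
    print(f"{name}: root={root_value} phpMyAdmin={pma_value}")
```

Two of the three differences, character_set_client and character_set_results, are per-connection settings, which suggests the phpMyAdmin connection itself is still negotiating utf8.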

Edit 2:

Here is my my.cnf file:

[client]
port=3306
socket=/var/run/mysqld/mysqld.sock
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld_safe]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
socket=/var/run/mysqld/mysqld.sock
port=3306
basedir=/usr
datadir=/var/lib/mysql
tmpdir=/tmp
lc-messages-dir=/usr/share/mysql
log_error=/var/log/mysql/error.log
max_connections=200
max_user_connections=30
wait_timeout=30
interactive_timeout=50
long_query_time=5
innodb_file_per_table
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

!includedir /etc/mysql/conf.d/
Surgeonfish answered 1/2, 2016 at 8:27 Comment(12)
it's a phpmyadmin problem, try other mysql client. – Ephraimite
I don't think it's a phpmyadmin problem. As you can see in Edit1 I think it's some misconfiguration between conf/default variables/parameters and those on the database. Even though when creating a new database. – Surgeonfish
What is $cfg["DefaultCharset"] in your PMA configuration? – Ultrasound
I didn't find $cfg["DefaultCharset"]. I looked it up in etc/phpmyadmin/config.inc.php. Not in there. – Surgeonfish
@jsxqf Hi there, after a while and redoing the whole "tutorial" I realized that it actually was a MySQL problem. The session variables were different from the global variables. A new connection, which is what happens when using my API, uses the global variables and works :). So actually, if you provide a full answer, I'll accept it and you'll get the bounty. On top of that, I'd appreciate it if you could also state how I can reset phpMyAdmin's session variables. I didn't get this to work. They are still set wrong. – Surgeonfish
Did you try setting $cfg["DefaultCharset"] to a reasonable value like utf8mb4 in your config? – Ultrasound
Hello. I am having the same problem. What was in the end the thing that made it works for you? – Empty
Hi @johnnyfittizio, I'm currently rerunning through this problem xD. But as far as I remember, the solution was at one point the right configuration in the database itself, and then the right configuration of the client. All in all, running exactly, step by step, through the tutorial I linked in the beginning solved my issue. mathiasbynens.be/notes/mysql-utf8mb4#utf8-to-utf8mb4 – Surgeonfish
Thanks for replying. I have already tried that :) Anyway i will look it again. Let's see.. – Empty
The always helpful internet says PMA has a setting $cfg["DefaultCharset"] but I don't see it in the docs. – Cage
There's $cfg['DefaultConnectionCollation'], however in current versions of PMA (v4.7). Defaults to utf8mb4_unicode_ci – Cage
It took me a long time to find this trick: https://mcmap.net/q/225155/-mysqldump-with-utf8-can-not-export-the-right-emojis-string – Treatment
C
118

character_set_client, _connection, and _results must all be utf8mb4 for that shortcake to be eatable.

Something, somewhere, is setting a subset of those individually. Rummage through my.cnf and phpmyadmin's settings -- something is not setting all three.

If SET NAMES utf8mb4 is executed, all three are set correctly.

The sun shone because it is only 3 bytes (E2 98 80); MySQL's utf8 is sufficient for characters whose UTF-8 encoding takes at most 3 bytes.
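The byte counts are easy to check outside MySQL. A minimal Python sketch (the helper names utf8_len and needs_utf8mb4 are invented for this example) that shows why the sun fits into the 3-byte utf8 character set and the shortcake does not:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character in text takes 4 bytes in UTF-8."""
    return any(utf8_len(ch) == 4 for ch in text)


sun = "\u2600"            # BLACK SUN WITH RAYS
shortcake = "\U0001F370"  # SHORTCAKE

print(sun.encode("utf-8").hex(" "))        # e2 98 80
print(shortcake.encode("utf-8").hex(" "))  # f0 9f 8d b0
print(needs_utf8mb4("Username " + shortcake))  # True
```

The shortcake's bytes are the same F0 9F 8D B0 that appear in the "Incorrect string value" warning in the question.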

Codger answered 3/2, 2016 at 23:18 Comment(7)
Alright, I think this gets me closer. Thank you. I edited my question and added the my.cnf. Maybe you can see something in there? – Surgeonfish
The connection needs to have utf8mb4. If you can't find where to do that, then execute SET NAMES utf8mb4. – Codger
This is a nice explanation of what went wrong. But in addition I had to check both the session and the global variables. I realized that phpMyAdmin's session variables were still wrong and that the error was occurring in the admin board only. – Surgeonfish
Thank you. mysql_query("SET NAMES 'utf8mb4'"); that's right ;) – Ruthieruthless
Oh, I missed one -- A shortcake is big enough for 4 bytes. – Codger
@luky - Start a new Question; supply details. Review this to see if it provides clues. – Codger
SET NAMES utf8mb4 before the INSERT statement did the trick for me. Thank you all! – Amos
S
11

For me, it turned out that the problem lay in the mysql client.

The mysql client overrode the character set configured in the server's my.cnf, which resulted in an unintended character set.

So all I needed to do was add character-set-client-handshake = FALSE. It stops the client's setting from interfering with the server's character set.

my.cnf would then look like this:

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
...

Hope it helps.

Straightway answered 5/9, 2016 at 7:34 Comment(0)
A
8

It is likely that your service/application is connecting with "utf8" instead of "utf8mb4" for the client character set. That's up to the client application.

For a PHP application see http://php.net/manual/en/function.mysql-set-charset.php or http://php.net/manual/en/mysqli.set-charset.php

For a Python application see https://github.com/PyMySQL/PyMySQL#example or http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#mysql-unicode
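As a sketch of the Python case: PyMySQL's connect() accepts a charset argument, so the client character set can be requested when the connection is opened. The host, user, password and database values below are placeholders, and session_charsets is a helper name invented here; actually calling it needs the third-party pymysql package and a reachable server.

```python
# Connection settings; host, user, password and database are placeholders.
CONNECT_KWARGS = {
    "host": "localhost",
    "user": "app_user",
    "password": "change-me",
    "database": "test",
    "charset": "utf8mb4",  # client character set requested for this connection
}

# The three session variables that must all be utf8mb4.
CHARSET_QUERY = (
    "SHOW SESSION VARIABLES WHERE Variable_name IN "
    "('character_set_client', 'character_set_connection', 'character_set_results')"
)


def session_charsets(connect_kwargs=CONNECT_KWARGS):
    """Open a connection and return the three session character sets as a dict."""
    import pymysql  # third-party: pip install pymysql

    conn = pymysql.connect(**connect_kwargs)
    try:
        with conn.cursor() as cur:
            cur.execute(CHARSET_QUERY)
            return dict(cur.fetchall())  # rows are (Variable_name, Value) pairs
    finally:
        conn.close()
```

If the connection is set up as intended, session_charsets() should report utf8mb4 for all three variables.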

Also, check that your columns really are utf8mb4. One direct way is like this:

mysql> SELECT character_set_name FROM information_schema.`COLUMNS`  WHERE table_name = "user"   AND column_name = "displayname";
+--------------------+
| character_set_name |
+--------------------+
| utf8mb4            |
+--------------------+
1 row in set (0.00 sec)
Artistry answered 9/2, 2016 at 23:17 Comment(0)
O
1

Symfony 5 answer

Although this is not what was asked, people may end up here after searching the web for the same problem in Symfony.

1. Configure MySQL properly

☝️ See (and upvote if helpful) top answers here.

2. Change your Doctrine configuration

/config/packages/doctrine.yaml

doctrine:
    dbal:
        ...
        charset: utf8mb4
Oldham answered 24/1, 2021 at 9:53 Comment(0)
H
0

I'm not proud of this answer, because it uses brute-force to clean the input. It's brutal, but it works

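// Replaces selected bytes with HTML entities and every other non-ASCII byte
// with a numeric entity. strlen()/substr() work on bytes, so a multi-byte
// UTF-8 character is processed as several separate bytes.
function cleanWord($string, $debug = false) {
    $new_string = "";

    for ($i=0;$i<strlen($string);$i++) {
        $letter = substr($string, $i, 1);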
        if ($debug) {
            echo "Letter: " . $letter . "<BR>";
            echo "Code: " . ord($letter) . "<BR><BR>";
        }
        $blnSkip = false;
        if (ord($letter)=="146") {
            $letter = "&acute;";
            $blnSkip = true;
        }
        if (ord($letter)=="233") {
            $letter = "&eacute;";
            $blnSkip = true;
        }
        if (ord($letter)=="147" || ord($letter)=="148") {
            $letter = "&quot;";
            $blnSkip = true;
        }
        if (ord($letter)=="151") {
            $letter = "&#8211;";
            $blnSkip = true;
        }
        if ($blnSkip) {
            $new_string .= $letter;
            // move on to the next byte; "break" would end the loop here and drop the rest of the string
            continue;
        }

        if (ord($letter) > 127) {
            $letter = "&#0" . ord($letter) . ";";
        }

        $new_string .= $letter;
    }
    if ($new_string!="") {
        $string = $new_string;
    }
    //optional
    $string = str_replace("\r\n", "<BR>", $string);

    return $string;
}

//clean up the input
$message = cleanWord($message);

//now you can insert it as part of SQL statement
$sql = "INSERT INTO tbl_message (`message`)
VALUES ('" . addslashes($message) . "')";
Housemaster answered 20/12, 2018 at 18:43 Comment(0)
T
0

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;

Example query:

ALTER TABLE `reactions` CHANGE `emoji` `emoji` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;

After that, I was able to store emoji in the table successfully.

Tympanist answered 6/7, 2019 at 13:58 Comment(0)
C
0

Consider adding

init_connect = 'SET NAMES utf8mb4'

to the my.cnf of all of your db servers.

(still, clients can (so will) overrule it)

Casanova answered 31/1, 2020 at 12:10 Comment(0)
B
0

I was importing data via command:

LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"' 
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(col1, col2, col3, col4, col5...);

This didn't work for me:

SET NAMES utf8mb4;

I had to add the CHARACTER SET clause to make it work:

LOAD DATA LOCAL INFILE
'E:\\wamp\\tmp\\customer.csv' INTO TABLE `customer`
CHARACTER SET 'utf8mb4'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;

Note that the target column must also be utf8mb4, not utf8, or the import will store question marks like "?????" instead (without raising any errors).
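Before importing, it can help to confirm that the file really is UTF-8 and to see whether it contains any 4-byte characters at all. A small Python sketch; the function name four_byte_chars and the sample data are made up for this illustration, and in practice the bytes would come from reading the CSV file in binary mode:

```python
def four_byte_chars(data: bytes) -> list:
    """Return the characters in data that need 4 bytes in UTF-8.

    Raises UnicodeDecodeError if data is not valid UTF-8.
    """
    text = data.decode("utf-8")
    # Code points above U+FFFF are exactly the ones encoded with 4 bytes.
    return [ch for ch in text if ord(ch) > 0xFFFF]


# Hypothetical CSV content standing in for the real file.
sample = "id,name\r\n1,Username \U0001F370\r\n2,Plain name\r\n".encode("utf-8")

found = four_byte_chars(sample)
print(len(found))  # 1
```

Any character this reports will only survive the import if both the CHARACTER SET clause and the target column are utf8mb4.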

Bluestocking answered 29/6, 2021 at 8:29 Comment(0)
B
0

For CodeIgniter users: make sure the character set and collation settings in database.php are set properly. This is what worked for me.

$db['default']['char_set'] = 'utf8mb4';
$db['default']['dbcollat'] = 'utf8mb4_unicode_ci';
Beilul answered 25/10, 2022 at 4:55 Comment(0)
