Inserting UTF-8 encoded string into UTF-8 encoded mysql table fails with "Incorrect string value"
Asked Answered
S

4

15

Inserting UTF-8 encoded string into UTF-8 encoded table gives incorrect string value.

PDOException: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9D\x84\x8E i...' for column 'body_value' at row 1: INSERT INTO

I have a π„Ž character, in a string that mb_detect_encoding claims is UTF-8 encoded. I try to insert this string into a MySQL table, which is defined as (among other things) DEFAULT CHARSET=utf8

Edit: Drupal always does SET NAMES utf8 with optional COLLATE (atleast when talking to MySQL).

Edit 2: Some more details that appear to be relevant. I grab some text from a PostgreSQL database. I stick it onto an object, use mb_detect_encoding to verify that it's UTF-8, and persist the object to the database, using node_save. So while there is an HTTP request that triggers the import, the data does not come from the browser.

Edit 3: Data is denormalized over two tables:

SELECT character_set_name FROM information_schema.COLUMNS C WHERE table_schema = "[database]" AND table_name IN ("field_data_body", "field_revision_body") AND column_name = "body_value";

>+--------------------+
| character_set_name |
+--------------------+
| utf8               |
| utf8               |
+--------------------+

Edit 4: Is it possible that the character is "to new"? I'm more than a little fuzzy on the relationship between unicode and UTF-8, but this wikipedia article, implies that the character was standardized very recently.

I don't understand how that can fail with "Incorrect string value".

Supernumerary answered 13/8, 2012 at 14:59 Comment(5)
Is the field on that table defined with a UTF-8 charset? – Antoneantonella
possible duplicate of UTF-8 Database Problem – Layette
What does SELECT character_set_name FROM information_schema.`COLUMNS` C WHERE table_schema = "db_name" AND table_name = "table_name" AND column_name = "column_name"; give – Taxation
MySQL's utf8 is only the BMP. Its utf8mb4 corresponds to the outside world's UTF-8 (and includes 4-byte characters). – Marcosmarcotte
The error is caused by trying to cram 4 bytes into a charset that can't handle it (namely MySQL's "utf8"). – Marcosmarcotte
R
26

π„Ž (U+1D10E) is a character Unicode found outside the BMP (Basic Multilingual Plane) (above U+FFFF) and thus can't be represented in UTF-8 in 3 bytes. MySQL charset utf8 only accepts UTF-8 characters if they can be represented in 3 bytes. If you need to store this in MySQL, you'll need to use MySQL charset utf8mb4. You'll need MySQL 5.5.3 or later. You can use ALTER TABLE to change the character set without much problem; since it needs more space to store the characters, a couple issues show up that may require you to reduce string size. See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html .

Renfrow answered 14/8, 2012 at 8:29 Comment(5)
Great, thank you so much. ALTERing the table, and doing "SET NAMES" when doing this particular type of save, solved the problem for me. – Supernumerary
Is there solutions for older versions of MySQL? – Edam
If he can change databases, surely he can upgrade MySQL a lot easier then using Postgresql. – Renfrow
@JeromeJ - 5.5.3 is when utf8mb4 was introduced. Sorry, there is no text way to store 4-byte UTF-8 (non-BMP) characters. – Marcosmarcotte
@RickJames I think I did end up replacing all 4-byte UTF-8 by their HTML character before saving the string. – Edam
D
8

to solve this issue, first you change your database field to utf8m4b charset. For example:

ALTER TABLE `tb_name` CHANGE `field_name` `field_name` VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL; 

then in your db connection, set driver_options for it to utf8mb4. For example, if you use PDO

$db = new PDO('mysql:host=localhost;dbname=testdb;charset=utf8mb4', 'username', 'password');

or in zend framework 1.2

$dbParam = array('host' => 'localhost', 'username' => 'db_user_name',
            'password' => 'password', 'dbname' => 'db_name',
            'driver_options' => array(
                '1002' => "SET NAMES 'utf8mb4'",
                '12'    => 0 //this is not necessary
            )
        );
Deformity answered 29/3, 2016 at 2:54 Comment(3)
What do 1002 and 12 represent here? – Gavelkind
They are options of driver_options in PDO mysql. You can refer here for more detail php.net/manual/en/ref.pdo-mysql.php – Deformity
PDO::MYSQL_ATTR_INIT_COMMAND (1002) and PDO::FETCH_KEY_PAIR (12). Is FETCH_KEY_PAIR necessary here in relation to the question? – Gavelkind
H
4

In your PDO connecton, set the charset.

new PDO('mysql:host=localhost;dbname=the_db;charset=utf8mb4', $user, $password);
Hobbism answered 13/8, 2012 at 15:8 Comment(4)
Does my edit to the question about 'SET NAMES' change this? I can't imagine a framework as well used would get that wrong. – Supernumerary
That's odd, what/where are you inserting? Might try adding utf8 in the header: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> – Hobbism
My <HEAD> contains <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />, but no data comes from the browser. (I updated the question with more data, again) – Supernumerary
No. It must be utf8mb4, not just utf8. (I edited the answer.) – Marcosmarcotte
G
3

I fixed the error: SQLSTATE[HY000]: General error: 1366 Incorrect string value ...... with this method:

I use utf8mb4_unicode_ci for database database Set utf8mb4_unicode_ci for all tables tables

Set longblog datatype for column (not text, longtext.... you need big datatype to store 4 bytes of your content) fields

It is okay now. If you use laravel, continue to edit config/database.php

'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',

laravel

If you use function strtolower, replace it with mb_strtolower Notice: you have to put <meta charset="utf-8"> on your head tag

Guzel answered 28/8, 2019 at 4:22 Comment(0)

© 2022 - 2024 β€” McMap. All rights reserved.