java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F...'
Asked Answered
I

11

122

I have the following string value: "walmart obama πŸ‘½πŸ’”"

I am using MySQL and Java.

I am getting the following exception: `java.sql.SQLException: Incorrect string value: '\xF0\x9F\x91\xBD\xF0\x9F...'

Here is the variable I am trying to insert into:

var1 varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL`

My Java code that is trying to insert "walmart obama πŸ‘½πŸ’”" is a preparedStatement. So I am using the setString() method.

It looks like the problem is the encoding of the values πŸ‘½πŸ’”. How can I fix this? Previously I was using Derby SQL and the values πŸ‘½πŸ’” just ended up being two sqaures (I think this is the representation of the null character)

All help is greatly appreciated!

Impunity answered 30/11, 2012 at 21:51 Comment(2)
Seems like a duplicate of #10957738 – Acrocarpous
When you create the database, you can give the character set and collation like this: CREATE DATABASE db_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; – Nebulose
B
157

What you have is EXTRATERRESTRIAL ALIEN (U+1F47D) and BROKEN HEART (U+1F494) which are not in the basic multilingual plane. They cannot be even represented in java as one char, "πŸ‘½πŸ’”".length() == 4. They are definitely not null characters and one will see squares if you are not using fonts that support them.

MySQL's utf8 only supports basic multilingual plane, and you need to use utf8mb4 instead:

For a supplementary character, utf8 cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8 cannot store the character at all, you do not have any supplementary characters in utf8 columns and you need not worry about converting characters or losing data when upgrading utf8 data from older versions of MySQL.

So to support these characters, your MySQL needs to be 5.5+ and you need to use utf8mb4 everywhere. Connection encoding needs to be utf8mb4, character set needs to be utf8mb4 and collaction needs to be utf8mb4. For java it's still just "utf-8", but MySQL needs a distinction.

I don't know what driver you are using but a driver agnostic way to set connection charset is to send the query:

SET NAMES 'utf8mb4'

Right after making the connection.

See also this for Connector/J:

14.14: How can I use 4-byte UTF8, utf8mb4 with Connector/J?

To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set.

Adjust your columns and database as well:

var1 varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL

Again, your MySQL version needs to be relatively up-to-date for utf8mb4 support.

Brahmani answered 7/12, 2012 at 0:15 Comment(8)
Check out my other related post: #13748670. If you can answer it, then you will have answered this question as well. The other post has more details of what I have done. – Impunity
@Impunity have you changed everything in your database to utf8mb4, it looks like you are still using utf8_general_ci.. – Brahmani
I have changed everything in my database. I believe the error comes from Java. I'll post the details tomorrow, I am having trouble connection to my database right now... – Impunity
Don't do "SET NAMES" with Connector/J: dev.mysql.com/doc/connector-j/en/… Do not issue the query set names with Connector/J, as the driver will not detect that the character set has changed, and will continue to use the character set detected during the initial connection setup. – Assembled
In case you want to just get rid of the characters from outside the BMP instead of dealing with the mess of changing your DB, see here: #4036062 – Faubert
I have the same problem, followed the steps above but didn't get resolved till changed the character-set-server=utf8mb4 in C:\ProgramData\MySQL\MySQL Server 5.7\my.ini – Aggrieve
I used utf8-mb4 for long text and I was tiring to store 'beta' character, this solution (SET NAMES 'utf8mb4') was useful for me, Thanks – Melodramatize
A strongly suggest to add this addition from the latest version of Connector/J FAQ: For 5.1.47 and later: You can use characterEncoding=UTF-8 to use utf8mb4, even if character_set_server on the server has been set to something else. This helps when you can't or don't want to change character_set_server in your MySQL server instance. – Betweenwhiles
P
21

Weirdly, I found that REMOVING &characterEncoding=UTF-8 from the JDBC url did the trick for me with similar issues.

Based on my properties,

jdbc_url=jdbc:mysql://localhost:3306/dbName?useUnicode=true

I think this supports what @Esailija has said above, i.e. my MySQL, which is indeed 5.5, is figuring out its own favorite flavor of UTF-8 encoding.

(Note, I'm also specifying the InputStream I'm reading from as UTF-8 in the java code, which probably doesn't hurt)...

Pterodactyl answered 10/9, 2013 at 22:12 Comment(1)
Maybe useUnicode=true isn't even needed? In my case the only thing that worked is setting character_set_server=utf8mb4 globally on the server (RDS parameter group) and NOT having any characterEncoding in the JDBC URL. – Acrocarpous
N
17

All in all, to save symbols that require 4 bytes you need to update characher-set and collation for utf8mb4:

  1. database table/column: alter table <some_table> convert to character set utf8mb4 collate utf8mb4_unicode_ci
  2. database server connection (see)

On my development enviromnt for #2 I prefer to set parameters on command line when starting the server: mysqld --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci


btw, pay attention to Connector/J behavior with SET NAMES 'utf8mb4':

Do not issue the query set names with Connector/J, as the driver will not detect that the character set has changed, and will continue to use the character set detected during the initial connection setup.

And avoid setting characterEncoding parameter in connection url as it will override configured server encoding:

To override the automatically detected encoding on the client side, use the characterEncoding property in the URL used to connect to the server.

Nankeen answered 22/12, 2015 at 18:40 Comment(0)
T
7

How I solved my problem.

I had

?useUnicode=true&amp;characterEncoding=UTF-8

In my hibernate jdbc connection url and I changed the string datatype to longtext in database, which was varchar before.

Thracian answered 24/10, 2013 at 14:30 Comment(1)
Greate if you dont need that column indexed and its relatively small, but i can do this trick for all my columns though – Botanist
S
4

Append the line useUnicode=true&amp;characterEncoding=UTF-8 to your jdbc url.

In your case the data is not being send using UTF-8 encoding.

Spermous answered 30/11, 2012 at 21:54 Comment(5)
How do I append this? In my connection string? I am using Netbeans if that helps. – Impunity
How are you creating the connection? – Spermous
DriverManager.getConnection("jdbc:mysql://localhost:####/[dbName]", [user name], [password]); – Impunity
Do it like this - DriverManager.getConnection("jdbc:mysql://localhost:####/[dbName]?useUnicode=true&amp;characterEncoding=UTF-8", [user name], [password]); – Spermous
Scratch that, I forgot the '?' But now I am back to the same error as the original post... – Impunity
S
4

I faced the same issue and solved it by setting the Collation to utf8_general_ci for each column.

Spickandspan answered 26/5, 2015 at 6:58 Comment(0)
C
2

I guess MySQL doesn't believe this to be valid UTF8 text. I tried an insert on a test table with the same column definition (mysql client connection was also UTF8) and although it did the insert, the data I retrieved with the MySQL CLI client as well as JDBC didn't retrieve the values correctly. To be sure UTF8 did work correctly, I inserted an "ΓΆ" instead of an "o" for obama:

johan@maiden:~$ mysql -vvv test < insert.sql 
--------------
insert into utf8_test values(_utf8 "walmart ΓΆbama πŸ‘½πŸ’”")
--------------

Query OK, 1 row affected, 1 warning (0.12 sec)

johan@maiden:~$ file insert.sql 
insert.sql: UTF-8 Unicode text

Small java application to test with:

package test.sql;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class Test
{

    public static void main(String[] args)
    {
        System.out.println("test string=" + "walmart ΓΆbama πŸ‘½πŸ’”");
        String url = "jdbc:mysql://hostname/test?useUnicode=true&characterEncoding=UTF-8";
        try
        {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            Connection c = DriverManager.getConnection(url, "username", "password");
            PreparedStatement p = c.prepareStatement("select * from utf8_test");
            p.execute();
            ResultSet rs = p.getResultSet();
            while (!rs.isLast())
            {
                rs.next();
                String retrieved = rs.getString(1);
                System.out.println("retrieved=\"" + retrieved + "\"");

            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

}

Output:

johan@appel:~/workspaces/java/javatest/bin$ java test.sql.Test
test string=walmart ΓΆbama πŸ‘½πŸ’”
retrieved="walmart ΓΆbama "

Also, I've tried the same insert with the JDBC connection and it threw the same exception you are getting. I believe this to be a MySQL bug. Maybe there's a bug report about such a situation already..

Cusec answered 6/12, 2012 at 12:34 Comment(1)
By the way, the characters in your string don't even show up correctly in both Firefox and Chrome on OSX. They do show up correctly in my iTerm application. I think this is font dependent. – Cusec
H
1

I had kind of the same problem and after going carefully against all charsets and finding that they were all right, I realized that the bugged property I had in my class was annotated as @Column instead of @JoinColumn (javax.presistence; hibernate) and it was breaking everything up.

Handknit answered 18/2, 2015 at 20:21 Comment(0)
R
1

execute

show VARIABLES like "%char%”;

find character-set-server if is not utf8mb4.

set it in your my.cnf, like

vim /etc/my.cnf

add one line

character_set_server = utf8mb4

at last restart mysql

Ramsdell answered 5/7, 2018 at 5:39 Comment(1)
character_set_server is the option, NOT character-set-server – Priestly
H
0

This setting useOldUTF8Behavior=true worked fine for me. It gave no incorrect string errors but it converted special characters like Γƒ into multiple characters and saved in the database.

To avoid such situations, I removed this property from the JDBC parameter and instead converted the datatype of my column to BLOB. This worked perfect.

Helve answered 28/1, 2019 at 15:27 Comment(1)
Could you please add more deatil to your answer? (code, commants, etc.) – Christiniachristis
L
-4

Besides,data type can use blob install of varchar or text.

Lyssa answered 23/9, 2018 at 16:38 Comment(1)
You do not want that – Tackling

© 2022 - 2024 β€” McMap. All rights reserved.