Loading utf-8 encoded text into MySQL table
Asked Answered
G

5

31

I have a large CSV file that I am going to load it into a MySQL table. However, these data are encoded into utf-8 format, because they include some non-english characters. I have already set the character set of the corresponding column in the table to utf-8. But when I load my file. the non-english characters turn into weird characters(when I do a select on my table rows). Do I need to encode my data before I load the into the table? if yes how Can I do this. I am using Python to load the data and using LOAD DATA LOCAL INFILE command. thanks

Gluten answered 10/2, 2011 at 13:51 Comment(1)
Here was my problem. For certain characters such as certain emojis, you need to use CHARACTER SET utf8mb4 as described here: https://mcmap.net/q/65066/-quot-incorrect-string-value-quot-when-trying-to-insert-utf-8-into-mysql-via-jdbcStagger
C
20

as said in http://dev.mysql.com/doc/refman/5.1/en/load-data.html, you can specify the charset used by your CSV file with the "CHARACTER SET" optional parameter of LOAD DATA LOCAL INFILE

Chain answered 10/2, 2011 at 13:55 Comment(1)
It does not say that mysql wrongly uses another charset, regardless what column charset you've set !Pentatomic
U
101

Try

LOAD DATA INFILE 'file'
IGNORE INTO TABLE table
CHARACTER SET UTF8
FIELDS TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
Urbanist answered 18/1, 2013 at 10:56 Comment(6)
adding "CHARACTER SET UTF8" was the key!Oblation
Oh my, took me so long. Tried everything, it just kept converting utf8 to latin and importing it into a utf 8 table. The encoding option worked wonders.Pentatomic
Basically it's an error of mysql up until the latest version, including MariaDb. If a table or column is UTF8 it needs to automatically take the correct values. Well it does not, you need to specify it and hope you've no mixed table.Pentatomic
CHARACTER SET UTF8 Works like magic. tried many things but this was the solutionSlushy
Please tell us where in the documentation it says what the options are for character set? I mean, should we write UTF8 or UTF-8? Quoted or not? Case sensitive?Butterfingers
it works like a charm!! Make this as the best answerDelanty
C
20

as said in http://dev.mysql.com/doc/refman/5.1/en/load-data.html, you can specify the charset used by your CSV file with the "CHARACTER SET" optional parameter of LOAD DATA LOCAL INFILE

Chain answered 10/2, 2011 at 13:55 Comment(1)
It does not say that mysql wrongly uses another charset, regardless what column charset you've set !Pentatomic
T
3

Do not need encode your characters in the file, but you need to make sure that your file is encoding at UTF-8 before load this file to database.

Tantalum answered 8/6, 2014 at 4:7 Comment(0)
P
2

You should send

init_command = 'SET NAMES UTF8'
use_unicode = True
charset = 'utf8'

when doing MySQLdb.connect() e.g.

dbconfig = {}
dbconfig['host']            = 'localhost'
dbconfig['user']            = ''
dbconfig['passwd']          = ''
dbconfig['db']              = ''
dbconfig['init_command']    = 'SET NAMES UTF8'
dbconfig['use_unicode']     = True
dbconfig['charset']         = 'utf8'

conn = MySQLdb.connect(**dbconfig)

edit: ah, sorry, I see you've added that you're using "LOAD DATA LOCAL INFILE" -- this wasn't clear from your initial question :)

Pyrimidine answered 10/2, 2011 at 13:58 Comment(3)
This is programming language specific.Cluff
@Cluff -- yes, it's python as specified in the question... did you downvote for that?!Pyrimidine
...and yes. I tried to undo it but the site sadly prevents me from changing the vote before the answer is edited. So if you edit it in some fashion at some point let me know and then I'll undo it.Cluff
Q
1

Try something like,

LOAD DATA LOCAL INFILE "file" INTO TABLE message_history CHARACTER SET UTF8 COLUMNS TERMINATED BY '|' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"';

Original Structure,

https://dev.mysql.com/doc/refman/8.0/en/load-data.html

Quadric answered 10/7, 2019 at 10:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.