How to display Chinese characters inside a pandas dataframe?

I can read a CSV file in which one column contains Chinese characters (the other columns are English text and numbers). However, the Chinese characters don't display correctly; see the screenshot below.

[screenshot of the DataFrame output with the Chinese characters rendered as garbled text]

I loaded the csv file with pd.read_csv().

Neither display(data06_16) nor data06_16.head() displays the Chinese characters correctly.

I tried adding the following lines to my .bash_profile:

export LC_ALL=zh_CN.UTF-8
export LANG=zh_CN.UTF-8

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

but neither helps.

I have also tried passing an encoding argument to pd.read_csv():

pd.read_csv('data.csv', encoding='utf_8')
pd.read_csv('data.csv', encoding='utf_16')
pd.read_csv('data.csv', encoding='utf_32')

None of these work either.

How can I display the Chinese characters properly?

Bigler answered 3/9, 2016 at 14:34 Comment(2)
Did you try a codec for Chinese text, say encoding='gb2312'? – Euhemerism
Thanks. I tried the encoding you suggested, but it returned an error: UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 2-3: illegal multibyte sequence – Bigler

I just remembered that the source dataset was created with encoding='GBK', so I tried again with

data06_16 = pd.read_csv("../data/stocks1542monthly.csv", encoding="GBK")

Now, I can see all the Chinese characters.
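
If you don't know the source encoding up front, a byte-level detector such as the third-party chardet package can suggest a starting point (it may report GB2312 for files that actually need the broader GBK or GB18030 codec, so treat the guess as a hint, not a guarantee). A minimal sketch, reusing the same file path as above:

import chardet
import pandas as pd

# Guess the encoding from a sample of the raw bytes.
with open("../data/stocks1542monthly.csv", "rb") as f:
    guess = chardet.detect(f.read(100_000))
print(guess["encoding"], guess["confidence"])

data06_16 = pd.read_csv("../data/stocks1542monthly.csv", encoding=guess["encoding"])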

Thanks guys!

Bigler answered 3/9, 2016 at 23:37 Comment(0)

Try this

df = pd.read_csv(path, engine='python', encoding='utf-8-sig')
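
For context: 'utf-8-sig' behaves like 'utf-8' but also strips a leading byte order mark (BOM), which Excel often writes when saving CSV as UTF-8. A quick way to check whether that is your situation (the file name below is just the one from the question):

# The UTF-8 BOM is the three bytes EF BB BF at the very start of the file.
with open("data.csv", "rb") as f:
    print(f.read(3) == b"\xef\xbb\xbf")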
Yiyid answered 20/3, 2019 at 1:19 Comment(0)

I see three possible options here:

1) You can try this, handing the decoded file object straight to pandas:

import codecs
import pandas as pd
with codecs.open("testdata.csv", "r", "utf-8") as f:
    x = pd.read_csv(f)

2) Another possibility is simply this:

import pandas as pd
# read_csv already returns a DataFrame, so there is no need to wrap it in pd.DataFrame.
df = pd.read_csv('testdata.csv', encoding='utf-8')

3) Maybe you should convert your csv file to UTF-8 before importing it with Python (for example in Notepad++)? That can work for a one-time import, though not for an automated process, of course; for a scripted version, see the sketch after this list.
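
If you do need to automate option 3, here is a minimal re-encoding sketch; the file names and the GBK source encoding are assumptions (the accepted answer found the file to be GBK-encoded):

# Read the file with its (assumed) source encoding and write it back out as UTF-8.
with open("testdata.csv", "r", encoding="gbk") as src:
    text = src.read()
with open("testdata_utf8.csv", "w", encoding="utf-8") as dst:
    dst.write(text)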

Narvaez answered 3/9, 2016 at 18:55 Comment(0)

A non-Python answer. I ran into this problem this afternoon and found that importing data from a CSV in Excel shows a long list of encoding names, so you can experiment there and see which one fits. For instance, I found that in Excel both gb2312 and gb18030 convert the data nicely from CSV to XLSX, but only gb18030 works for me in Python.

pd.read_csv(in_path + 'XXX.csv', encoding='gb18030')

Anyway, this is not about how to import a CSV in Python, but rather about finding which encodings are worth trying. [screenshot of Excel's CSV import dialog showing the encoding drop-down]
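
If you would rather stay in Python, you can get a similar try-and-see effect by looping over a list of candidate encodings and keeping the first one that decodes; the file path and the candidate list below are only assumptions:

import pandas as pd

candidates = ["utf-8", "utf-8-sig", "gbk", "gb18030", "big5"]
for enc in candidates:
    try:
        df = pd.read_csv("XXX.csv", encoding=enc)
        print("parsed with", enc)
        break
    except UnicodeDecodeError:
        continue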

Nashner answered 12/8, 2021 at 13:8 Comment(2)
Hi, may I ask how you reach this step in Excel? Thanks. – Zulemazullo
Searching for "import csv to excel with encoding option" should yield fruitful results. – Nashner

You load a dataset and get some strange characters. Example:

'戴森美å�‘é€\xa0型器完整版套装Dyson Airwrap HS01(铜金色礼盒版)'

In my case, I know the strange characters are supposed to be Chinese, so I can work out that whoever sent me the data encoded it as UTF-8, but it was then read back as 'ISO-8859-1', producing the mojibake above.

So as a first step I encode the string back to ISO-8859-1 bytes, then decode those bytes as UTF-8. My lines are:

_encoding = 'ISO-8859-1'
# Turn the mojibake back into its raw bytes, then decode those bytes as UTF-8.
_my_str.encode(_encoding, 'ignore').decode("utf-8", 'ignore')

Then my output is :

"'森Dyson Airwrap HS01礼'"

This works for me, though I don't claim to fully understand what happens under the hood; note that some characters are lost because the 'ignore' error handler drops mojibake characters that have no ISO-8859-1 equivalent, which corrupts the surrounding UTF-8 byte sequences. Feel free to add details if you know more.

Bonus: I'll try to detect when a string is in this garbled format, because some of my entries are in Chinese while others are in English.

EDIT: The bonus is unnecessary. I just apply a lambda to my column to encode and decode without caring about the format, so I fix the encoding after loading the DataFrame:

_encoding = 'ISO-8859-1'
_decoding = "utf-8"
df[col] = df[col].apply(lambda x: x.encode(_encoding, 'ignore').decode(_decoding, 'ignore'))
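
As an alternative to the manual round-trip, the third-party ftfy package is designed to repair exactly this kind of mojibake (though characters that were already replaced by '�' are gone for good). A minimal sketch on the garbled string from above:

import ftfy  # third-party package: pip install ftfy

garbled = '戴森美å�‘é€\xa0型器完整版套装Dyson Airwrap HS01(铜金色礼盒版)'
print(ftfy.fix_text(garbled))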
Eustasius answered 22/1, 2022 at 17:4 Comment(0)
