Python read csv with Hebrew header

I tried to use dataset = pandas.read_csv('filename') to create a DataFrame, but it fails because one of the column headers is written in Hebrew.

I checked, and a DataFrame can have a Hebrew word as a column header (dataset.columns = ['שלום', 'b', 'c', 'd', 'e'] works). What I want, though, is to import the data itself, Hebrew header included, from the CSV file, and that is what fails.

I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 0: invalid start byte

How can I import the dataset into a DataFrame, including the Hebrew column header?

Arianearianie asked 20/11, 2017 at 14:3

I used:

 import pandas as pd
 dataset = pd.read_csv('file_name.csv', encoding="ISO-8859-8")

See https://docs.python.org/3/library/codecs.html#standard-encodings for the full list of encodings Python supports; the Hebrew ones include iso8859_8 and cp1255.
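A minimal, self-contained sketch of this answer, assuming pandas is installed. Instead of a real file_name.csv, it builds a small CSV in memory (the content and column names are made up for illustration), encodes it as ISO-8859-8 to mimic a legacy Hebrew file, and reads it back with the same encoding argument.

```python
import io

import pandas as pd

# Illustrative CSV content; encoding it with ISO-8859-8 mimics a legacy Hebrew file
raw = "שלום,b,c\n1,2,3\n4,5,6\n".encode("ISO-8859-8")

# Tell read_csv how to decode the bytes instead of relying on its UTF-8 default
dataset = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-8")

print(list(dataset.columns))  # ['שלום', 'b', 'c']
```

With a real file, pass its path in place of the BytesIO object.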

Duvetyn answered 26/4, 2020 at 15:52

Your file is not UTF-8 encoded.

Most likely it uses a single-byte Hebrew code page, an ASCII superset such as Windows-1255 (cp1255) or ISO-8859-8.

In the Hebrew code page, byte 0xf9 is the letter ש, the first character of the header in your example (it is displayed rightmost because Hebrew is written right to left).
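The byte-level claim can be checked with the standard library alone. This sketch shows that 0xf9 cannot start a UTF-8 character but decodes to ש in both ISO-8859-8 and Windows-1255, and that שלום encodes to four single bytes starting with 0xf9.

```python
raw = b"\xf9"

# In UTF-8, bytes 0xf5-0xff never start a character, hence "invalid start byte"
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc)  # 'utf-8' codec can't decode byte 0xf9 in position 0: invalid start byte

# In the single-byte Hebrew code pages, 0xf9 is the letter shin
print(raw.decode("iso8859_8"))  # ש
print(raw.decode("cp1255"))     # ש

# Each Hebrew letter of the header is one byte, and the first one is 0xf9
print("שלום".encode("iso8859_8"))  # b'\xf9\xec\xe5\xed'
```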

You'll have to pass the correct code page in the encoding parameter of read_csv.
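If you are not sure which code page the file uses, one option is to try a few candidates in turn. The sketch below assumes pandas is installed; read_hebrew_csv and its candidate list are hypothetical, not part of pandas. A successful decode is not proof that the encoding is right: cp1255 assigns a character to almost every byte, so it rarely fails even on the wrong file, which is why the strict UTF-8 attempt goes first.

```python
import io

import pandas as pd

# Hypothetical candidate order: strict UTF-8 first, then common Hebrew code pages
CANDIDATES = ("utf-8", "cp1255", "iso8859_8")


def read_hebrew_csv(raw: bytes, encodings=CANDIDATES):
    """Hypothetical helper: return (DataFrame, encoding) for the first encoding that decodes."""
    last_error = None
    for encoding in encodings:
        try:
            # A fresh BytesIO per attempt, since a failed read may have consumed the buffer
            return pd.read_csv(io.BytesIO(raw), encoding=encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc
    raise ValueError("no candidate encoding could decode the data") from last_error
```

For a file on disk, something like read_hebrew_csv(pathlib.Path('file_name.csv').read_bytes()) would return the DataFrame together with the encoding that worked.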

Ec answered 20/11, 2017 at 14:36

To check which encoding your file uses, here is a simple trick that might help:

Open the file in Notepad and go to File -> Save As. Next to the Save button there is an Encoding drop-down, and the encoding Notepad detected for the file will be selected there.
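If Notepad is not at hand, a rough programmatic check is possible with the standard library alone. In the sketch below, describe_encoding is a hypothetical helper and a heuristic, not real encoding detection: it looks for a UTF-8 byte order mark and otherwise tests whether the bytes are valid UTF-8. It cannot tell cp1255 from ISO-8859-8.

```python
import codecs
from typing import Optional


def describe_encoding(raw: bytes) -> Optional[str]:
    """Hypothetical helper: a rough guess at which codec name fits these bytes."""
    # Some editors write a UTF-8 byte order mark; Python's "utf-8-sig" codec strips it
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: a legacy code page such as cp1255 or iso8859_8 is a likely candidate
        return None
    # Valid UTF-8 (plain ASCII also lands here, since ASCII is a subset of UTF-8)
    return "utf-8"
```

Usage would look like describe_encoding(pathlib.Path('file_name.csv').read_bytes()).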

Labour answered 19/3, 2019 at 17:37

Here is what worked for me:

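 import pandas as pd

 # Decode with the Hebrew ISO-8859-8 code page; undecodable bytes become U+FFFD instead of raising
 with open('your_file_path', encoding='iso8859-8', errors='replace') as f:
     data = pd.read_csv(f, sep='|')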

The sep value may be different for your file. The main point is to open the file with the iso8859-8 encoding first, and only then pass the file object to pd.read_csv.
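A self-contained sketch of this approach, assuming pandas is installed. The pipe-separated data is made up and held in memory, and io.TextIOWrapper stands in for open() so the example runs without a file. The byte 0xfb has no character assigned in ISO-8859-8, so errors='replace' turns it into U+FFFD instead of raising.

```python
import io

import pandas as pd

# Made-up pipe-separated data: a Hebrew header plus one byte (0xfb) undefined in ISO-8859-8
raw = "שלום|b\n".encode("iso8859-8") + b"x\xfb|2\n"

# Same idea as open(path, encoding=..., errors="replace"): decode first, then hand text to pandas
with io.TextIOWrapper(io.BytesIO(raw), encoding="iso8859-8", errors="replace") as f:
    data = pd.read_csv(f, sep="|")

print(data.loc[0, "שלום"] == "x\ufffd")  # True: the bad byte became U+FFFD
```

In pandas 1.3 and later, read_csv also accepts encoding_errors='replace' alongside encoding=..., which should have the same effect without opening the file yourself.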

Marysa answered 13/4, 2021 at 14:46

© 2022 - 2024 — McMap. All rights reserved.