How can I fix "Error tokenizing data" in the pandas CSV reader?
J

4

16

I'm trying to read a csv file with pandas.

This file actually has only one row but it causes an error whenever I try to read it.

Something seems to go wrong on line 8, but I can't find an 8th line, since the file clearly contains only one row.

Here is what I do:

import codecs
import pandas as pd

with codecs.open("path_to_file", "rU", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file, header=None, sep="\t")
df

Then I get:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 3

I don't understand what's really going on, so any advice would be appreciated.

Jamieson answered 12/11, 2018 at 4:45 Comment(0)
A
22

I struggled with this for almost half a day. I opened the CSV in Notepad, noticed that the separator was a TAB rather than a comma, and then tried the combination below.

df = pd.read_csv('C:\\myfile.csv',sep='\t', lineterminator='\r')
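If you'd rather not open the file in an editor, the separator can probably be guessed in code. This is only a minimal sketch: the sample text is made up for illustration, and it assumes the standard library's csv.Sniffer can tell the candidates (comma, tab, semicolon) apart on your data, which is not guaranteed for messy files.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for the first few KB of a real file.
sample = "id\tname\tcity\n1\tAlice\tTokyo\n2\tBob\tOsaka\n"

# Ask the sniffer to choose among a few likely separators.
dialect = csv.Sniffer().sniff(sample, delimiters=",\t;")

# Pass the detected separator on to pandas.
df = pd.read_csv(io.StringIO(sample), sep=dialect.delimiter)
print(repr(dialect.delimiter))
print(df)
```

For a real file you would read a chunk of it (for example the first few thousand characters) into `sample` and then call pd.read_csv on the path with the detected `sep`.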
Alienor answered 16/6, 2020 at 13:54 Comment(0)
B
6

Try df = pd.read_csv(file, header=None, error_bad_lines=False)
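Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; its replacement is on_bad_lines. A minimal sketch of the same idea, assuming pandas 1.3 or newer and a made-up sample in which the last line has an extra field:

```python
import io

import pandas as pd

# Hypothetical sample: the first line sets the expected width (2 fields),
# and the third line has 3 fields, which would normally raise ParserError.
sample = "a\tb\nc\td\ne\tf\tg\n"

# on_bad_lines="skip" silently drops lines with too many fields;
# on_bad_lines="warn" drops them too but reports each one.
df = pd.read_csv(io.StringIO(sample), header=None, sep="\t", on_bad_lines="skip")
print(df)
```

Keep in mind that this discards the offending rows rather than repairing them, so it only helps when losing those rows is acceptable.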

Bust answered 12/11, 2018 at 4:50 Comment(3)
Thanks so much for your comment, Po Xin. I tried that and got another error: ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file. - Jamieson
Try this: #33999240 - Bust
How can I also stop the errors from being shown in the terminal? - Division
U
4

The existing answer will not include these additional lines in your dataframe. If you'd like your dataframe to be as wide as its widest point, you can use the following:

delimiter = ','
# Fields on the widest line = delimiters + 1 (naive count: ignores delimiters inside quotes).
with open(path_name, 'r') as f:
    max_columns = max(line.count(delimiter) for line in f) + 1
df = pd.read_csv(path_name, header=None, skiprows=1, names=list(range(max_columns)))

Set skiprows = 1 only if the file actually has a header row; you can always retrieve the header column names later. You can also identify rows that have more columns populated than there are column names in the original header.
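To see the effect without touching a file on disk, here is a small self-contained sketch of the same approach. The helper name read_ragged_csv and the sample text are hypothetical, and the delimiter count is naive (it does not handle delimiters inside quoted fields).

```python
import io

import pandas as pd


def read_ragged_csv(text, delimiter=","):
    """Read CSV text whose rows differ in length, padding short rows with NaN."""
    # Width of the widest row = number of delimiters on it + 1.
    n_columns = max(line.count(delimiter) for line in text.splitlines()) + 1
    # Supplying enough column names up front lets pandas accept every row.
    return pd.read_csv(
        io.StringIO(text),
        header=None,
        sep=delimiter,
        names=list(range(n_columns)),
    )


# Hypothetical ragged data: rows with 1, 3 and 2 fields and no header row.
sample = "a\nb,c,d\ne,f\n"
df = read_ragged_csv(sample)
print(df)
```

Short rows are padded with NaN on the right, so the frame ends up as wide as the widest row and no line is dropped.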

Unnatural answered 5/4, 2019 at 18:30 Comment(0)
P
0

A quick and dirty solution that may help some people: copy and paste the values of your data into a new Excel file and save it as CSV. That can sometimes remove invisible funky characters from the file.

Phlebosclerosis answered 15/2 at 19:16 Comment(0)
