Python detection of delimiter/separator in a csv file
Asked Answered
M

2

8

I have a function that reads and handles *.csv files in several dataframes.

However, not all of the .csv files have the same separator. Is there a way to detect which type of separator the .csv file has, and then use it in the Pandas' read_csv() function?

df = pd.read_csv(path, sep = 'xxx',header = None, index_col = 0)
Martyrdom answered 2/11, 2021 at 21:20 Comment(0)
C
15

Update

In fact, use engine='python' as parameter of read_csv. It will try to automatically detect the right delimiter.

sepstr, default ‘,’

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

Use csv.Sniffer:

import csv

def find_delimiter(filename):
    sniffer = csv.Sniffer()
    with open(filename) as fp:
        delimiter = sniffer.sniff(fp.read(5000)).delimiter
    return delimiter

Demo:

>>> find_delimiter('data.csv')
','

>>> find_delimiter('data.txt')
' ' 
Cocci answered 2/11, 2021 at 21:28 Comment(4)
Is there a reason for 5000 value inside fp.read()?Martyrdom
Just a maximum buffer size to avoid reading the whole file. I think 5000 bytes is sufficient to determine the separator.Cocci
Wow! It works !!! I saw other posts where they iterate over a possible list of separators, but it is likely to get an error. However, with your answer, it works really well!!Martyrdom
Setting pandas.read_csv('data.csv',sep=None) detects too the separator automatically and it also uses the python engineSpiller
X
4

As Reda El Hail said in comments before, set the param sep = None, like this:

pandas.read_csv('data.csv',sep=None)

And if you use the lib awswrangler to read csv files in s3, you can do similar to this:

awswrangler.s3.read_csv('s3://bucket/prefix', sep = None)
Xavierxaviera answered 25/5, 2022 at 17:58 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.