How to read CSV files with the mbcs codec in Python on Linux?

I'm trying to read CSV files that use the Western European (Windows) encoding:

df = pd.read_csv(FileName, encoding='mbcs', usecols=[1], header=4)

This code works on Windows but not on Linux 18.04 (error: unknown encoding: mbcs). Indeed, the Python codecs documentation states:

mbcs is for Windows only: Encode the operand according to the ANSI codepage (CP_ACP).

Is there another way (or another codec name) to decode my files in Python on Linux? I have thousands of files, so I can't re-save each one from Excel.

Fermentative answered 28/4, 2020 at 13:48 Comment(0)

If your Windows system uses a Western European locale, the mbcs encoding (the ANSI code page) is cp1252. So you should use:

df = pd.read_csv(FileName, encoding='cp1252', usecols=[1], header=4)

on both systems, so that the same code runs on Windows and Linux.
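As a rough illustration of why 'cp1252' is portable but stricter than 'mbcs', here is a minimal sketch using only the standard library. The sample byte strings and the helper name `undefined_bytes` are made up for this example. It lists the byte values that Python's cp1252 codec refuses to decode, and shows one possible way to tolerate them.

```python
# Sketch only: the sample bytes and the helper name are invented for illustration.
ENCODING = "cp1252"


def undefined_bytes(encoding=ENCODING):
    """Return the byte values that the given single-byte codec cannot decode."""
    bad = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(value)
    return bad


# Ordinary Western European text decodes the same way on every platform.
sample = b"Temp\xe9rature;Ampl"
decoded = sample.decode(ENCODING)

# A stray byte such as 0x8d has no cp1252 mapping. With errors="replace" it
# becomes U+FFFD instead of raising UnicodeDecodeError.
tolerant = b"1.5;\x8d2.0".decode(ENCODING, errors="replace")

# Show which byte values are unmapped in this codec.
print([hex(value) for value in undefined_bytes()])
```

With pandas, the same tolerance can probably be obtained by opening the file yourself, for example `open(FileName, encoding='cp1252', errors='replace')`, and passing the handle to `pd.read_csv`. Recent pandas versions (1.3 and later) also accept an `encoding_errors='replace'` argument. Whether silently replacing bytes is acceptable depends on your data.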

Mathew answered 28/4, 2020 at 14:2 Comment(3)
Thank you for your answer. My files are ANSI-encoded, but with 'cp1252' I get UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 164956: character maps to <undefined>. However, it works with 'cp1252' on Windows. - Fermentative
How is the b'\x8d' byte displayed? Could it be an 'ì' (LATIN SMALL LETTER I WITH GRAVE)? - Mathew
My files contain 4 lines of ordinary text. Below those lines (header=4) I have 2 columns, Time and Ampl, and then only numerical values, with no 'ì'. The issue was caused by 2-3 files that look the same as the others (maybe NaN values). Since it's only data, I can delete them, and then it works well. Thanks! - Fermentative
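With thousands of files, the few offending ones can be located before loading anything into pandas. The following is a minimal sketch under assumptions: the function name `find_undecodable_files` and the `data/*.csv` pattern in the usage note are hypothetical. It reports each file whose bytes do not decode as cp1252, together with the offset and value of the first bad byte.

```python
from pathlib import Path


def find_undecodable_files(paths, encoding="cp1252"):
    """Return (path, offset, byte) for each file that fails to decode.

    Only the first undecodable byte of each file is reported.
    """
    problems = []
    for path in paths:
        data = Path(path).read_bytes()
        try:
            data.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte the codec rejected.
            problems.append((str(path), exc.start, data[exc.start]))
    return problems
```

It could be called as `find_undecodable_files(Path("data").glob("*.csv"))`, where the directory name is a placeholder. Inspecting the reported offsets may show whether the stray bytes sit in the text header or in the numeric rows.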
