Python read csv - BOM embedded into the first key

2

47

I'm using Python 2.7.12. The following snippet saves a UTF-8 CSV file and writes the BOM (byte order mark) at the beginning of the file.

import codecs
import csv

outputFile = open("test.csv", "wb")
outputFile.write(codecs.BOM_UTF8)
fieldnames = ["a", "b"]
writer = csv.DictWriter(outputFile, fieldnames, delimiter=";")
writer.writeheader()
row = dict([])
for i in range(10):
    row["a"] = str(i).encode("utf-8")
    row["b"] = str(i*2).encode("utf-8")
    writer.writerow(row)
outputFile.close()

I want to load that csv file:

import codecs
import csv
inputFile = open("test.csv", "rb")
reader = csv.DictReader(inputFile, delimiter=";")
for row in reader:
    print row["a"]
inputFile.close()

The above code fails with KeyError: 'a'. If I print the row keys, this is how they look: [u'\ufeffa', u'b']. The BOM has been embedded into the key a. What am I doing wrong?

Gable answered 28/10, 2016 at 17:12 Comment(0)
69

You have to tell open that this is UTF-8 with BOM. I know that works with io.open:

import io

.
.
.
inputFile = io.open("test.csv", "r", encoding='utf-8-sig')
.
.
.

And you have to open the file in text mode, "r" instead of "rb".
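
To illustrate why the codec name matters, here is a minimal sketch (written for Python 3, which is an assumption; the question itself uses 2.7). It decodes the same header bytes once as plain utf-8 and once as utf-8-sig. The sample bytes and variable names are made up for the illustration.

```python
import codecs

# Header line as it would sit on disk: BOM followed by "a;b".
raw_header = codecs.BOM_UTF8 + b"a;b\r\n"

# Plain utf-8 keeps the BOM as the character U+FEFF at the start of the text.
decoded_plain = raw_header.decode("utf-8")

# utf-8-sig strips a leading BOM while decoding.
decoded_sig = raw_header.decode("utf-8-sig")

# The first field name is what csv.DictReader would use as the first key.
first_key_plain = decoded_plain.split(";")[0]  # still carries U+FEFF
first_key_sig = decoded_sig.split(";")[0]      # just "a"

print(repr(first_key_plain))
print(repr(first_key_sig))
```

With plain utf-8 the first key is not equal to "a", which is where the KeyError in the question comes from.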

Gravettian answered 28/10, 2016 at 17:24 Comment(6)
Actually, I just discovered that your answer works only if there are no special characters (à, è, ì, ...); otherwise we get a UnicodeEncodeError. Do you know if it's possible to improve your answer? - Gable
Oh yes, that is a different issue. In Python 2 the csv module doesn't support Unicode input (see https://docs.python.org/2/library/csv.html#csv-examples). reader = csv.DictReader((l.encode('utf-8') for l in inputFile), delimiter=";") should do the trick for you: the input file is replaced by a generator that does the encoding. - Gravettian
Top!!! Thank you very much!!! :) You made my day with that pythonic line of code :D - Gable
Didn't work in Python 3.6 when reading with a csv.DictReader. - Hau
Thank you for this answer! It worked for me with Python 3.7 with a csv.DictReader. I spent hours googling this issue before finding this answer. Wasn't aware there was a BOM encoding option: utf-8-sig. Thanks! - Minx
An added bonus is that the utf-8-sig encoding also works for files without the BOM, i.e. plain UTF-8 files. - Hyperacidity
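
The last comment can be checked with a small sketch, assuming Python 3. It feeds the same CSV content to csv.DictReader once with and once without a leading BOM, decoding both as utf-8-sig. The helper read_keys and the sample data are hypothetical, introduced only for this illustration.

```python
import codecs
import csv
import io


def read_keys(raw_bytes):
    """Return the header names found by DictReader in utf-8-sig decoded bytes."""
    # Wrap the in-memory bytes in a text stream, as open(..., encoding=...) would.
    text_stream = io.TextIOWrapper(
        io.BytesIO(raw_bytes), encoding="utf-8-sig", newline=""
    )
    reader = csv.DictReader(text_stream, delimiter=";")
    return reader.fieldnames


csv_body = "a;b\r\n0;0\r\n".encode("utf-8")

keys_with_bom = read_keys(codecs.BOM_UTF8 + csv_body)  # file that starts with a BOM
keys_without_bom = read_keys(csv_body)                 # plain UTF-8 file

print(keys_with_bom, keys_without_bom)
```

Both calls should give the same clean header names, so utf-8-sig looks like a safe default when it is unknown whether a file carries a BOM.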
30

In Python 3, the built-in open function is an alias for io.open.

All you need to open a file encoded as UTF-8 with BOM:

open(path, newline='', encoding='utf-8-sig')

Example

import csv

...

with open(path, newline='', encoding='utf-8-sig') as csv_file:
    reader = csv.DictReader(csv_file, dialect='excel')
    for row in reader:
        print(row['first_name'], row['last_name'])
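
For completeness, a rough round-trip sketch in Python 3 (not part of the original answer): it writes a semicolon-delimited file through the utf-8-sig codec, which emits the BOM itself, and reads it back with the same encoding. The temporary path and the sample rows with accented characters are assumptions chosen for the example.

```python
import csv
import os
import tempfile

rows = [{"a": "à", "b": "è"}, {"a": "1", "b": "2"}]

# Hypothetical scratch location for the example file.
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# Writing through utf-8-sig puts the BOM at the start of the file,
# so there is no need to write codecs.BOM_UTF8 by hand.
with open(path, "w", newline="", encoding="utf-8-sig") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["a", "b"], delimiter=";")
    writer.writeheader()
    writer.writerows(rows)

# Check the raw bytes: the file should begin with the three BOM bytes.
with open(path, "rb") as raw_file:
    starts_with_bom = raw_file.read(3) == b"\xef\xbb\xbf"

# Reading with utf-8-sig strips the BOM, so the first key is plain "a".
with open(path, newline="", encoding="utf-8-sig") as csv_file:
    loaded = list(csv.DictReader(csv_file, delimiter=";"))

print(starts_with_bom, loaded)
```
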
Fetch answered 15/12, 2019 at 0:40 Comment(0)
