Read ZipFile from URL into StringIO and parse with pandas.read_csv

I'm trying to read a zip file from a URL and, via StringIO, parse a CSV file inside the archive using pandas.read_csv:

r = req.get("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip").content
file = ZipFile(StringIO(r))
salaries_csv = file.open("Salaries.csv")
salaries = pd.read_csv(salaries_csv)

The last line gave me an error:

CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.

However, if I try using

salaries = pd.read_csv(file.open("Salaries.csv"))

it works.

So I was wondering what I am missing here.

file.open should return a ZipExtFile object, and since read_csv takes only a string or a file handle / StringIO as input, why does the last line work?

Haymaker answered 12/8, 2015 at 14:8 Comment(0)
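
On the ZipExtFile question: the pandas documentation describes the first argument of read_csv as a path or any file-like object with a read() method, and ZipExtFile is such an object. The following is a minimal sketch using only the standard library; the tiny in-memory archive and its contents are made up as a stand-in for the real download. It also shows that the handle is a one-shot stream, so a handle that has already been read yields nothing more. Whether that explains the error above is not confirmed by this thread.

```python
import io
import zipfile

# Build a tiny zip archive in memory (hypothetical stand-in for the download).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Salaries.csv", "yearID,teamID,salary\n1985,BAL,1472819\n")

buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    with archive.open("Salaries.csv") as member:
        # ZipFile.open returns a ZipExtFile, a binary file-like object.
        member_type = type(member).__name__
        is_file_like = isinstance(member, io.BufferedIOBase)
        # It supports the usual read()/readline() calls and yields bytes.
        first_line = member.readline()
        rest = member.read()
        # Once the stream is exhausted, further reads return empty bytes.
        after_end = member.read()

print(member_type, is_file_like, first_line, after_end)
```
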

I think something is wrong with the way you read the data. It works for me using urllib2 (Python 2):

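from zipfile import ZipFile
from StringIO import StringIO  # Python 2 only
import urllib2  # Python 2 only

import pandas as pd

# Download the archive and wrap the raw data in an in-memory file object
r = urllib2.urlopen("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip").read()
file = ZipFile(StringIO(r))
# Open the archive member as a file-like object and let pandas parse it
salaries_csv = file.open("Salaries.csv")
salaries = pd.read_csv(salaries_csv)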
       yearID teamID lgID   playerID    salary
0        1985    BAL   AL  murraed02   1472819
1        1985    BAL   AL   lynnfr01   1090000
2        1985    BAL   AL  ripkeca01    800000
3        1985    BAL   AL   lacyle01    725000
4        1985    BAL   AL  flanami01    641667
5        1985    BAL   AL  boddimi01    625000
6        1985    BAL   AL  stewasa01    581250
7        1985    BAL   AL  martide01    560000
Fluorescent answered 12/8, 2015 at 14:24 Comment(0)

A few changes to @firelynx's answer for Python 3.5:

from zipfile import ZipFile
from io import BytesIO
import urllib.request as urllib2

import pandas as pd

# urlopen(...).read() returns bytes in Python 3, so wrap it in BytesIO
r = urllib2.urlopen("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip").read()
file = ZipFile(BytesIO(r))
salaries_csv = file.open("Salaries.csv")
salaries = pd.read_csv(salaries_csv)
print(salaries)
Azar answered 4/6, 2021 at 7:48 Comment(0)
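
To illustrate why the Python 3 version swaps StringIO for BytesIO, here is a small sketch that runs without network access. The helper make_zip_bytes and the sample CSV contents are hypothetical stand-ins for the downloaded archive; everything else is standard library. In Python 3 the downloaded payload is bytes, io.StringIO only accepts str, and ZipFile needs a binary, seekable file object.

```python
import io
import zipfile


def make_zip_bytes():
    """Hypothetical helper: build a small zip archive and return it as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Salaries.csv", "yearID,salary\n1985,1472819\n")
    return buffer.getvalue()


payload = make_zip_bytes()  # bytes, like the result of urlopen(...).read()

# io.StringIO rejects bytes in Python 3.
try:
    io.StringIO(payload)
    string_io_error = None
except TypeError as exc:
    string_io_error = exc

# io.BytesIO gives ZipFile the binary, seekable file object it expects.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = archive.namelist()
    with archive.open("Salaries.csv") as member:
        # The member yields bytes; wrap it to decode to text if needed.
        text = io.TextIOWrapper(member, encoding="utf-8").read()

print(type(string_io_error).__name__, names, text.splitlines()[0])
```
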