How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

I’m playing around with the Stack Overflow API using Python. I’m trying to decode the gzipped responses that the API gives.

import urllib, gzip

url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
gzip.GzipFile(fileobj=url).read()

According to the urllib2 documentation, urlopen “returns a file-like object”.

However, when I run read() on the GzipFile object I’ve created using it, I get this error:

AttributeError: addinfourl instance has no attribute 'tell'

As far as I can tell, this is coming from the object returned by urlopen.

It doesn’t appear to have seek either, as I get an error when I do this:

url.read()
url.seek(0)

What exactly is this object, and how do I create a functioning GzipFile instance from it?

Mistrust answered 17/11, 2010 at 13:5 Comment(2)
Content-Encoding: gzip should be handled by the http library, but unfortunately it isn't. This is issue 9500 in Python's bug database, for the interested.Defloration
@Magnus: cheers, good to know it’s at least in the bug tracker.Mistrust

The urlopen docs list the methods supported by the object it returns. I recommend wrapping that object in another class that provides the methods gzip expects (notably seek and tell).
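For illustration, a minimal sketch of such a wrapper might look like this (SeekableResponse is a made-up name; since gzip seeks around in the underlying file object, the wrapper effectively has to buffer the whole body anyway):

import gzip
import urllib
from StringIO import StringIO

class SeekableResponse(object):
    """Buffers the body of an unseekable HTTP response so that the
    seek()/tell() calls gzip makes will work."""

    def __init__(self, response):
        # Simplest correct approach: read everything into memory up front
        self._buf = StringIO(response.read())

    def read(self, size=-1):
        return self._buf.read(size)

    def seek(self, offset, whence=0):
        return self._buf.seek(offset, whence)

    def tell(self):
        return self._buf.tell()

url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
g = gzip.GzipFile(fileobj=SeekableResponse(url))
print g.read()[:100]  # first bytes of the decompressed JSON

In practice this just moves the buffering into the wrapper, so it ends up roughly equivalent to the second option.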

Another option: call the read method of the response object and put the result in a StringIO object (which supports all of the methods gzip expects). This may be a little more expensive, though.

E.g.

import gzip
import json
import StringIO
import urllib

url = urllib.urlopen('http://api.stackoverflow.com/1.0/badges/name')
# Read the whole gzipped response into a seekable in-memory buffer
url_f = StringIO.StringIO(url.read())
g = gzip.GzipFile(fileobj=url_f)
j = json.load(g)
Swoon answered 17/11, 2010 at 13:14 Comment(4)
Wrapping it in a StringIO object gets past that error, but I still get an IOError: Not a gzipped fileIceboat
@ThomasK It works fine for me. Are you passing url.read() to the StringIO constructor or just url? The latter fails.Tressatressia
Excellent, cheers. Unutbu’s answer was great too, but I’ll go with this one as I’m guessing the StringIO solution is more backwards compatible.Mistrust
Is there a way to do this without reading the entire urlopen response in one go? I'm looking to use something like this in a situation where the payload of the urlopen is very large (GBs), so I would like to be able to use this to stream-parse as data comes in, rather than blocking on the whole http request.Rental
import urllib2
import json
import gzip
import io

url = 'http://api.stackoverflow.com/1.0/badges/name'
page = urllib2.urlopen(url)
# Buffer the raw response in a seekable io.BytesIO before handing it to GzipFile
gzip_filehandle = gzip.GzipFile(fileobj=io.BytesIO(page.read()))
json_data = json.loads(gzip_filehandle.read())
print(json_data)

io.BytesIO is for Python 2.6+. For older versions of Python, you could use cStringIO.StringIO (sketched below).
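For example, a rough sketch of the same approach with cStringIO on an older Python 2 (json isn't in the standard library before 2.6, so the decompressed text is just printed here):

import gzip
import urllib2
import cStringIO

page = urllib2.urlopen('http://api.stackoverflow.com/1.0/badges/name')
# cStringIO.StringIO gives a seekable in-memory file over the raw gzipped bytes
buf = cStringIO.StringIO(page.read())
gzip_filehandle = gzip.GzipFile(fileobj=buf)
print gzip_filehandle.read()[:200]  # decompressed JSON text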

Chauvin answered 17/11, 2010 at 13:19 Comment(0)

Here is an update to @stefanw's answer, for anyone who thinks it too expensive to hold the whole response in memory.

Thanks to this article (https://www.enricozini.org/blog/2011/cazzeggio/python-gzip/, which explains why gzip doesn't work on the response directly in Python 2), the solution is to use Python 3.

import urllib.request
import gzip

response = urllib.request.urlopen('http://api.stackoverflow.com/1.0/badges/name')
# In Python 3, GzipFile can read from the response stream directly (no seek needed)
with gzip.GzipFile(fileobj=response) as f:
    for line in f:
        print(line)
Tessie answered 5/9, 2019 at 9:6 Comment(0)
