What is the best way to decompress a gzip'ed server response in Python 3?

3

8

I had expected this to work:

>>> import urllib.request as r
>>> import zlib
>>> r.urlopen( r.Request("http://google.com/search?q=foo", headers={"User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", "Accept-Encoding": "gzip"}) ).read()
b'af0\r\n\x1f\x8b\x08...(long binary string)'
>>> zlib.decompress(_)
Traceback (most recent call last):
  File "<pyshell#87>", line 1, in <module>
    zlib.decompress(x)
zlib.error: Error -3 while decompressing data: incorrect header check

But it doesn't. Dive Into Python uses StringIO in this example, but that seems to be missing from Python 3. What's the right way of doing it?

Vorfeld answered 6/4, 2009 at 4:24 Comment(0)
19

It works fine with the gzip module. gzip and zlib use the same compression (DEFLATE) but wrap it in different headers, and that is exactly what the "incorrect header check" in your error message is telling you.

import gzip
import urllib.request

request = urllib.request.Request(
    "http://google.com/search?q=foo",
    headers={
        "Accept-Encoding": "gzip",
        "User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", 
    })
response = urllib.request.urlopen(request)
gzipFile = gzip.GzipFile(fileobj=response)
gzipFile.read()
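
To illustrate the point about headers, here is a small offline sketch that needs no network access. The payload and variable names are made up for the example. It shows the default zlib.decompress rejecting gzip-wrapped data, and the same call accepting it once wbits is set to 16 + zlib.MAX_WBITS, which asks zlib to expect a gzip header and trailer.

```python
import gzip
import zlib

# Hypothetical payload, standing in for a real response body.
payload = b"hello, gzip " * 10

gz_bytes = gzip.compress(payload)  # DEFLATE stream in a gzip wrapper
zl_bytes = zlib.compress(payload)  # DEFLATE stream in a zlib wrapper

# gzip data starts with the magic bytes 1f 8b, as in the question's output.
assert gz_bytes[:2] == b"\x1f\x8b"

# With default settings zlib expects a zlib header, so gzip data is rejected.
header_error = None
try:
    zlib.decompress(gz_bytes)
except zlib.error as exc:
    header_error = str(exc)

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer instead.
from_gzip = zlib.decompress(gz_bytes, 16 + zlib.MAX_WBITS)
from_zlib = zlib.decompress(zl_bytes)
```

Using the gzip module as above is usually the more readable choice; the wbits form is mainly useful when you are already working with zlib directly.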
Glycosuria answered 6/4, 2009 at 4:24 Comment(0)
7

For anyone using Python 3.2 or later, there is an even simpler way to decompress a response than any of the answers here:

import gzip
import urllib.request

request = urllib.request.Request(
    "http://example.com/",
    headers={"Accept-Encoding": "gzip"})
response = urllib.request.urlopen(request)
result = gzip.decompress(response.read())
Billetdoux answered 6/4, 2009 at 4:24 Comment(1)
You should check whether the response is actually gzip-encoded, e.g. with if response.getheader("Content-Encoding") == "gzip". - Pecan
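
A minimal sketch of that check, written as a helper so it can be tried without a network connection. The name decode_body is hypothetical, and the byte strings below are stand-ins for response.read() and response.getheader("Content-Encoding").

```python
import gzip
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress body only if the Content-Encoding header says gzip."""
    if content_encoding is not None and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    # No header, or some other encoding: hand the bytes back untouched.
    return body


# Offline stand-ins for a compressed and an uncompressed response.
plain = b"<html>hi</html>"
decoded_gzip = decode_body(gzip.compress(plain), "gzip")
decoded_plain = decode_body(plain, None)
```

With a real response this would be called as decode_body(response.read(), response.getheader("Content-Encoding")). Servers are free to ignore Accept-Encoding, so the uncompressed branch does get exercised in practice.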
5

In Python 3, StringIO is a class in the io module, alongside BytesIO. Compressed data is bytes rather than text, so BytesIO is the one to use here.

So for the example you linked to, if you change:

import StringIO
compressedstream = StringIO.StringIO(compresseddata)

to:

import io
compressedstream = io.BytesIO(compresseddata)

it ought to work.

Numbers answered 6/4, 2009 at 4:50 Comment(1)
Note that under Python 3 the fileobj argument of gzip.GzipFile needs a BytesIO rather than a StringIO. - Preconceive
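
A short sketch of the BytesIO approach from the comment above. Here compresseddata is generated locally as a stand-in for a downloaded body, and the variable names are only illustrative.

```python
import gzip
import io

# Stand-in for bytes read from a gzip-encoded response.
compresseddata = gzip.compress(b"example body")

# Wrap the bytes in a file-like object and let GzipFile read from it.
compressedstream = io.BytesIO(compresseddata)
with gzip.GzipFile(fileobj=compressedstream) as gzipper:
    data = gzipper.read()

# io.StringIO only accepts str, so passing bytes raises TypeError.
stringio_rejects_bytes = False
try:
    io.StringIO(compresseddata)
except TypeError:
    stringio_rejects_bytes = True
```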
