Download Returned Zip file from URL

If I have a URL that, when submitted in a web browser, pops up a dialog box to save a zip file, how would I go about catching and downloading this zip file in Python?

Genu answered 23/2, 2012 at 18:42 Comment(2)
I tried the section "Downloading a binary file and writing it to disk" of this page, which worked like a charm.Exchangeable
For anyone else looking for a solution that uses only the standard library, check out this answer - https://mcmap.net/q/150855/-download-returned-zip-file-from-url - and save yourself a few minutes of reading and scrolling through the rest of the answers, which use requests (which, by the way, is an amazing library that I would use if it were possible to do so).Observation

Most people recommend using requests if it is available, and the requests documentation recommends this for downloading and saving raw data from a url:

import requests 

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)

Since the question asks about downloading and saving the zip file, I haven't gone into detail about reading the zip file. See one of the many answers below for possibilities.
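
If you do then need to read the archive, here is a minimal sketch using the standard-library zipfile module on the save_path written by the function above (the extract_zip name and extract_dir parameter are just illustrative):

import zipfile

def extract_zip(save_path, extract_dir="."):
    # open the archive that download_url() wrote to disk and unpack everything
    with zipfile.ZipFile(save_path) as zf:
        zf.extractall(extract_dir)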

If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.

import urllib.request

def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())

Finally, if you are still using Python 2, you can use urllib2.urlopen.

import urllib2
from contextlib import closing

def download_url(url, save_path):
    with closing(urllib2.urlopen(url)) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
Macruran answered 23/2, 2012 at 18:45 Comment(0)

As far as I can tell, the proper way to do this in Python 2 is:

import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()

Of course, you'd want to check that the GET was successful with r.ok.
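
One way to write that check, for example (r.raise_for_status() is the requests helper that raises an HTTPError for a 4xx/5xx response, covering the same cases as checking r.ok):

r = requests.get(zip_file_url, stream=True)
r.raise_for_status()  # stop here rather than handing an error page to zipfile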

For Python 3+, substitute the io module for the StringIO module and use BytesIO instead of StringIO. Here are release notes that mention this change.

import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")
Haze answered 10/1, 2013 at 14:50 Comment(13)
Thanks for this answer. I used it to solve my issue getting a zip file with requests.Garniture
yoavram, in your code - where do I enter the url of the webpage?Corell
If you'd like to save the downloaded file in a different location, replace z.extractall() with z.extractall("/path/to/destination_directory")Hammerless
@Corell I hope you figured it out by now, but the url of the zip you want to download is zip_file_url.Haze
@Haze I was desperately looking for this answer. Can you tell me how to save the content as a ".zip" file? If I do extractall() it extracts the content. I don't want that.Obtund
If you just want to save the file from the url you can do: urllib.request.urlretrieve(url, filename).Haze
To help others connect the dots it took me 60 minutes too long to, you can then use pd.read_table(z.open('filename')) with the above. Useful if you have a zip url link that contains multiple files and you're only interested in loading one.Lindeman
how to print the status of extracting?Yorgen
@Haze How can I test these 3 lines if I put them in a function using Mock?Pawnshop
not the right pattern according to 2.python-requests.org/en/master/user/quickstart/…Wivern
what if the .zip file is over 10GB, won't the get() mess up the memory?Leanneleanor
When I do z = zipfile.ZipFile(io.BytesIO(r.content)), I get zipfile.BadZipFile: File is not a zip file.Apply
I also get zipfile.BadZipFile: File is not a zip file with status code 400Cowskin

With the help of this blog post, I've got it working with just requests. The point of the weird stream thing is that we don't need to call content on large requests, which would require the whole response to be processed at once, clogging the memory. The stream avoids this by iterating through the data one chunk at a time.

import requests

url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'

response = requests.get(url, stream=True)
with open('alaska.zip', "wb") as f:
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
Whitsuntide answered 11/7, 2018 at 19:27 Comment(3)
Answers should not rely on links for the bulk of their content. Links can go dead, or the content on the other side can be changed to no longer answer the question. Please edit your answer to include a summary or explanation of the information you link points to.Nystagmus
What is chunk_size here? And can this parameter affect the speed of downloading?Hasp
@ayushthakur Here are some links that may help: requests.Response.iter_content and wikipedia:Chunk Transfer Encoding. Someone else could probably give a better answer, but I wouldn't expect chunk_size to make much of a difference for download speed if it's set large enough (reducing the #pings/content ratio). 512 bytes seems super small in retrospect.Whitsuntide

Here's what I got to work in Python 3:

import zipfile, urllib.request, shutil

url = 'http://www....myzipfile.zip'
file_name = 'myzip.zip'

with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
    with zipfile.ZipFile(file_name) as zf:
        zf.extractall()
Suspire answered 29/7, 2015 at 17:10 Comment(3)
Hello. How can I avoid this error: urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.?Slavocracy
@VictorHerasmePerez, an HTTP 302 response status code means that the page has been moved. I think the issue you're facing is addressed here: #32570434Suspire
@Suspire What if the zipped folder contains several files? Then all those files will get extracted and stored in the system. I want to extract and get just one file from the zipped folder. Any way to achieve this?Seashore

Super lightweight solution to save a .zip file to a location on disk (using Python 3.9):

import requests

url = r'https://linktofile'
output = r'C:\pathtofolder\downloaded_file.zip'

r = requests.get(url)
with open(output, 'wb') as f:
    f.write(r.content)
Testaceous answered 21/6, 2021 at 18:17 Comment(4)
#68524710Farriery
@AtomStore yes? Is there an issue with my answer?Testaceous
How do I bypass the alert? It downloads the html file rather than the zip.Farriery
My answer works for the link I tested with. Try using my code, but replacing the url with: api.os.uk/downloads/v1/products/CodePointOpen/… (open data from Ordnance Survey)Testaceous

I came here searching for how to save a .bzip2 file. Let me paste the code for others who might come looking for this.

import requests

url = "http://api.mywebsite.com"
filename = "swateek.tar.gz"

# headers, 'myusername' and 'mypassword' are placeholders for your own values
response = requests.get(url, headers=headers, auth=('myusername', 'mypassword'), timeout=50)
if response.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(response.content)

I just wanted to save the file as is.

Galligan answered 18/7, 2019 at 13:56 Comment(0)

Either use urllib2.urlopen, or you could try using the excellent Requests module and avoid urllib2 headaches:

import requests
results = requests.get('url')
#pass results.content onto secondary processing...
Benenson answered 23/2, 2012 at 18:59 Comment(2)
But how do you parse results.content into a zip?Northerner
Use the zipfile module: zip = zipfile.ZipFile(results.content). Then just parse through the files using ZipFile.namelist(), ZipFile.open(), or ZipFile.extractall()Benenson
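
For example, a minimal sketch of what that comment describes (note that zipfile.ZipFile expects a filename or a file-like object, so the raw bytes are wrapped in io.BytesIO here; 'url' is still just a placeholder):

import io
import zipfile

import requests

results = requests.get('url')
archive = zipfile.ZipFile(io.BytesIO(results.content))
print(archive.namelist())                    # list the members of the archive
with archive.open(archive.namelist()[0]) as member:
    data = member.read()                     # read one member without extracting to disk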

Thanks to @yoavram for the above solution. My url path linked to a zipped folder and I encountered a BadZipFile error (file is not a zip file); strangely, if I tried several times it would retrieve the url and unzip it all of a sudden, so I amended the solution a little bit, using the is_zipfile method as per here

import io
import zipfile

import requests

r = requests.get(url, stream=True)
check = zipfile.is_zipfile(io.BytesIO(r.content))
while not check:
    r = requests.get(url, stream=True)
    check = zipfile.is_zipfile(io.BytesIO(r.content))
else:
    # the else branch of a while loop runs once the condition becomes false,
    # i.e. once a valid zip file has been retrieved
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall()
Lunalunacy answered 13/2, 2019 at 22:52 Comment(0)

Use the requests, zipfile and io Python packages.

Specifically, the BytesIO function is used to keep the downloaded archive in memory rather than saving it to the drive.

import requests
from zipfile import ZipFile
from io import BytesIO

r = requests.get(zip_file_url)
z = ZipFile(BytesIO(r.content))                      # keep the archive in memory
file = z.extract(a_file_to_extract, path_to_save)    # extract a single member to disk
with open(file) as f:
    print(f.read())
Beria answered 24/3, 2021 at 12:3 Comment(1)
Thank you! Can't believe I had to scroll all the way to the last answer to find one that used requests and didn't write to a file.Barograph
