If I have a URL that, when submitted in a web browser, pops up a dialog box to save a zip file, how would I go about catching and downloading this zip file in Python?
Most people recommend using requests if it is available, and the requests documentation recommends it for downloading and saving raw data from a URL:
import requests

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
Since the question asks about downloading and saving the zip file, I haven't gone into details about reading the zip file. See one of the many answers below for possibilities.
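For example, a call might look like this (a minimal usage sketch; the URL and file name are placeholders, not from the original answer):

# Hypothetical usage of the download_url helper defined above;
# the URL and file name are placeholders.
download_url('https://example.com/archive.zip', 'archive.zip', chunk_size=1024)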
If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.
import urllib.request

def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
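If the download may be large, one possible variant (my sketch, not part of the original answer) streams the response to disk with shutil.copyfileobj instead of reading it all into memory:

import shutil
import urllib.request

def download_url_chunked(url, save_path):
    # Copy the response to disk in chunks rather than
    # holding the entire file in memory at once.
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            shutil.copyfileobj(dl_file, out_file)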
Finally, if you are still using Python 2, you can use urllib2.urlopen.
import urllib2
from contextlib import closing

def download_url(url, save_path):
    with closing(urllib2.urlopen(url)) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
As far as I can tell, the proper way to do this in Python 2 is:
import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()
Of course, you'd want to check that the GET was successful with r.ok.
For Python 3+, substitute the io module for the StringIO module and use BytesIO instead of StringIO; the release notes mention this change.
import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")
Would be useful to replace z.extractall() with z.extractall("/path/to/destination_directory"). – Hammerless
extractall() extracts the content. I don't want that. – Obtund
Then use urllib.request.urlretrieve(url, filename). – Haze
Combine pd.read_table(z.open('filename')) with the above. Useful if you have a zip URL that contains multiple files and you're only interested in loading one. – Lindeman
When I run z = zipfile.ZipFile(io.BytesIO(r.content)), I get zipfile.BadZipFile: File is not a zip file. – Apply
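Lindeman's pandas tip can be fleshed out into a runnable sketch (assuming pandas is installed; zip_file_url and 'filename' are placeholders for the archive URL and one of its members):

import io
import zipfile
import requests
import pandas as pd

r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
# Load a single member of the archive straight into a DataFrame,
# without extracting anything to disk.
df = pd.read_table(z.open('filename'))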
With the help of this blog post, I've got it working with just requests. The point of the stream argument is that we don't need to call content on large requests, which would require the whole response to be processed at once, clogging memory. With stream=True, we instead iterate through the data one chunk at a time.
import requests

url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'
response = requests.get(url, stream=True)
with open('alaska.zip', 'wb') as f:
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
What does chunk_size do here? And can this parameter affect the speed of downloading? – Hasp
See requests.Response.iter_content and the Wikipedia article on chunked transfer encoding. Someone else could probably give a better answer, but I wouldn't expect chunk_size to make much of a difference for download speed if it's set large enough (reducing the #pings/content ratio). 512 bytes seems super small in retrospect. – Whitsuntide
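Following up on that thread, a larger chunk size is easy to try (my sketch; 1 MiB chunks trade a little memory per chunk for far fewer loop iterations than 512-byte chunks):

import requests

url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'
response = requests.get(url, stream=True)
with open('alaska.zip', 'wb') as f:
    # 1 MiB per chunk; memory use stays bounded either way.
    for chunk in response.iter_content(chunk_size=1024 * 1024):
        f.write(chunk)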
Here's what I got to work in Python 3:
import zipfile, urllib.request, shutil

url = 'http://www....myzipfile.zip'
file_name = 'myzip.zip'

with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
    with zipfile.ZipFile(file_name) as zf:
        zf.extractall()
Any idea what to do when this raises "urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop."? – Slavocracy
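Regarding that comment: urllib does follow redirects, so a 302 loop usually means the server keeps redirecting, often because it expects browser-like headers or cookies. One thing to try (a sketch, not verified against any particular server) is sending a User-Agent header:

import shutil
import urllib.request

url = 'http://www....myzipfile.zip'  # placeholder from the answer above
file_name = 'myzip.zip'

# Some servers redirect endlessly unless the request looks like a browser.
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
with urllib.request.urlopen(req) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)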
Super lightweight solution to save a .zip file to a location on disk (using Python 3.9):
import requests

url = r'https://linktofile'
output = r'C:\pathtofolder\downloaded_file.zip'

r = requests.get(url)
with open(output, 'wb') as f:
    f.write(r.content)
I came here searching for how to save a .bzip2 file. Let me paste the code for others who might come looking for this.
url = "http://api.mywebsite.com"
filename = "swateek.tar.gz"
response = requests.get(url, headers=headers, auth=('myusername', 'mypassword'), timeout=50)
if response.status_code == 200:
with open(filename, 'wb') as f:
f.write(response.content)
I just wanted to save the file as is.
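If you later do want to read the archive you saved, the standard library's tarfile module handles .tar.gz and .tar.bz2 files (a sketch, reusing the filename from above):

import tarfile

# Use 'r:gz' for gzip-compressed tars, 'r:bz2' for bzip2-compressed ones.
with tarfile.open('swateek.tar.gz', 'r:gz') as tar:
    print(tar.getnames())  # list the members without extracting them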
Either use urllib2.urlopen, or you could try using the excellent Requests module and avoid urllib2 headaches:

import requests

results = requests.get('url')
# pass results.content on to secondary processing...
For the zip file, you can then use the zipfile module: zip = zipfile.ZipFile(io.BytesIO(results.content)) (ZipFile needs a file-like object, not raw bytes). Then just parse through the files using ZipFile.namelist(), ZipFile.open(), or ZipFile.extractall(). – Benenson
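Put together, that suggestion might look like this (a sketch; 'url' is still a placeholder, and the archive is assumed to be non-empty):

import io
import zipfile
import requests

results = requests.get('url')  # placeholder URL, as above
z = zipfile.ZipFile(io.BytesIO(results.content))
print(z.namelist())  # inspect the archive's members
with z.open(z.namelist()[0]) as member:
    print(member.read(100))  # peek at the first member's bytes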
Thanks to @yoavram for the above solution. My URL path linked to a zipped folder, and I encountered a BadZipFile error (file is not a zip file). Strangely, if I retried several times it retrieved the URL and unzipped it all of a sudden, so I amended the solution a little bit, using the is_zipfile method as per here:
import io
import zipfile
import requests

r = requests.get(url, stream=True)
check = zipfile.is_zipfile(io.BytesIO(r.content))
while not check:
    # Keep re-fetching until the response is actually a zip file.
    r = requests.get(url, stream=True)
    check = zipfile.is_zipfile(io.BytesIO(r.content))
else:
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall()
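One caveat: the loop above never gives up if the server keeps returning a non-zip response. A bounded-retry variant (my sketch, same idea with a retry cap; url is the same variable as above) might look like:

import io
import time
import zipfile
import requests

for attempt in range(5):  # cap the number of retries
    r = requests.get(url, stream=True)
    if zipfile.is_zipfile(io.BytesIO(r.content)):
        zipfile.ZipFile(io.BytesIO(r.content)).extractall()
        break
    time.sleep(1)  # brief pause before retrying
else:
    raise RuntimeError('never received a valid zip file')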
Use the requests, zipfile, and io Python packages. In particular, BytesIO is used to keep the downloaded zip file in memory rather than saving it to disk.
import requests
from zipfile import ZipFile
from io import BytesIO

r = requests.get(zip_file_url)
z = ZipFile(BytesIO(r.content))
file = z.extract(a_file_to_extract, path_to_save)
with open(file) as f:
    print(f.read())