Check for `urllib.urlretrieve(url, file_name)` Completion Status
How do I check to see if urllib.urlretrieve(url, file_name) has completed before allowing my program to advance to the next statement?

Take for example the following code snippet:

import traceback
import sys
import time
import Image
from urllib import urlretrieve

try:
    print "Downloading gif....."
    urlretrieve(imgUrl, "tides.gif")
    # Allow time for image to download/save:
    time.sleep(5)
    print "Gif Downloaded."
except:
    print "Failed to Download new GIF"
    raw_input('Press Enter to exit...')
    sys.exit()

try:
    print "Converting GIF to JPG...."
    Image.open("tides.gif").convert('RGB').save("tides.jpg")
    print "Image Converted"
except Exception, e:
    print "Conversion FAIL:", sys.exc_info()[0]
    traceback.print_exc()

When the download of 'tides.gif' via urlretrieve(imgUrl, "tides.gif") takes longer than the time.sleep(5) pause, the file is empty or incomplete, and Image.open("tides.gif") raises an IOError (because tides.gif is 0 kB).

How can I check the status of urlretrieve(imgUrl, "tides.gif"), allowing my program to advance only after it has completed successfully?

Brace answered 21/7, 2012 at 19:1
Requests is nicer than urllib, but you should be able to do this to download the file synchronously:

import urllib

f = urllib.urlopen(imgUrl)
with open("tides.gif", "wb") as imgFile:
    imgFile.write(f.read())
f.close()
# You won't get to this print until all of the image at imgUrl
# has been downloaded or an exception is raised.
print "Got it!"

The downside is that this buffers the whole file in memory, so if you're downloading a lot of images at once you may end up using a ton of RAM. It's unlikely to matter here, but still worth knowing.
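If memory is a concern, you can copy the response in fixed-size chunks instead of calling f.read() once, so only one chunk is ever held in memory. A minimal sketch (the copy_in_chunks helper name is my own):

```python
def copy_in_chunks(src, dst, chunk_size=8192):
    # Read from the source file object in fixed-size chunks and write
    # each chunk out immediately, so memory use stays bounded by
    # chunk_size regardless of the file's total size.
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

With the answer above, you would pass urllib.urlopen(imgUrl) as src and the open imgFile as dst.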

Remus answered 21/7, 2012 at 19:21
I would use python requests from http://docs.python-requests.org/en/latest/index.html instead of plain urllib. requests is synchronous by default, so it won't progress to the next line of code until it has fetched your image.

Maestricht answered 21/7, 2012 at 19:13
There is also an async model within requests, but you need gevent and greenlet. (comment by Deficiency)
I found a similar question here: Why is "raise IOError("cannot identify image file")" showing up only part of the time?

To be more specific, look at the answer to that question: it points to a couple of other threads that explain how to solve the problem in several ways. The first one, which you may be interested in, includes a progress bar display.
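For reference, a urlretrieve-compatible progress hook is only a few lines; this sketch (the progress_hook name is my own) prints a running percentage as blocks arrive:

```python
import sys

def progress_hook(count, block_size, total_size):
    # urlretrieve calls this after each block: count is the number of
    # blocks received so far, block_size is the block size in bytes,
    # and total_size is the Content-Length (-1 if unknown).
    if total_size > 0:
        percent = min(100, count * block_size * 100 // total_size)
        sys.stdout.write("\rDownloaded %d%%" % percent)
        sys.stdout.flush()
```

You pass it as the third argument: urlretrieve(imgUrl, "tides.gif", progress_hook).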

Clock answered 21/7, 2012 at 19:9
The selected answer reads the whole file into memory, which doesn't work well with big files. Here is a solution using urlretrieve's reporthook:

import urllib


def reporthook(count, block_size, total_size):
    # urlretrieve calls this after each block; total_size is -1 when the
    # server sends no Content-Length. Comparing raw byte counts with >=
    # avoids the case where the percentage never lands exactly on 100.
    if total_size > 0 and count * block_size >= total_size:
        print 'Download completed!'


def save(url, filename):
    urllib.urlretrieve(url, filename, reporthook)
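An alternative to doing percentage arithmetic inside the hook is to compare the bytes on disk with the total the hook reports. A minimal sketch (the download_complete helper name is my own):

```python
import os

def download_complete(filename, expected_size):
    # True once the file exists on disk and has the full expected length.
    # expected_size is the total_size that urlretrieve reports to the
    # hook; it is -1 when the server sends no Content-Length, in which
    # case this check cannot be used.
    if expected_size < 0:
        return False
    return os.path.exists(filename) and os.path.getsize(filename) == expected_size
```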
Somewhat answered 4/6, 2014 at 21:35
You can try this:

import time

# ----------------------------------------------------
# Wait until the end of the download
# ----------------------------------------------------

valid = False
while not valid:
    try:
        # Succeeds once the file exists and can be opened.
        with open("tides.gif"):
            valid = True
    except IOError:
        time.sleep(1)

print "Got it!"

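One caveat with polling like this: urlretrieve creates the output file as soon as the transfer starts, so a successful open does not mean the download has finished. Waiting until the size stops growing is a slightly stronger, though still heuristic, check. A sketch with my own helper name wait_until_stable:

```python
import os
import time

def wait_until_stable(filename, interval=1.0, checks=3):
    # Heuristic: treat the download as done once the file exists and its
    # size has not changed across `checks` consecutive polls. A stalled
    # transfer would also pass, so this is a sketch, not a guarantee.
    stable = 0
    last_size = -1
    while stable < checks:
        try:
            size = os.path.getsize(filename)
        except OSError:
            size = -1
        if size >= 0 and size == last_size:
            stable += 1
        else:
            stable = 0
        last_size = size
        time.sleep(interval)
    return last_size
```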
Killough answered 20/5, 2017 at 16:17
