I made two wrapper functions for retrieving content from the web with a configurable number of retries and a timeout.
import os
import socket
import time
import urllib.error
import urllib.request

from bs4 import BeautifulSoup


class DownloadUnsuccessful(socket.timeout):
    pass


def web2soup(url, tries=9, timeout=30, sleepBetween=1):
    """Fetch a URL and return it parsed as BeautifulSoup, retrying on transient errors."""
    failures = 0
    while True:
        if failures == tries:
            raise DownloadUnsuccessful()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as con:
                content = con.read().decode('utf-8')
            break
        except urllib.error.HTTPError:
            # An HTTP error status is unlikely to change on retry, so give up right away.
            raise DownloadUnsuccessful()
        except urllib.error.URLError:
            # Network-level failure: wait a moment before the next attempt.
            time.sleep(sleepBetween)
        except (TimeoutError, socket.timeout):
            # Timed out: just try again.
            pass
        failures += 1
    soup = BeautifulSoup(content, 'html.parser')
    return soup
def web2file(url, filePath, tries=9, timeout=30, sleepBetween=1, tempExt='.temporary_filename'):
    """Download a URL to filePath (plus the URL's extension), retrying on transient errors."""
    tempPath = filePath + tempExt
    failures = 0
    while True:
        if failures == tries:
            # Clean up the partial download before giving up.
            try:
                os.remove(tempPath)
            except OSError:
                pass
            raise DownloadUnsuccessful()
        try:
            # urlretrieve has no timeout parameter, so set the module-wide default instead.
            socket.setdefaulttimeout(timeout)
            urllib.request.urlretrieve(url, tempPath)
            break
        except urllib.error.HTTPError:
            raise DownloadUnsuccessful()
        except urllib.error.URLError:
            time.sleep(sleepBetween)
        except (TimeoutError, socket.timeout):
            pass
        failures += 1
    # Keep the extension from the URL and move the finished download into place.
    fileExt = os.path.splitext(url)[1]
    filePath = filePath + fileExt
    os.rename(tempPath, filePath)
    return filePath
This way I can just call them and know which exception to expect if something goes wrong.
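For example, this is roughly how I call them; the URL and output name here are just placeholders:

if __name__ == '__main__':
    try:
        # Parse a page, giving up after a few attempts.
        soup = web2soup('https://example.com/', tries=3, timeout=10)
        print(soup.title)

        # Download a file; the extension is taken from the URL, so 'page' becomes 'page.html'.
        savedPath = web2file('https://example.com/index.html', 'page', tries=3, timeout=10)
        print('saved to', savedPath)
    except DownloadUnsuccessful:
        # The single exception I need to handle when a download ultimately fails.
        print('download failed after all retries')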