Timeout a file download with Python urllib?

Python beginner here. I want to be able to time out my download of a video file if the process takes longer than 500 seconds.

import urllib

try:
    urllib.urlretrieve("http://www.videoURL.mp4", "filename.mp4")
except Exception as e:
    print("error")

How do I amend my code to make that happen?

Steric answered 24/9, 2015 at 14:16 Comment(1)
Possible duplicate of #601348 – Kasten

A better way is to use requests, which lets you stream the response and easily check for timeouts:

import requests

# Make the actual request. timeout=10 aborts if no data arrives for 10
# seconds, and stream=True means we don't keep the large file in memory.
response = requests.get('http://www.videoURL.mp4', timeout=10, stream=True)

# Open the output file and make sure we write in binary mode
with open('filename.mp4', 'wb') as fh:
    # Walk through the response in chunks of 1024 * 1024 bytes, so 1 MiB
    for chunk in response.iter_content(1024 * 1024):
        # Write the chunk to the file
        fh.write(chunk)
        # Optionally we can check here if the download is taking too long
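
Note that requests' timeout applies to the connection and to each individual read, not to the whole transfer, so a slow-but-steady download can still run past the asker's 500-second budget. A minimal sketch of enforcing a total cap inside the loop (same placeholder URL as above):

import time
import requests

deadline = time.monotonic() + 500  # overall budget of 500 seconds

response = requests.get('http://www.videoURL.mp4', timeout=10, stream=True)
with open('filename.mp4', 'wb') as fh:
    for chunk in response.iter_content(1024 * 1024):
        fh.write(chunk)
        if time.monotonic() > deadline:
            raise TimeoutError('download exceeded 500 seconds')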
Medullated answered 24/9, 2015 at 14:43 Comment(0)

Although urlretrieve does not have this feature, you can still set the default timeout (in seconds) for all new socket objects.

import socket
import urllib

socket.setdefaulttimeout(15)

try:
    urllib.urlretrieve("http://www.videoURL.mp4", "filename.mp4")
except Exception as e:
    print("error")
Tiloine answered 31/12, 2019 at 22:51 Comment(0)

urlretrieve does not have that option. But you can easily achieve the same thing with urlopen and writing the result to a file yourself, like so (note that the timeout applies to each blocking socket operation, not to the total download time):

import urllib.request

request = urllib.request.urlopen("http://www.videoURL.mp4", timeout=500)
with open("filename.mp4", 'wb') as f:
    try:
        f.write(request.read())
    except Exception:
        print("error")

That's if you are using Python 3. If you are using Python 2, you should use urllib2 instead.
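
For large files, request.read() with no argument loads the whole body into memory (see the comments below). A minimal sketch of the same download done in fixed-size chunks instead:

import urllib.request

with urllib.request.urlopen("http://www.videoURL.mp4", timeout=500) as request, \
        open("filename.mp4", "wb") as f:
    while True:
        chunk = request.read(1024 * 1024)  # at most 1 MiB per read
        if not chunk:
            break
        f.write(chunk)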

Hally answered 24/9, 2015 at 14:37 Comment(3)
urlopen can be easy, but for a large file request.read() can be slow and take forever; you should consider adding a timeout around that call, probably using the signal package. – Turgot
Not only can it be slow, it could fail completely. For example, suppose the file is 10GB in size and won't fit in memory. – Bearer
Also note that the urllib.request.urlopen() function in Python 3 is equivalent to urllib2.urlopen() from Python 2, and that the old urllib.urlopen() has been removed; the correct call in 3.6 is urllib.request.urlopen(). – Farming
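
A sketch of the signal-based timeout suggested in the first comment, assuming a Unix system (SIGALRM is not available on Windows) and the asker's 500-second budget:

import signal
import urllib.request

class DownloadTimeout(Exception):
    pass

def raise_timeout(signum, frame):
    raise DownloadTimeout()

signal.signal(signal.SIGALRM, raise_timeout)
signal.alarm(500)  # deliver SIGALRM after 500 seconds
try:
    with urllib.request.urlopen("http://www.videoURL.mp4") as request:
        with open("filename.mp4", "wb") as f:
            f.write(request.read())
except DownloadTimeout:
    print("error: download took longer than 500 seconds")
finally:
    signal.alarm(0)  # cancel the pending alarm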

I made two wrapper functions for retrieving from the web with a configurable number of retries and a timeout:

import os
import socket
import time
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

class DownloadUnsuccessful(socket.timeout):
    pass

def web2soup(url, tries=9, timeout=30, sleepBetween=1):
    failures = 0
    while True:
        if failures == tries:
            raise DownloadUnsuccessful()
        try:
            with urllib.request.urlopen(url, timeout=timeout) as con:
                content = con.read().decode('utf-8')
                break
        except urllib.error.HTTPError:
            # The server answered definitively (404, 403, ...): retrying won't help
            raise DownloadUnsuccessful()
        except urllib.error.URLError:
            # Network-level problem: wait a bit before retrying
            time.sleep(sleepBetween)
        except (TimeoutError, socket.timeout):
            # socket.timeout became an alias of TimeoutError in Python 3.10
            pass

        failures += 1

    soup = BeautifulSoup(content, 'html.parser')
    return soup

def web2file(url, filePath, tries=9, timeout=30, sleepBetween=1, tempExt='.temporary_filename'):
    tempPath = filePath + tempExt

    failures = 0
    while True:
        if failures == tries:
            try:
                os.remove(tempPath)  # clean up the partial download
            except OSError:
                pass
            raise DownloadUnsuccessful()
        try:
            socket.setdefaulttimeout(timeout)
            urllib.request.urlretrieve(url, tempPath)
            break
        except urllib.error.HTTPError:
            raise DownloadUnsuccessful()
        except urllib.error.URLError:
            time.sleep(sleepBetween)
        except (TimeoutError, socket.timeout):
            pass

        failures += 1

    fileExt = os.path.splitext(url)[1]
    filePath = filePath + fileExt
    os.rename(tempPath, filePath)

    return filePath

This way I can just call them and know which exception to expect if something goes wrong.
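
A hypothetical usage example of the helpers above (the URLs are placeholders):

try:
    soup = web2soup('https://example.com/index.html', tries=3, timeout=10)
    videoPath = web2file('https://example.com/video.mp4', 'video', tries=3, timeout=10)
except DownloadUnsuccessful:
    print('download failed after all retries')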

Wispy answered 27/1 at 8:46 Comment(0)
