Cancel slow download in Python
I am downloading files over HTTP and displaying the progress using urllib and the following code, which works fine:

import sys
from urllib import urlretrieve

def dlProgress(count, blockSize, totalSize):
    percent = int(count*blockSize*100/totalSize)
    sys.stdout.write("\r" + "progress" + "...%d%%" % percent)
    sys.stdout.flush()

urlretrieve('http://example.com/file.zip', '/tmp/localfile', reporthook=dlProgress)

Now I would also like to restart the download if it is going too slow (say less than 1MB in 15 seconds). How can I achieve this?

Lightface answered 23/8, 2012 at 13:5 Comment(2)
You could raise an Exception in your reporthook.Tani
Yeah, raising an exception seems to be the popular way to stop downloading, from a quick look at Google. It's not mentioned in the documentation though, which makes me worry that it could have unexpected behavior. For example, maybe the data is fetched by a dedicated thread, and throwing an exception will make it an orphan and not actually stop the download.Impeccable
This should work. It calculates the actual download rate and aborts if it is too low.

import sys
from urllib import urlretrieve
import time

url = "http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz" # 14.135.620 Byte
startTime = time.time()

class TooSlowException(Exception):
    pass

def convertBToMb(byte_count):
    """Converts bytes to megabytes."""
    megabytes = float(byte_count) / 1048576
    return megabytes


def dlProgress(count, blockSize, totalSize):
    global startTime

    alreadyLoaded = count*blockSize
    timePassed = time.time() - startTime
    transferRate = convertBToMb(alreadyLoaded) / timePassed # mbytes per second
    transferRate *= 60 # mbytes per minute

    percent = int(alreadyLoaded*100/totalSize)
    sys.stdout.write("\r" + "progress" + "...%d%%" % percent)
    sys.stdout.flush()

    if transferRate < 4 and timePassed > 2: # under 4 MB/minute; skip the check for the first 2 seconds while the rate settles
        print "\ndownload too slow! retrying..."
        time.sleep(1) # let's not hammer the server
        raise TooSlowException

def main():
    try:
        urlretrieve(url, '/tmp/localfile', reporthook=dlProgress)

    except TooSlowException:
        global startTime
        startTime = time.time()
        main()

if __name__ == "__main__":
    main()
Feathercut answered 23/8, 2012 at 16:56 Comment(1)
Note that this will only work in the case of a slowing connection. The more usual dropped connection will not work unless you add a timeout to the socket. Otherwise -- OK! +1Lysias
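
As that comment notes, a dropped (rather than merely slow) connection will simply hang unless the socket itself can time out. One way to cover that case, sketched here on top of the answer above rather than taken from it, is to set a global socket timeout up front and treat a timed-out read like a too-slow download:

import socket
socket.setdefaulttimeout(15)  # a read that stalls for 15 s now raises instead of hanging forever

def main():
    try:
        urlretrieve(url, '/tmp/localfile', reporthook=dlProgress)
    except (TooSlowException, IOError):
        # IOError covers socket.timeout and urllib's wrapped socket errors;
        # real code should let genuine HTTP errors propagate instead of retrying
        global startTime
        startTime = time.time()
        main()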
Something like this:

import signal
import time

class Timeout(Exception):
    pass

def try_one(func, t=3):
    def timeout_handler(signum, frame):
        raise Timeout()

    old_handler = signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(t)  # trigger the alarm after t seconds (SIGALRM is Unix-only)

    try:
        t1 = time.time()  # wall-clock time; time.clock() would measure CPU time on Unix
        func()
        t2 = time.time()
    except Timeout:
        print('{} timed out after {} seconds'.format(func.__name__, t))
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)

    return t2 - t1

Then call try_one with the function you want to time out and the timeout in seconds:

try_one(downloader, 15)
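
The downloader function is not defined in this answer; a minimal stand-in, assuming a plain urlretrieve call with placeholder URL and path, could be:

from urllib import urlretrieve

def downloader():
    urlretrieve('http://example.com/file.zip', '/tmp/localfile')  # placeholders

elapsed = try_one(downloader, 15)  # None if it timed out, otherwise wall-clock seconds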

Or, you can do this:

import socket
socket.setdefaulttimeout(15)
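
With a default timeout in place, a read that stalls for more than 15 seconds raises socket.timeout (an IOError subclass since Python 2.6), which you can catch and retry. A minimal sketch with a placeholder URL and an arbitrary retry count:

import socket
from urllib import urlretrieve

socket.setdefaulttimeout(15)

for attempt in range(3):
    try:
        urlretrieve('http://example.com/file.zip', '/tmp/localfile')
        break  # finished without stalling
    except IOError:
        # covers socket.timeout and urllib's wrapped socket errors
        print "attempt %d stalled, retrying..." % (attempt + 1)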
Lysias answered 23/8, 2012 at 13:27 Comment(3)
This is a good solution if you're downloading small files of known size. If you don't know the size ahead of time, you won't know how many seconds to pass to try_one. And if you're downloading a 100MB file, try_one(downloader, 1500) won't give up until 1500 seconds have elapsed. Preferably, it would quit as soon as it was confident that the download won't finish in time.Impeccable
Yes, agreed. Thanks for the solution but I would like to cancel based on minimum throughput threshold not on whether the download has completed within a certain timeout.Lightface
@HolyMackerel: Just modify your report hook to have a Timeout at say 10 second intervals and check the rate. The problem is a hung download where 0 bytes are xfered and your report hook is never called.Lysias
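
A sketch of that suggestion, combining the SIGALRM watchdog above with the asker's 1 MB per 15 seconds threshold (Unix-only; the 10-second watchdog interval is illustrative, and none of this code is from the original answers):

import signal
import time
from urllib import urlretrieve

class Stalled(Exception):
    pass

def alarm_handler(signum, frame):
    # fires only if the report hook failed to re-arm the alarm,
    # i.e. no data at all arrived for 10 seconds
    raise Stalled("no data for 10 seconds")

progress = {'bytes': 0, 'since': time.time()}

def dlProgress(count, blockSize, totalSize):
    signal.alarm(10)  # re-arm the watchdog on every chunk received
    loaded = count * blockSize
    if time.time() - progress['since'] >= 15:
        if loaded - progress['bytes'] < 1024 * 1024:  # under 1 MB in 15 s
            raise Stalled("less than 1 MB in the last 15 seconds")
        progress['bytes'] = loaded
        progress['since'] = time.time()

signal.signal(signal.SIGALRM, alarm_handler)
signal.alarm(10)
try:
    urlretrieve('http://example.com/file.zip', '/tmp/localfile', reporthook=dlProgress)
finally:
    signal.alarm(0)  # always cancel the watchdog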
HolyMackerel! Use the tools!

import urllib2, sys, socket, time, os

def url_tester(url = "http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz"):
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url,None,1)     # Note the timeout to urllib2...
    file_size = int(u.info().getheaders("Content-Length")[0])
    print ("\nDownloading: {} Bytes: {:,}".format(file_name, file_size))

    with open(file_name, 'wb') as f:
        file_size_dl = 0
        block_sz = 1024*4
        time_outs = 0
        status = ''  # so the final 'Done!' line is safe even if the first read is empty
        while True:
            try:
                buffer = u.read(block_sz)
            except socket.timeout:
                if time_outs > 3:   # too many timeouts -- give up for good
                    print "\n\n\nsorry -- try back later"
                    os.unlink(file_name)
                    raise
                else:              # start counting time outs...
                    print "\nHmmm... little issue... I'll wait a couple of seconds"
                    time.sleep(3)
                    time_outs+=1
                    continue

            if not buffer:   # end of the download             
                sys.stdout.write('\rDone!'+' '*len(status)+'\n\n')
                sys.stdout.flush()
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            status = '{:20,} Bytes [{:.2%}] received'.format(file_size_dl, 
                                           file_size_dl * 1.0 / file_size)
            sys.stdout.write('\r'+status)
            sys.stdout.flush()

    return file_name 

This prints a status as expected. If I unplug my ethernet cable, I get:

 Downloading: Python-2.7.3.tgz Bytes: 14,135,620
             827,392 Bytes [5.85%] received


sorry -- try back later

If I unplug the cable and plug it back in within 12 seconds, I get:

Downloading: Python-2.7.3.tgz Bytes: 14,135,620
             716,800 Bytes [5.07%] received
Hmmm... little issue... I'll wait a couple of seconds

Hmmm... little issue... I'll wait a couple of seconds
Done! 

The file is successfully downloaded.

You can see that urllib2 supports both timeouts and retries. If you disconnect and stay disconnected for roughly 3 * 4 == 12 seconds (four 3-second waits), it times out for good and raises a fatal exception. That could be handled as well.
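
If you wanted to handle it, one option is an outer retry loop around url_tester; this wrapper is hypothetical, with an arbitrary retry count and pause:

import socket
import time

def download_with_retries(url, max_retries=3):
    for attempt in range(max_retries):
        try:
            return url_tester(url)
        except socket.timeout:
            if attempt == max_retries - 1:
                raise  # out of retries, let the failure propagate
            print "download died, starting over (%d tries left)..." % (max_retries - 1 - attempt)
            time.sleep(5)  # brief pause before starting from scratch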

Saundrasaunter answered 23/8, 2012 at 15:55 Comment(1)
Thanks, it's a nice solution but it catches stalled downloads rather than slow downloads.Lightface
