How to measure download speed and progress using requests?
I am using requests to download files, but for large files I have to check the size of the file on disk every time to estimate progress, because I can't display the progress as a percentage, and I would also like to know the download speed. How can I go about doing it? Here's my code:

import requests
import sys
import time
import os

def downloadFile(url, directory):
    localFilename = url.split('/')[-1]
    r = requests.get(url, stream=True)

    start = time.clock()
    with open(directory + '/' + localFilename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=512 * 1024):
            if chunk:  # filter out keep-alive chunks
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
    return time.clock() - start

def main():
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = raw_input("Enter the URL: ")
    directory = raw_input("Where would you like to save the file? ")

    time_elapsed = downloadFile(url, directory)
    print "Download complete..."
    print "Time Elapsed: %.2f seconds" % time_elapsed


if __name__ == "__main__":
    main()

I think one way to do it would be to stat the file on every iteration of the for loop and calculate the progress percentage from the Content-Length header. But that would again be an issue for large files (around 500 MB). Is there any other way to do it?

Herrick answered 27/12, 2013 at 12:49 Comment(0)

See here: Python progress bar and downloads

I think the code would be something like this; it should show the average speed since the start, in bytes per second:

import requests
import sys
import time

def downloadFile(url, directory):
    localFilename = url.split('/')[-1]
    with open(directory + '/' + localFilename, 'wb') as f:
        start = time.clock()
        r = requests.get(url, stream=True)
        total_length = r.headers.get('content-length')
        dl = 0
        if total_length is None:  # no content length header
            f.write(r.content)
        else:
            total_length = int(total_length)
            for chunk in r.iter_content(1024):
                dl += len(chunk)
                f.write(chunk)
                done = int(50 * dl / total_length)
                sys.stdout.write("\r[%s%s] %s bps" % ('=' * done, ' ' * (50 - done), dl // (time.clock() - start)))
            print ''
    return time.clock() - start

def main():
    if len(sys.argv) > 1:
        url = sys.argv[1]
    else:
        url = raw_input("Enter the URL: ")
    directory = raw_input("Where would you like to save the file? ")

    time_elapsed = downloadFile(url, directory)
    print "Download complete..."
    print "Time Elapsed: %.2f seconds" % time_elapsed


if __name__ == "__main__":
    main()
Backpedal answered 19/2, 2014 at 0:2 Comment(4)
This code looks good, but IMO it won't show dynamic download progress, since requests.get(...) will download the entire file before the call returns, and the progress display is supposed to be dynamic.Gum
@sonukumar, notice the stream parameter in the get call: requests.get(url, stream=True). Check out the documentation.Hydrosphere
@freeforalltousez What's the meaning of multiplying 50 when calculating the downloaded percentage ?Honea
@Honea it's the length of the progress bar. See the linked answer.Backpedal

An improved version of the accepted answer for Python 3 using io.BytesIO (writes to memory), reports the result in Mbps, and supports ipv4/ipv6 plus size and port arguments.

import sys, time, io, requests

def speed_test(size=5, ipv="ipv4", port=80):

    if size == 1024:
        size = "1GB"
    else:
        size = f"{size}MB"
    url = f"http://{ipv}.download.thinkbroadband.com:{port}/{size}.zip"
    with io.BytesIO() as f:
        start = time.perf_counter()
        r = requests.get(url, stream=True)
        total_length = r.headers.get('content-length')
        dl = 0
        if total_length is None: # no content length header
            f.write(r.content)
        else:
            for chunk in r.iter_content(1024):
                dl += len(chunk)
                f.write(chunk)
                done = int(30 * dl / int(total_length))
                sys.stdout.write("\r[%s%s] %s Mbps" % ('=' * done, ' ' * (30 - done), dl // (time.perf_counter() - start) / 100000))
    print(f"\n{size} = {(time.perf_counter() - start):.2f} seconds")

Usage Examples:

speed_test()
speed_test(10)
speed_test(50, "ipv6")
speed_test(1024, port=8080)

Output Sample:

[==============================] 61.34037 Mbps
100MB = 17.10 seconds

Available Options:

size: 5, 10, 20, 50, 100, 200, 512, 1024

ipv: ipv4, ipv6

port: 80, 81, 8080
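Incidentally, the dl // elapsed / 100000 term in the snippet is only a rough approximation of megabits per second; the exact conversion from a byte rate is a one-liner (the helper name here is just illustrative):

```python
def bytes_per_sec_to_mbps(rate_bps):
    # 1 byte = 8 bits; 1 Mbps = 1,000,000 bits per second
    return rate_bps * 8 / 1_000_000

print(bytes_per_sec_to_mbps(12_500_000))  # 12.5 MB/s -> 100.0 Mbps
```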


Updated on 20221011:

  • time.perf_counter() replaced time.clock(), which has been deprecated since Python 3.3 (kudos to shiro)
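A minimal sketch of the replacement, timing an arbitrary stand-in workload with time.perf_counter() (the summation here is just a placeholder for the download loop):

```python
import time

start = time.perf_counter()      # monotonic, high-resolution clock
total = sum(range(1_000_000))    # stand-in for the download loop
elapsed = time.perf_counter() - start
print("did some work in %.4f seconds" % elapsed)
```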
Tirewoman answered 23/2, 2020 at 6:23 Comment(2)
The function time.clock() has been removed, after having been deprecated since Python 3.3: use time.perf_counter() in the solution code above.Verein
Answer updated, tks.Tirewoman

I had a problem downloading a big file from a specific slow server:

  1. no Content-Length header,
  2. big file (42 GB),
  3. no compression,
  4. slow server (<1 MB/s).

The file being this big, I also had a problem with memory usage during the request. Requests doesn't write its output to a file the way urllib does; it looks like it keeps everything in memory.

With no Content-Length header, the accepted answer has nothing to monitor against.

So I wrote this (basic) method to monitor speed during the CSV download, following just the requests documentation.

It needs fname (a complete output path) and link (http or https), and you can specify custom headers.

import time
import requests

BLOCK = 5 * 1024 * 1024
try:
    with open(fname, 'wb') as f:
        r = requests.get(link, headers=headers, stream=True)

        ## This is because the official documentation suggests it,
        ## saying it's more reliable than iterating over the raw stream directly
        lines = r.iter_lines()

        ## Init the base vars, for monitor and block management
        ## obj is a byte object, because iter_lines returns bytes objects
        tsize = 0; obj = bytearray(); t0 = time.time(); i = 0
        for line in lines:

            ## Add the line size, in bytes, to the running total and
            ## append the line to the byte object (iter_lines strips
            ## the delimiter, so put a newline back)
            tsize += len(line) + 1
            obj.extend(line + b'\n')

            ## When the block size is reached,
            if tsize > BLOCK:
                ## Increment the block number
                i += 1

                ## Calculate the speed.. this is in MB/s,
                ## but you can easily change to KB/s, or Blocks/s
                t1 = time.time()
                t = t1 - t0
                speed = round(5 / t, 2)

                ## Write the block to the file.
                f.write(obj)

                ## Write stats
                print('got', i * 5, 'MB ', 'block', i, ' @', speed, 'MB/s')

                ## Reinit all the base vars, for a new block
                obj = bytearray(); tsize = 0; t0 = time.time()

        ## Write the last block part to the file.
        f.write(obj)

except Exception as e:
    print("Error:", e)
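As a side note on the memory point above: with stream=True, the body isn't loaded into memory up front, and copying it chunk by chunk keeps usage bounded even when there is no Content-Length. A minimal sketch, simulated here with an in-memory buffer instead of a live response (stream_with_progress and the 300 kB buffer are illustrative, not part of the answer's code; with requests you would pass r.raw as the source):

```python
import io
import time

def stream_with_progress(source, out, chunk_size=64 * 1024):
    """Copy a binary stream to `out`, reporting cumulative bytes and rate.

    Works when the total size is unknown (no Content-Length):
    progress is reported as bytes copied, not as a percentage.
    """
    start = time.perf_counter()
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)
        copied += len(chunk)
        elapsed = time.perf_counter() - start
        rate = copied / elapsed if elapsed > 0 else 0.0
        print("\r%d bytes @ %.0f B/s" % (copied, rate), end="")
    print()
    return copied

# An in-memory buffer stands in for the network stream here.
source = io.BytesIO(b"x" * 300_000)
sink = io.BytesIO()
total = stream_with_progress(source, sink)
```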
Toothache answered 27/1, 2023 at 17:19 Comment(1)
I don't follow your arguments for iter_lines: 1) "because the official documentation suggests it": where? 2) "don't lose data": in which way? The only thing I read is that it is not reentrant-safe.Vitrain

© 2022 - 2024 — McMap. All rights reserved.