requests response.iter_content() gets incomplete file ( 1024MB instead of 1.5GB )?

Asked 14/5, 2014 at 3:38 Answered 15/3, 2020 at 8:7

python web-scraping urllib python-requests

hi i have been using this code snippet to download files from a website, so far files smaller than 1GB are all good. but i noticed a 1.5GB file is incomplete

# s is requests session object
r = s.get(fileUrl, headers=headers, stream=True)

start_time = time.time()
with open(local_filename, 'wb') as f:
    count = 1
    block_size = 512
    try:
        total_size = int(r.headers.get('content-length'))
        print 'file total size :',total_size
    except TypeError:
        print 'using dummy length !!!'
        total_size = 10000000

    for chunk in r.iter_content(chunk_size=block_size):

        if chunk:  # filter out keep-alive new chunks

            duration = time.time() - start_time
            progress_size = int(count * block_size)
            if duration == 0:
                duration = 0.1
            speed = int(progress_size / (1024 * duration))
            percent = int(count * block_size * 100 / total_size)
            sys.stdout.write("\r...%d%%, %d MB, %d KB/s, %d seconds passed" %
                            (percent, progress_size / (1024 * 1024), speed, duration))

            f.write(chunk)
            f.flush()
            count += 1

using latest requests 2.2.1 python 2.6.6, centos 6.4 the file download always stops at 66.7% 1024MB, what am i missing ? the output:

file total size : 1581244542
...67%, 1024 MB, 5687 KB/s, 184 seconds passed

it seems the generator returned by iter_content() thinks all chunks are retrieved and there is no error. btw the exception part did not run, because the server did return the content-length in response header.

Oxidate answered 14/5, 2014 at 3:38 Comment(5)

Note "b" = bit, while "B" = byte (which is probably what you mean) – Treacy 14/5, 2014 at 3:40

@Jonathon ok ... orz, i updated the post – Oxidate 14/5, 2014 at 3:42

What is the s in s.get(...)? – Ingmar 14/5, 2014 at 4:16

@Lego s is requests session object .... The site i'm downloading from needs authentication and I omitted those code – Oxidate 14/5, 2014 at 4:55

@Shuman, Did you manage to solve the problem? Got the same here .... – Songstress 6/12, 2016 at 6:50

Please double check that you can download the file via wget and/or any regular browser. It could be restriction on the server. As I see your code can download big files (bigger then 1.5Gb)

Update: please try to inverse the logic - instead of

if chunk: # filter out keep-alive new chunks                                                                                                                                                                                                         
    f.write(chunk)                                                                                                                                                                                                                                   
    f.flush()

try

if not chunk:
   break

f.write(chunk)                                                                                                                                                                                                                                   
f.flush()

Mare answered 14/5, 2014 at 12:0 Comment(1)

just checked again in firefox 29, manually download works , but via code it doesn't work. always stops at 1024 MB. – Oxidate 14/5, 2014 at 13:59

If you are using Nginx as file system, you may check Nginx config file to see if you have set

proxy_max_temp_file_size 3000m;

or not.

By default this size is 1G. So you can only get 1024MB.

Malebranche answered 15/3, 2020 at 8:7 Comment(0)

I think you forgot to close req.

from the requests author said, "If you find yourself partially reading request bodies (or not reading them at all) while using stream=True, you should make the request within a with statement to ensure it’s always closed:"

http://2.python-requests.org//en/latest/user/advanced/#body-content-workflow.

Seaver answered 13/8, 2019 at 9:22 Comment(0)

Recommended topics

Hot tags