How to Download only the first x bytes of data Python
Situation: The file to be downloaded is large (>100 MB), so it takes quite some time, especially over a slow internet connection.

Problem: However, I just need the file header (the first 512 bytes), which will decide if the whole file needs to be downloaded or not.

Question: Is there a way to download only the first 512 bytes of a file?

Additional information: Currently the download is done using urllib.urlretrieve in Python 2.7.

Danonorwegian answered 15/1, 2018 at 6:34 Comment(5)
I would take wget apart and modify it so it stops before the end.Coppice
Are you able to use the HTTP HEAD method? That returns only the headers.Rootless
@user2896976 Those are the HTTP headers, I believe? I need the file headers, which are in the first 512 bytes of the file.Danonorwegian
@Jean-FrançoisFabre Would love to do that too but with my skills I think I will get murdered by my teacher before I am done HAHAHA. But thanks though - didn't think of thatDanonorwegian
Range request? https://mcmap.net/q/690424/-only-download-a-part-of-the-document-using-python-requestsBerth
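The range-request idea from the last comment can be sketched in Python 3 with nothing but the standard library. This is a hedged example: the function name is made up, the URL is a placeholder, and it only saves bandwidth when the server actually supports range requests.

```python
# Sketch of an HTTP Range request: ask the server for bytes 0-511 only.
import urllib.request

def fetch_first_bytes(url, n=512):
    req = urllib.request.Request(url, headers={"Range": "bytes=0-%d" % (n - 1)})
    with urllib.request.urlopen(req) as resp:
        # A server that honors the range answers 206 Partial Content with
        # just those bytes; one that ignores it answers 200 with the full
        # body, so cap the read at n bytes either way.
        data = b""
        while len(data) < n:
            chunk = resp.read(n - len(data))
            if not chunk:
                break
            data += chunk
        return data
```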
I think curl and head would work better than a Python solution here:

curl https://my.website.com/file.txt | head -c 512 > header.txt

EDIT: Also, if you absolutely must have it in a Python script, you can use subprocess to run the curl-piped-to-head command.

EDIT 2: For a fully Python solution: The urlopen function (urllib2.urlopen in Python 2, and urllib.request.urlopen in Python 3) returns a file-like stream that you can use the read function on, which allows you to specify a number of bytes. For example, urllib2.urlopen(my_url).read(512) will return the first 512 bytes of my_url
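A minimal Python 3 sketch of that read(512) approach (the function name is mine, and the URL would be whatever file you are probing):

```python
# Open the URL as a file-like stream and pull only the first n bytes.
import urllib.request

def read_header(url, n=512):
    with urllib.request.urlopen(url) as resp:
        data = resp.read(n)
        # read(n) may return fewer bytes on a short network read,
        # so top up until we have n bytes or the stream ends.
        while len(data) < n:
            chunk = resp.read(n - len(data))
            if not chunk:
                break
            data += chunk
        return data
```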

Zaremski answered 15/1, 2018 at 6:40 Comment(4)
Ah yes. The edit was what I needed. But no Python modules can do this?Danonorwegian
The urlopen function (urllib2.urlopen in Python 2, and urllib.request.urlopen in Python 3) returns a file-like stream that you can use the read function on, which allows you to specify a number of bytes. For example, urllib2.urlopen(my_url).read(512) will return the first 512 bytes of my_url. However, I'm not certain this will only download 512 bytes, or if it will try to download the entire file behind-the-scenes and just return the first 512Zaremski
the one in the comment works. do you want to replace it and let me accept as answer?Danonorwegian
Might I add that urllib also has the same urlopen function, if you want to cut down the number of libraries you import. (I had imported urllib and was actually hesitant to also import urllib2)Danonorwegian
If the URL you are trying to read responds with a Content-Length header, then you can get the file size with urllib2 in Python 2.

import urllib2

def get_file_size(url):
    # issue a HEAD request so only the headers come back, not the body
    request = urllib2.Request(url)
    request.get_method = lambda: 'HEAD'
    response = urllib2.urlopen(request)
    length = response.headers.getheader("Content-Length")
    return int(length)

The function can be called to get the length and compared with some threshold value to decide whether to download or not.

if get_file_size("http://stackoverflow.com") < 1000000:
    # Download

Note that the Python 3 implementation differs slightly:

from urllib import request

def get_file_size(url):
    r = request.Request(url)
    r.get_method = lambda: 'HEAD'
    response = request.urlopen(r)
    length = response.getheader("Content-Length")
    return int(length)
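One way to sanity-check the Python 3 version is against a throwaway local HTTP server; the handler below is my own test scaffolding, not part of the answer:

```python
# Exercise the HEAD-based size check against a local server that
# advertises a known Content-Length.
import threading
import http.server
from urllib import request

def get_file_size(url):
    r = request.Request(url)
    r.get_method = lambda: 'HEAD'
    response = request.urlopen(r)
    length = response.getheader("Content-Length")
    return int(length)

class Handler(http.server.BaseHTTPRequestHandler):
    BODY = b"x" * 2048
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()
    def log_message(self, *args):
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
size = get_file_size("http://127.0.0.1:%d/" % srv.server_address[1])
srv.shutdown()
print(size)  # 2048
```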
Runstadler answered 15/1, 2018 at 6:54 Comment(1)
Love the idea, but I need to compare the hash value stored in the file header. The file size can be the same while the contents differ, so the hash is a more reliable check than the file size.Danonorwegian
