Get file size from "Content-Length" value from a file in python 3.2

6

10

I want to read the Content-Length value from the meta variable so that I know the size of the file I am about to download. But the last line raises an error: 'HTTPMessage' object has no attribute 'getheaders'.

import urllib.request
import http.client

#----HTTP HANDLING PART----
url = "http://client.akamai.com/install/test-objects/10MB.bin"

file_name = url.split('/')[-1]
d = urllib.request.urlopen(url)
f = open(file_name, 'wb')

#----GET FILE SIZE----
meta = d.info()

print("Download Details", meta)
file_size = int(meta.getheaders("Content-Length")[0])  # AttributeError: 'HTTPMessage' object has no attribute 'getheaders'
Acrobatics answered 21/10, 2012 at 8:42 Comment(0)
F
13

It looks like you are using Python 3 but have read code or documentation written for Python 2.x. It is poorly documented, but in Python 3 the message object returned by info() has no getheaders method, only a get_all method.

See this bug report.
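
A minimal sketch of the difference, using only the standard library. In Python 3 the object returned by d.info() is an http.client.HTTPMessage, so one is built by hand here (with a made-up size value) to try the calls without touching the network:

```python
import http.client

# Build a header object by hand so the example runs offline.
# In Python 3, urlopen(url).info() returns an object of this same type.
meta = http.client.HTTPMessage()
meta["Content-Length"] = "10485760"  # illustrative value, not fetched from a server

# get_all() returns a list of every value stored under the header name.
file_size = int(meta.get_all("Content-Length")[0])
print(file_size)  # 10485760

# The Python 2 style method is not defined on the Python 3 message class.
print(hasattr(meta, "getheaders"))  # False
```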

Fustic answered 21/10, 2012 at 8:51 Comment(2)
For the benefit of people arriving from Google: it seems you can now do file_size = int(d.getheader('Content-Length')) in Python 3 (tested in 3.4.1). d.getheaders() also seems to have been added. - Calumet
@freshtop: Both d.getheader() and d.getheaders() work even on Python 3.2. Note that the OP uses d.info() instead of d here; d.info().getheader() and d.info().getheaders() are Python 2 code. To support both Python 2 and 3, d.headers['Content-Length'] could be used. - Wnw
S
7

For Content-Length:

file_size = int(d.getheader('Content-Length'))
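
One caveat: getheader() returns None when the server does not send the header (for example with chunked transfer encoding), and int(None) raises TypeError. A small defensive sketch follows; parse_content_length is a hypothetical helper name, not part of the standard library:

```python
def parse_content_length(value):
    """Return the size in bytes, or None if the header is absent or malformed."""
    if value is None:
        # Header not sent by the server.
        return None
    try:
        size = int(value)
    except ValueError:
        # Header present but not a number.
        return None
    # A negative length is meaningless, so treat it as unknown.
    return size if size >= 0 else None


# Intended use with a response object d from urllib.request.urlopen(url):
#     file_size = parse_content_length(d.getheader('Content-Length'))
print(parse_content_length("10485760"))
print(parse_content_length(None))
```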
Sacellum answered 21/10, 2012 at 14:56 Comment(2)
I think they are looking for a Python 3 solution (at least I am, and this is the top Google hit). - Felishafelita
@ThorSummoner: d.getheader() works on Python 3 only. The question has the python-3.x tag, so a Python 3 only solution is appropriate. - Wnw
B
6

Change final line to:

file_size = int(meta.get_all("Content-Length")[0])
Breakage answered 22/12, 2014 at 5:42 Comment(0)
M
4

You should consider using Requests:

import requests

url = "http://client.akamai.com/install/test-objects/10MB.bin"
resp = requests.get(url)

print resp.headers['content-length']
# '10485760'

For Python 3, use:

print(resp.headers['content-length'])

instead.

Mousey answered 21/10, 2012 at 8:51 Comment(7)
+1. If you only expect one header, go with the item operator. However, I fear there is no headers attribute in Python 3, so it should probably be resp.get("Content-Length") or maybe resp["Content-Length"] (didn't try this). - Fustic
There seems to be no Requests library for Python 3.2... I think I should switch versions. Which version are you using? - Acrobatics
@Acrobatics Requests recently added 3.3 support. I am running 2.7.3. - Mousey
@Fustic That wasn't an issue, as resp is a Requests response dict. There's one thing I need to change though: it should be print(resp.headers) instead for Python 3. - Mousey
@Acrobatics You are welcome! I forgot to change the print statement to Python 3's format in the original answer. - Mousey
@KayZhu, yes of course. I overlooked that you had removed the info() call :) - Fustic
@Fustic Ah OK, though I didn't really remove anything in my edit. There was never an info() call; I suppose you misread? :) - Mousey
W
2

response.headers['Content-Length'] works on both Python 2 and 3:

#!/usr/bin/env python
from contextlib import closing

try:
    from urllib2 import urlopen
except ImportError: # Python 3
    from urllib.request import urlopen


with closing(urlopen('https://mcmap.net/q/821686/-get-file-size-from-quot-content-length-quot-value-from-a-file-in-python-3-2')) as response:
    print("File size: " + response.headers['Content-Length'])
Wnw answered 23/7, 2015 at 0:28 Comment(4)
This doesn't work if a header is repeated. You only get the first one when using the headers attribute. The only reliable way is to use info().get_all(). In Python 2, info().get() would concatenate all duplicate headers, but this fragile behavior has been removed in Python 3. Unfortunately, get_all() hasn't been backported to Python 2, so we are stuck wrestling with this poorly documented library for years to come. - Benge
@KevinThibedeau: 1. Duplicate Content-Length headers with different values are not supported in HTTP. 2. info() is implemented as return self.headers. - Wnw
From RFC 6265: "Origin servers SHOULD NOT fold multiple Set-Cookie header fields into a single header field". It is not at all unusual to receive duplicate headers. Python's libraries need to support this behavior properly. - Benge
@KevinThibedeau: Set-Cookie is a well-known exception; you should not use it as an example for other HTTP headers. RFC 7230 specifies the behavior for the Content-Length header explicitly (read the link from my previous comment). - Wnw
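
The difference discussed in these comments can be tried offline in Python 3. This is a sketch with hand-built headers and made-up cookie values, not output captured from a real server:

```python
import http.client

# response.headers in Python 3 is an http.client.HTTPMessage;
# one is built by hand here so the example needs no network access.
headers = http.client.HTTPMessage()
headers["Set-Cookie"] = "a=1"  # item assignment appends, it does not replace
headers["Set-Cookie"] = "b=2"
headers["Content-Length"] = "10485760"

# Item access returns only the first value stored under the name.
first_cookie = headers["Set-Cookie"]

# get_all() returns every value, in the order the headers were added.
all_cookies = headers.get_all("Set-Cookie")

# Header names are matched case-insensitively.
size = int(headers["content-length"])

print(first_cookie)
print(all_cookies)
print(size)
```

For a header such as Content-Length, which is expected to appear once, item access and get_all(...)[0] give the same value.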
O
0
import urllib.request

link = "<url here>"

f = urllib.request.urlopen(link)
meta = f.info()
print(meta.get("Content-Length"))
f.close()

Works with Python 3.x.
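
If only the size is needed before deciding whether to download, a HEAD request asks the server for the headers without the body. The following is a sketch under some assumptions: build_head_request and remote_file_size are hypothetical helper names, the method argument of Request needs Python 3.3 or later, and not every server answers HEAD or includes Content-Length in the reply.

```python
import urllib.request


def build_head_request(url):
    """Create a request that asks for headers only (no response body)."""
    return urllib.request.Request(url, method="HEAD")


def remote_file_size(url):
    """Return the advertised size in bytes, or None if the header is missing."""
    with urllib.request.urlopen(build_head_request(url)) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


# Building the request does not contact the server; only urlopen() does.
request = build_head_request("http://client.akamai.com/install/test-objects/10MB.bin")
print(request.get_method())
```

Calling remote_file_size(url) performs the actual network round trip, so it is left out of the runnable part above.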

Overzealous answered 22/7, 2015 at 17:31 Comment(0)
