python httplib/urllib get filename

M

4

13

Is there a way to get the filename

e.g. xyz.com/blafoo/showall.html

when working with urllib or httplib,

so that I can save the file under the same name it has on the server?

With URLs like

xyz.com/blafoo/

you can't see the filename.

Thank you

Margaret answered 2/8, 2012 at 18:5 Comment(1)

possible duplicate of urllib2 file name – Immersionism 2/8, 2012 at 18:11

R

31

To get the filename from the HTTP response headers:

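import cgi
import urllib2

response = urllib2.urlopen(URL)
# Content-Disposition may be absent, so fall back to an empty header value
_, params = cgi.parse_header(response.headers.get('Content-Disposition', ''))
filename = params['filename']  # raises KeyError if the server sent no filename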
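On Python 3, where urllib2 no longer exists and the cgi module is deprecated (and removed in 3.13), the same header can be parsed with the standard email package. This is only a sketch, assuming the header value has already been read from the response; the helper name filename_from_content_disposition is made up for illustration:

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    # Store the raw header so the email package can parse its parameters.
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


print(filename_from_content_disposition('attachment; filename="example.csv"'))
```

Returning None instead of raising lets the caller fall back to the URL when the server sends no filename.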

To get the filename from the URL:

import posixpath
import urlparse 

path = urlparse.urlsplit(URL).path
filename = posixpath.basename(path)
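For a URL that ends in a slash, such as xyz.com/blafoo/ from the question, the basename is an empty string, so a fallback name is needed. A possible Python 3 version of the snippet above, where the helper name filename_from_url and the default 'index.html' are assumptions chosen for illustration (the server may well serve something else under that URL):

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="index.html"):
    """Return the last path component of a URL, or `default` if it is empty."""
    # urlsplit separates the path from the query string and fragment.
    path = urlsplit(url).path
    # Decode percent-escapes first so the result never contains a '/'.
    name = posixpath.basename(unquote(path))
    return name or default


print(filename_from_url("http://xyz.com/blafoo/showall.html"))
print(filename_from_url("http://xyz.com/blafoo/"))
```

The first call prints showall.html; the second prints the assumed default, because the path has no final component.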

Rights answered 2/8, 2012 at 18:9 Comment(4)

Great answer, one tiny fix. Using os.path.basename(path) is a cross platform way of doing this. – Butanone 26/8, 2013 at 6:25

@JorgeVargas: no. posixpath is the correct module here. Moreover it would be a mistake to use os.path here. If you can't figure out "why", ask, I'll elaborate. – Rights 26/8, 2013 at 12:3

I'll ask: why should one use posixpath? – Disassociate 17/9, 2014 at 0:35

@KarlM.Davis: URLs use '/' as the separator in their path component. os.path on Windows may use '\\' as the pathname separator, which is not appropriate for URLs. posixpath always uses '/'. – Rights 17/9, 2014 at 0:57

W

4

Use urllib.request.Request:

import urllib.request

req = urllib.request.Request(url, method='HEAD')
r = urllib.request.urlopen(req)
print(r.info().get_filename())

Example:

In[1]: urllib.request.urlopen(urllib.request.Request('https://httpbin.org/response-headers?content-disposition=%20attachment%3Bfilename%3D%22example.csv%22', method='HEAD')).info().get_filename()
Out[1]: 'example.csv'

Watchband answered 18/4, 2019 at 11:27 Comment(0)

T

1

What you are asking does not make much sense. The only thing you have is the URL. Either extract the last part of the URL, or check the HTTP response for a header such as

content-disposition: attachment;filename="foo.bar"

This header can be set by the server to indicate that the filename is foo.bar. It is typically used for file downloads.
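The two options can be combined: prefer the name the server sends, otherwise use the last part of the URL. A rough Python 3 sketch, assuming the Content-Disposition value (if any) has already been read from the response; the function name pick_filename and the default 'download' are hypothetical:

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def pick_filename(url, content_disposition=None, default="download"):
    """Prefer the server-supplied name, then the URL path, then `default`."""
    name = None
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        name = unquote(urlsplit(url).path)
    # Keep only the last component so directory parts in the name are dropped.
    name = posixpath.basename(name.replace("\\", "/"))
    return name or default


print(pick_filename("http://xyz.com/blafoo/", 'attachment;filename="foo.bar"'))
print(pick_filename("http://xyz.com/blafoo/showall.html"))
```

Taking the basename of the header value as well is a design choice: the name comes from the server, so any directory parts in it are discarded before saving.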

Threnody answered 2/8, 2012 at 18:9 Comment(0)

R

0

I searched for your question on Google and I believe it was answered on Stack Overflow before.

Try looking at this post:

Using urllib2 in Python. How do I get the name of the file I am downloading?

The filename is usually included by the server through the content-disposition header:
content-disposition: attachment; filename=foo.pdf
You have access to the headers through
result = urllib2.urlopen(...)
result.info() <- contains the headers


>>> import urllib2
>>> result = urllib2.urlopen('http://zopyx.com')
>>> print result
<addinfourl at 4302289808 whose fp = <socket._fileobject object at 0x1006dd5d0>>
>>> result.info()
<httplib.HTTPMessage instance at 0x1006fbab8>
>>> result.info().headers
['Date: Mon, 04 Apr 2011 02:08:28 GMT\r\n', 'Server: Zope/(unreleased version, python 2.4.6, linux2) ZServer/1.1 Plone/3.3.4\r\n', 'Content-Length: 15321\r\n', 'Content-Type: text/html; charset=utf-8\r\n', 'Via: 1.1 www.zopyx.com\r\n', 'Cache-Control: max-age=3600\r\n', 'Expires: Mon, 04 Apr 2011 03:08:28 GMT\r\n', 'Connection: close\r\n']

See

http://docs.python.org/library/urllib2.html

Radioactivate answered 2/8, 2012 at 18:9 Comment(0)
