According to the documentation, neither the Content-Disposition header nor its filename attribute is required. I also checked dozens of links on the internet and found no responses with a Content-Disposition header. So, in most cases, I wouldn't rely on it much and would just retrieve this information from the request URL (note: I'm taking it from req.url because there could be a redirect and we want the real filename). I used werkzeug because it looks more robust and handles both quoted and unquoted filenames. Eventually, I came up with this solution (works since Python 3.8):

from urllib.parse import urlparse

import requests
import werkzeug.http


def get_filename(url: str):
    try:
        with requests.get(url) as req:
            # Prefer the filename from Content-Disposition when the server sends one
            if content_disposition := req.headers.get("Content-Disposition"):
                param, options = werkzeug.http.parse_options_header(content_disposition)
                if param == 'attachment' and (filename := options.get('filename')):
                    return filename

            # Otherwise use the final URL (after redirects) and take the last path segment
            path = urlparse(req.url).path
            name = path[path.rfind('/') + 1:]
            return name
    except requests.exceptions.RequestException as e:
        raise e

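The URL fallback used in the function above can be isolated and checked without any network call. This is a minimal sketch, assuming a hypothetical helper name `filename_from_url` that is not part of the original solution; it also percent-decodes the name, which the original code does not do.

```python
from posixpath import basename
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, percent-decoded.

    Hypothetical helper for illustration only. The query string and
    fragment are ignored because only the path component is used.
    """
    path = urlparse(url).path
    # Decode after splitting so an encoded slash (%2F) stays inside the name
    return unquote(basename(path))
```

For a URL whose path ends in a slash this returns an empty string, so a caller may want to substitute a default name in that case.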
I wrote some tests using pytest and requests_mock:

import pytest
import requests
import requests_mock

from main import get_filename

TEST_URL = 'https://pwrk.us/report.pdf'


@pytest.mark.parametrize(
    'headers,expected_filename',
    [
        (
            {'Content-Disposition': 'attachment; filename="filename.pdf"'},
            "filename.pdf"
        ),
        (
            # The string following filename should always be put into quotes;
            # but, for compatibility reasons, many browsers try to parse unquoted names that contain spaces.
            # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition#directives
            {'Content-Disposition': 'attachment; filename=filename with spaces.pdf'},
            "filename with spaces.pdf"
        ),
        (
            {'Content-Disposition': 'attachment;'},
            "report.pdf"
        ),
        (
            {'Content-Disposition': 'inline;'},
            "report.pdf"
        ),
        (
            {},
            "report.pdf"
        )
    ]
)
def test_get_filename(headers, expected_filename):
    with requests_mock.Mocker() as m:
        m.get(TEST_URL, text='resp', headers=headers)
        assert get_filename(TEST_URL) == expected_filename


def test_get_filename_exception():
    with requests_mock.Mocker() as m:
        m.get(TEST_URL, exc=requests.exceptions.RequestException)
        with pytest.raises(requests.exceptions.RequestException):
            get_filename(TEST_URL)

[…] "0c9605301e48beda0f000000.pdf" (as that is in the request), but fortunately I decided to test it first. And Firefox wants to save it as "Mater Sci Eng B47 (1997) 33.pdf". – Osuna

[…] content-disposition: inline; filename="Mater Sci Eng B47 (1997) 33.pdf". FWIW, many PDFs have a Title embedded in them, but not all, and it may not be easy to access if the PDF is in binary form. – Hooey
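
The case raised in these comments, an inline disposition that still carries a filename, is not covered by the get_filename function above, which only accepts attachment. One possible way to handle it is to read the filename parameter regardless of the disposition type. The sketch below swaps werkzeug for the standard library's `email.message.Message`; the helper name `filename_from_header` is hypothetical and not part of the original answer.

```python
from email.message import Message
from typing import Optional


def filename_from_header(content_disposition: str) -> Optional[str]:
    """Extract the filename parameter from a Content-Disposition value.

    Hypothetical helper for illustration only. It accepts both
    "attachment" and "inline" and returns None when no filename is given.
    """
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # get_filename() looks only at the filename parameter,
    # not at the disposition type in front of it
    return msg.get_filename() or None
```

When this returns None, the caller can fall back to the name taken from the URL, as the original function does.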