urllib: Get name of file from direct download link

Python 3. I probably need to use urllib to do this.

I need to know how to send a request to a direct download link and get the name of the file it would save.

(As an example, a KSP mod from CurseForge: https://kerbal.curseforge.com/projects/mechjeb/files/2355387/download)

Of course, the file ID (2355387) will change. The link could be for any project, but it will always be on CurseForge (in case that affects how the file is downloaded).

That example link results in the file:

[Screenshot of the resulting download]

How can I return that file name in Python?

Edit: If possible, I want to avoid saving the file, reading its name, and then deleting it. That seems like the worst way to do this.

Frauenfeld asked 31/3, 2017 at 22:17

With urllib.request, the response object returned by urlopen records the URL it was actually fetched from (after any redirects have been followed) in its url attribute:

>>> from urllib.request import urlopen
>>> url = 'https://kerbal.curseforge.com/projects/mechjeb/files/2355387/download'
>>> response = urlopen(url)
>>> response.url
'https://addons-origin.cursecdn.com/files/2355/387/MechJeb2-2.6.0.0.zip'

You can use os.path.basename to get the filename:

>>> from os.path import basename
>>> basename(response.url)
'MechJeb2-2.6.0.0.zip'
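The same idea can be wrapped in a small function. This is a minimal sketch, not part of the original answer: `resolve_download_url` is a hypothetical helper name, and sending a HEAD request (so only headers are transferred) assumes the server answers HEAD and redirects it the same way it redirects GET, which not every host does. Note that urlopen never writes anything to disk by itself; the body is only consumed if you call read() on the response.

```python
from urllib.request import Request, urlopen


def resolve_download_url(url: str, timeout: float = 10.0) -> str:
    """Follow redirects for a download link and return the final URL.

    Sends a HEAD request so only headers are transferred. Assumption: the
    server handles HEAD like GET (some hosts reject HEAD or redirect differently).
    The timeout default is an arbitrary choice for this sketch.
    """
    request = Request(url, method="HEAD")  # ask for headers only, no file body
    with urlopen(request, timeout=timeout) as response:
        # urlopen follows 3xx redirects automatically; .url is where it ended up.
        return response.url
```

Calling `basename(resolve_download_url(url))` would then give the file name in the same way as `basename(response.url)` above.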
Mayest answered 31/3, 2017 at 22:40
It also seems odd to me that os.path works on a URL. Is this intended, or merely an inadvertent benefit? - Frauenfeld
See the answers to "Get URL path sections". More generally, you might want to use a combination of urlparse and posixpath. - Mayest
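Following that suggestion, here is a minimal sketch combining `urllib.parse.urlsplit` (a close relative of `urlparse`) with `posixpath.basename`. Using `posixpath` instead of `os.path` keeps the behaviour the same on Windows, where `os.path` is `ntpath`. Splitting the URL first matters because calling `basename` on the whole URL keeps any query string (for example `MechJeb2-2.6.0.0.zip?token=abc`). `filename_from_url` is a hypothetical helper name, and the percent-decoding step is an extra not mentioned in the comment.

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string or fragment."""
    path = urlsplit(url).path  # e.g. "/files/2355/387/MechJeb2-2.6.0.0.zip"
    return unquote(basename(path))  # decode percent-escapes such as %20
```

For example, `filename_from_url('https://addons-origin.cursecdn.com/files/2355/387/MechJeb2-2.6.0.0.zip?token=abc')` returns `'MechJeb2-2.6.0.0.zip'` (the `token` parameter is made up for illustration).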
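from urllib import request

url = 'file download link'  # placeholder: put the direct download URL here
# get_filename() reads the filename from the Content-Disposition response header
# and returns None when the server doesn't send that header.
with request.urlopen(request.Request(url)) as response:
    filename = response.info().get_filename()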
Kneecap answered 23/2, 2021 at 13:26
Hello and welcome to SO! While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. Please take the tour and read "How do I write a good answer?" - Sciential
This answer does not always work. For example, I get None for filename when setting url to "openssl.org/source/old/1.1.1/openssl-1.1.1q.tar.gz". - Sternforemost
Adding to my previous comment: urlopen returns an http.client.HTTPResponse. According to the bottom of docs.python.org/3/library/http.client.html#httpmessage-objects, its headers object (HTTPMessage) is implemented using the email.message.Message class (docs.python.org/3/library/…). That's where the get_filename method comes from. It reads the Content-Disposition HTTP header, which is not always present, so it doesn't always work. - Sternforemost
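Putting the two answers together, one possible approach (a sketch, not a rule every server follows) is to prefer the name from Content-Disposition when the server sends one, and otherwise fall back to the last segment of the final URL's path. `pick_filename` is a hypothetical helper; it takes the headers and the final URL separately so it can be tried without a network connection. Stripping directory components from the header value is an extra precaution not mentioned in the thread.

```python
from email.message import Message
from posixpath import basename
from typing import Optional
from urllib.parse import unquote, urlsplit


def pick_filename(headers: Message, final_url: str) -> Optional[str]:
    """Prefer the server-suggested file name, falling back to the URL path.

    headers: response headers; urlopen's response.info() is an
             http.client.HTTPMessage, which subclasses email.message.Message.
    final_url: the URL after redirects, i.e. response.url.
    """
    # get_filename() reads the filename parameter of Content-Disposition and
    # returns None when the header or the parameter is missing.
    name = headers.get_filename()
    if name:
        return basename(name)  # extra precaution: drop any directory part
    # Fallback: last segment of the URL path, ignoring any query string.
    name = unquote(basename(urlsplit(final_url).path))
    return name or None  # None if the path ends in "/"
```

With a live response this would be called as `pick_filename(response.info(), response.url)`. For the OpenSSL URL from the comment above, which sends no Content-Disposition header according to that comment, the fallback yields `'openssl-1.1.1q.tar.gz'`.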

© 2022 - 2024 — McMap. All rights reserved.