1. Deprecation problem
In Python 3.7, I download a big file from a URL using the urllib.request.urlretrieve(..) function. In the documentation (https://docs.python.org/3/library/urllib.request.html) I read the following just above the urllib.request.urlretrieve(..) docs:
Legacy interface
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.
2. Searching for an alternative
To keep my code future-proof, I'm on the lookout for an alternative. The official Python docs don't mention a specific one, but it looks like urllib.request.urlopen(..) is the most straightforward candidate. It's at the top of the docs page.
Unfortunately, the alternatives, like urlopen(..), don't provide the reporthook argument. This argument is a callable you pass to the urlretrieve(..) function. In turn, urlretrieve(..) calls it once when the connection is established and again after each block is read, with the following arguments:
- block nr.
- block size
- total file size
I use it to update a progress bar. That's why I miss the reporthook argument in the alternatives.
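To make the three arguments concrete, here is a minimal sketch of the kind of callback I mean. The names percent_done and print_progress are my own, not part of urllib, and the URL in the usage comment is only a placeholder.

```python
import sys


def percent_done(block_num, block_size, total_size):
    """Estimate progress (0-100) from the three reporthook arguments."""
    if total_size <= 0:
        # urlretrieve passes -1 when there is no Content-Length header.
        return None
    # The last block is usually shorter than block_size, so cap the estimate.
    done = min(block_num * block_size, total_size)
    return 100.0 * done / total_size


def print_progress(block_num, block_size, total_size):
    """Hypothetical reporthook that rewrites a single console line."""
    pct = percent_done(block_num, block_size, total_size)
    if pct is None:
        sys.stdout.write("\rup to %d bytes read" % (block_num * block_size))
    else:
        sys.stdout.write("\r%5.1f%%" % pct)
    sys.stdout.flush()


# Usage (placeholder URL and filename):
# import urllib.request
# urllib.request.urlretrieve("https://example.com/big.bin", "big.bin",
#                            reporthook=print_progress)
```

Note that the hook only receives the nominal block size, not the number of bytes actually read, so the percentage is an estimate until the final block.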
3. urlretrieve(..) vs urlopen(..)
I discovered that urlretrieve(..) simply uses urlopen(..). See the request.py source file in the Python 3.7 installation (Python37/Lib/urllib/request.py):
_url_tempfiles = []
def urlretrieve(url, filename=None, reporthook=None, data=None):
    """
    Retrieve a URL into a temporary location on disk.

    Requires a URL argument. If a filename is passed, it is used as
    the temporary file location. The reporthook argument should be
    a callable that accepts a block number, a read size, and the
    total file size of the URL target. The data argument should be
    valid URL encoded data.

    If a filename is passed and the URL points to a local resource,
    the result is a copy from local file to new file.

    Returns a tuple containing the path to the newly created
    data file as well as the resulting HTTPMessage object.
    """
    url_type, path = splittype(url)

    with contextlib.closing(urlopen(url, data)) as fp:
        headers = fp.info()

        # Just return the local path and the "headers" for file://
        # URLs. No sense in performing a copy unless requested.
        if url_type == "file" and not filename:
            return os.path.normpath(path), headers

        # Handle temporary file setup.
        if filename:
            tfp = open(filename, 'wb')
        else:
            tfp = tempfile.NamedTemporaryFile(delete=False)
            filename = tfp.name
            _url_tempfiles.append(filename)

        with tfp:
            result = filename, headers
            bs = 1024*8
            size = -1
            read = 0
            blocknum = 0
            if "content-length" in headers:
                size = int(headers["Content-Length"])

            if reporthook:
                reporthook(blocknum, bs, size)

            while True:
                block = fp.read(bs)
                if not block:
                    break
                read += len(block)
                tfp.write(block)
                blocknum += 1
                if reporthook:
                    reporthook(blocknum, bs, size)

    if size >= 0 and read < size:
        raise ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes"
            % (read, size), result)

    return result
4. Conclusion
From all this, I see three possible decisions:
- I keep my code unchanged. Let's hope the urlretrieve(..) function won't get deprecated anytime soon.
- I write myself a replacement function behaving like urlretrieve(..) on the outside and using urlopen(..) on the inside. Actually, such a function would be a copy-paste of the code above. It feels unclean to do that, compared to using the official urlretrieve(..).
- I write myself a replacement function behaving like urlretrieve(..) on the outside and using something entirely different on the inside. But hey, why would I do that? urlopen(..) is not deprecated, so why not use it?
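For reference, the second option might look roughly like the sketch below. It is a trimmed-down illustration, not the standard library's code: the name download and its return value are my own choices, it skips the temporary-file and file:// shortcuts of urlretrieve(..), and it raises a plain OSError instead of ContentTooShortError.

```python
import urllib.request


def download(url, filename, reporthook=None, block_size=8192):
    """Hypothetical urlretrieve-like helper built on urlopen.

    Copies the URL to filename block by block and calls
    reporthook(block_num, block_size, total_size) like urlretrieve does.
    Returns (filename, bytes_read).
    """
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        length = response.headers.get("Content-Length")
        total = int(length) if length is not None else -1  # -1: size unknown
        block_num = 0
        read = 0
        if reporthook:
            reporthook(block_num, block_size, total)
        while True:
            block = response.read(block_size)
            if not block:
                break
            out.write(block)
            read += len(block)
            block_num += 1
            if reporthook:
                reporthook(block_num, block_size, total)
    if total >= 0 and read < total:
        # Simplification: urlretrieve raises ContentTooShortError here.
        raise OSError("retrieval incomplete: got only %i out of %i bytes"
                      % (read, total))
    return filename, read
```

Any callable taking the three reporthook arguments can be passed in unchanged, so existing progress bar code would keep working.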
What decision would you take?
urlretrieve() to your code and use it at once or only when there is no original urllib.request.urlretrieve() – Illuminance