urllib.request.urlretrieve() uses urllib.request.urlopen() internally (at least in Python 3), so you can influence its behavior the same way you would influence urlopen().
When urlopen(params) is invoked, it first looks at the special global variable urllib.request._opener. If it is None, urlopen() sets it to an opener built with the default set of handlers; otherwise it keeps it as it is. In the next step it calls urllib.request._opener.open(<urlopen_params>) (in the following sections I will refer to urllib.request._opener simply as the opener).
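The lookup described above can be sketched roughly like this. It is a simplified approximation, not the real implementation, and it relies on the private attribute urllib.request._opener, which is a CPython implementation detail; the helper name current_opener is made up for illustration.

```python
import urllib.request

def current_opener():
    """Simplified sketch of the lookup urlopen() performs before opening a URL."""
    # _opener is private; urlopen() creates it lazily on first use.
    if urllib.request._opener is None:
        urllib.request._opener = urllib.request.build_opener()
    return urllib.request._opener

opener = current_opener()
# The opener carries a list of handler instances, one or more per protocol.
handler_names = [type(h).__name__ for h in opener.handlers]
```

Printing handler_names is a quick way to see which handlers a plain urlopen() call would go through.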
The opener contains a list of handlers for different protocols. When opener.open() is called, it performs these actions:
- Creates a urllib.request.Request object from the URL (or, if you pass a Request directly, it just uses that).
- Extracts the protocol from the Request object (it is deduced from the URL scheme).
- Based on the protocol, it looks up and calls these handler methods:
  - protocol_request (e.g. http_request): pre-processes the request before the connection is opened.
  - protocol_open: actually creates the connection to the remote server.
  - protocol_response: processes the response from the server.
  - For other methods, see the Python documentation.
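To see the method lookup in action without touching the network, here is a small sketch that registers a handler for a made-up "demo" scheme. The scheme, the class name DemoHandler and the calls list are all assumptions for illustration; the only point is the order in which the opener calls the three methods.

```python
import email.message
import io
import urllib.request
import urllib.response

calls = []  # records the order in which the opener invokes our methods

class DemoHandler(urllib.request.BaseHandler):
    """Handler for a hypothetical "demo" scheme (not a real protocol)."""

    def demo_request(self, request):
        calls.append("demo_request")   # pre-processing step
        return request

    def demo_open(self, request):
        calls.append("demo_open")      # would normally open a connection
        body = io.BytesIO(b"hello from " + request.full_url.encode())
        headers = email.message.Message()
        return urllib.response.addinfourl(body, headers, request.full_url, code=200)

    def demo_response(self, request, response):
        calls.append("demo_response")  # post-processing step
        return response

opener = urllib.request.build_opener(DemoHandler())
with opener.open("demo://example") as response:
    data = response.read()
```

After this runs, calls holds the three method names in the order request, open, response, and data holds the fake body.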
For your own opener you have to do these 3 steps:
- Create your own handler.
- Build an opener whose handler list contains your custom handler (function urllib.request.build_opener).
- Install the new opener into urllib.request._opener (function urllib.request.install_opener).
urllib.request.build_opener creates an opener that contains your custom handler plus the default handlers, except for those default handlers that your custom handler inherits from.
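The replacement rule can be checked by inspecting the opener's handlers list. This is only a sketch; the empty MyHTTP subclass and the exact_types helper exist purely for the demonstration.

```python
import urllib.request

class MyHTTP(urllib.request.HTTPHandler):
    """Hypothetical subclass, used only to show the replacement rule."""

def exact_types(opener):
    # Exact classes of the handler instances held by the opener.
    return [type(h) for h in opener.handlers]

default_opener = urllib.request.build_opener()
custom_opener = urllib.request.build_opener(MyHTTP())

has_default_http = urllib.request.HTTPHandler in exact_types(default_opener)
has_plain_http = urllib.request.HTTPHandler in exact_types(custom_opener)
has_custom = MyHTTP in exact_types(custom_opener)
keeps_file_handler = urllib.request.FileHandler in exact_types(custom_opener)
```

Only the plain HTTPHandler is dropped; the other defaults stay. Note that HTTPSHandler is a separate default handler, so https:// URLs would not go through a subclass of HTTPHandler; for those you would presumably subclass HTTPSHandler and override https_request in the same way.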
So to add a custom header you can write something like this:
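import urllib.request as req

class MyHTTP(req.HTTPHandler):
    def http_request(self, request):
        # Add the custom header before the connection is opened.
        request.add_header("MyHeader", "Content of my header")
        return super().http_request(request)

# Build an opener that uses MyHTTP in place of the default HTTPHandler
# and install it globally.
opener = req.build_opener(MyHTTP())
req.install_opener(opener)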
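If you want to check that the header really gets attached without making a network call, one possible approach is to run the pre-processing step by hand. This sketch assumes the handler has already been passed to build_opener (which sets its parent), and http://example.com/ is just a placeholder URL that is never contacted.

```python
import urllib.request

class MyHTTP(urllib.request.HTTPHandler):
    def http_request(self, request):
        request.add_header("MyHeader", "Content of my header")
        return super().http_request(request)

handler = MyHTTP()
# build_opener() registers the handler and sets handler.parent,
# which the inherited http_request() needs.
opener = urllib.request.build_opener(handler)

request = urllib.request.Request("http://example.com/")
processed = handler.http_request(request)  # no connection is opened here

# add_header() stores the name capitalized, hence "Myheader".
header_value = processed.get_header("Myheader")
```

The header name is stored as "Myheader" because Request.add_header() capitalizes it, so that is the key to look it up by.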
From this point on, when you call urllib.request.urlretrieve() or anything else that uses urlopen(), your handler will be used for HTTP communication. When you want to get back to the default handlers you can just call:
import urllib.request as req
req.install_opener(req.build_opener())
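If you only need the custom opener for a few calls, you could wrap the install and restore in a context manager. This is just a sketch: temporary_opener is a made-up helper, and it reads the private urllib.request._opener attribute to remember whatever was installed before.

```python
import contextlib
import urllib.request

@contextlib.contextmanager
def temporary_opener(opener):
    """Install `opener` globally and restore the previous one on exit."""
    previous = urllib.request._opener  # private attribute; may be None
    urllib.request.install_opener(opener)
    try:
        yield opener
    finally:
        # Restoring None lets urlopen() rebuild the default opener lazily.
        urllib.request.install_opener(previous)

custom = urllib.request.build_opener()
before = urllib.request._opener
with temporary_opener(custom):
    inside = urllib.request._opener  # urlretrieve()/urlopen() would use `custom` here
after = urllib.request._opener
```

Keep in mind that the installed opener is process-wide, so this does not isolate threads from each other.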
To be honest, I don't know whether this is a better or cleaner solution than yours, but it uses the mechanisms that urllib already provides.