The http://www.google.com
URL redirects:
$ curl -D - -X HEAD http://www.google.com
HTTP/1.1 302 Found
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: http://www.google.co.uk/?gfe_rd=cr&ei=A8sXVZLOGvHH8ge1jYKwDQ
Content-Length: 261
Date: Sun, 29 Mar 2015 09:50:59 GMT
Server: GFE/2.0
Alternate-Protocol: 80:quic,p=0.5
and urllib.request
has followed the redirect, issuing a GET request to that new location:
>>> import urllib.request
>>> req = urllib.request.Request("http://www.google.com", method="HEAD")
>>> resp = urllib.request.urlopen(req)
>>> resp.url
'http://www.google.co.uk/?gfe_rd=cr&ei=ucoXVdfaJOTH8gf-voKwBw'
You'd have to build your own handler stack to prevent this; the HTTPRedirectHandler
isn't smart enough to not handle a redirect when issuing a HEAD
method action. Adapting the example from Alan Duan from How do I prevent Python's urllib(2) from following a redirect to Python 3 would give you:
import urllib.request
class NoRedirection(urllib.request.HTTPErrorProcessor):
def http_response(self, request, response):
return response
https_response = http_response
opener = urllib.request.build_opener(NoRedirection)
req = urllib.request.Request("http://www.google.com", method="HEAD")
resp = opener.open(req)
You'd be better of using the requests
library; it explicitly sets allow_redirects=False
when using the requests.head()
or requests.Session().head()
callables, so there you can see the original result:
>>> import requests
>>> requests.head('http://www.google.com')
<Response [302]>
>>> _.headers['Location']
'http://www.google.co.uk/?gfe_rd=cr&ei=FcwXVbepMvHH8ge1jYKwDQ'
and even if redirection is enabled the response.history
list gives you access to the intermediate requests, and requests
uses the correct method for the redirected call too:
>>> response = requests.head('http://www.google.com', allow_redirects=True)
>>> response.url
'http://www.google.co.uk/?gfe_rd=cr&ei=8e0XVYfGMubH8gfJnoKoDQ'
>>> response.history
[<Response [302]>]
>>> response.history[0].url
'http://www.google.com/'
>>> response.request.method
'HEAD'