Is there an easy way to request a URL in Python and NOT follow redirects?

Asked 21/9, 2008 at 7:49 Answered 3/2, 2013 at 22:42

154

Looking at the source of urllib2 it looks like the easiest way to do it would be to subclass HTTPRedirectHandler and then use build_opener to override the default HTTPRedirectHandler, but this seems like a lot of (relatively complicated) work to do what seems like it should be pretty simple.

Euphorbia answered 21/9, 2008 at 7:49 Comment(0)

289

Here is the Requests way:

import requests
r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code, r.headers['Location'])

Annotate answered 3/2, 2013 at 22:42 Comment(7)

Then look at r.headers['Location'] to see where it would have sent you – Footgear 12/1, 2017 at 16:43

Note that it seems that Requests will normalize Location to location. – Secret 12/5, 2017 at 1:36

@Secret requests allows you to access headers both in the canonical form and in lowercase. See docs.python-requests.org/en/master/user/quickstart/… – Annotate 12/5, 2017 at 7:21

As of 2019 in Python 3, this no longer appears to work for me. (I get a key dict error.) – Sentient 15/8, 2019 at 0:19

Check r.status_code: if it is not 301, there might have been another error. The Location header is only available for redirects. Use dict.get if you want to avoid a KeyError on optional keys. – Hospitium 20/1, 2021 at 15:0

TypeError: request() got an unexpected keyword argument 'max_redirects' – Penthea 7/5, 2021 at 10:0

Thanks, it's useful for my scenario. In this case I will be able to set the cookie for the domain to be redirected. – Coldhearted 5/11, 2023 at 10:7
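For anyone hitting the KeyError mentioned in the comments above, here is a minimal sketch that only reads Location when the status code really is a redirect. The helper names redirect_target and fetch_without_redirects, the REDIRECT_CODES set and the 10-second timeout are my own assumptions, not part of Requests.

```python
# Status codes treated as redirects in this sketch (an assumption, adjust as needed).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(response):
    """Return the Location header of a redirect response, or None otherwise."""
    if response.status_code in REDIRECT_CODES:
        # .get avoids a KeyError when the server sent no Location header.
        return response.headers.get("Location")
    return None


def fetch_without_redirects(url, timeout=10):
    """Issue a single GET with Requests and do not follow any redirect."""
    # Third-party dependency, imported here so redirect_target works without it.
    import requests
    return requests.get(url, allow_redirects=False, timeout=timeout)
```

Usage would look like redirect_target(fetch_without_redirects('http://github.com')), which gives the redirect URL, or None if the server answered directly.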

Dive Into Python has a good chapter on handling redirects with urllib2. Another solution is httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.bogosoft.com")
>>> conn.request("GET", "/")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
301 Moved Permanently
>>> print r1.getheader('Location')
http://www.bogosoft.com/new/location

Omnipotent answered 21/9, 2008 at 8:33 Comment(2)

Everybody who comes here from google, please note that the up to date way to go is this one: https://mcmap.net/q/156202/-is-there-an-easy-way-to-request-a-url-in-python-and-not-follow-redirects The requests library will save you a lot of headache. – Bertine 5/5, 2014 at 2:36

The link to "Dive Into Python" is dead. – Haematogenesis 5/3, 2019 at 13:47
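In Python 3 the httplib module was renamed to http.client. A rough equivalent of the session above might look like the sketch below; fetch_no_redirect is a hypothetical helper name and the timeout value is an assumption.

```python
import http.client
from urllib.parse import urlsplit


def fetch_no_redirect(url, timeout=10):
    """Issue one GET and return (status, reason, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # http.client never follows redirects; it returns what the server sent.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason, response.getheader("Location")
    finally:
        conn.close()
```

The third element of the returned tuple is None whenever the server sent no Location header, so nothing raises on a normal 200 response.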

This is a urllib2 handler that will not follow redirects:

import urllib
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        infourl.code = code
        return infourl
    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)

Enchiridion answered 18/3, 2011 at 13:33 Comment(1)

I'm unit testing an API and dealing with a login method that redirects to a page I don't care about, but doesn't send the desired session cookie with the response to the redirect. This is exactly what I needed for that. – Henigman 11/2, 2014 at 23:41
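For Python 3, where urllib2 became urllib.request, the same idea could be sketched as follows. Note that this is an adaptation rather than a line-by-line port: it returns the original response object instead of wrapping it with addinfourl. The helper name open_without_redirects and the timeout are assumptions.

```python
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hand 3xx responses back to the caller instead of following them."""

    def http_error_302(self, req, fp, code, msg, headers):
        # fp is the HTTP response for the redirect itself; returning it
        # stops processing here, so no follow-up request is made.
        return fp

    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302
    http_error_308 = http_error_302


def open_without_redirects(url, timeout=10):
    """Open url with a private opener so the global opener is left untouched."""
    opener = urllib.request.build_opener(NoRedirectHandler())
    return opener.open(url, timeout=timeout)
```

The returned object exposes status and headers, so response.headers.get('Location') gives the redirect target. Building a private opener instead of calling install_opener is a design choice here: it avoids changing the behaviour of every other urlopen call in the process.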

The redirections keyword in the httplib2 request method is a red herring. Rather than return the first response, it will raise a RedirectLimit exception if it receives a redirection status code. To return the initial response you need to set follow_redirects to False on the Http object:

import httplib2
h = httplib2.Http()
h.follow_redirects = False
(response, body) = h.request("http://example.com")

Mogador answered 14/5, 2012 at 16:45 Comment(0)

I suppose this would help:

from httplib2 import Http

def get_html(uri, num_redirections=0):  # pass 0 so that redirects are not followed
    conn = Http()
    return conn.request(uri, redirections=num_redirections)

Betake answered 21/9, 2008 at 13:51 Comment(0)

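The shortest way, however, is:

import urllib2

class NoRedirect(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        # Returning None declines the redirect; the opener then raises
        # HTTPError carrying the 3xx status code.
        return None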

noredir_opener = urllib2.build_opener(NoRedirect())

Fetch answered 29/2, 2012 at 5:26 Comment(4)

How is this the shortest way? It doesn't even contain the import or the actual request. – Annotate 9/5, 2013 at 18:49

I already was going to post this solution and was quite surprised to find this answer at the bottom. It is very concise and should be the top answer in my opinion. – Yautia 21/1, 2015 at 1:55

Moreover, it gives you more freedom, this way it's possible to control which URLs to follow. – Yautia 21/1, 2015 at 2:14

I confirm, this is the easiest way. A short remark for those who want to debug: do not forget that you may set multiple handlers when building the opener, like opener = urllib.request.build_opener(debugHandler, NoRedirect()), where debugHandler = urllib.request.HTTPHandler() and debugHandler.set_http_debuglevel(1). In the end: urllib.request.install_opener(opener) – Oyer 13/1, 2020 at 13:5
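Since the answer leaves out the request itself, here is a sketch of the complete round trip in Python 3. Declining the redirect makes the opener raise HTTPError, so the status and Location are read from the exception. The helper name get_redirect and the timeout are assumptions.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None declines every redirect.
        return None


def get_redirect(url, timeout=10):
    """Return (status, Location or None) for url without following redirects."""
    opener = urllib.request.build_opener(NoRedirect())
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        # A declined redirect surfaces as HTTPError; so do 4xx/5xx replies,
        # in which case Location is normally absent and None is returned.
        return err.code, err.headers.get("Location")
```

To follow only some redirects, redirect_request could instead return the result of the parent method for the URLs it accepts and None for the rest.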

I second olt's pointer to Dive Into Python. Here's an implementation using urllib2 redirect handlers. Is it more work than it should be? Maybe (shrug).

import sys
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):  
        result = urllib2.HTTPRedirectHandler.http_error_301( 
            self, req, fp, code, msg, headers)              
        result.status = code                                 
        raise Exception("Permanent Redirect: %s" % 301)

    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)              
        result.status = code                                
        raise Exception("Temporary Redirect: %s" % 302)

def main(script_name, url):
    opener = urllib2.build_opener(RedirectHandler)
    urllib2.install_opener(opener)
    print urllib2.urlopen(url).read()

if __name__ == "__main__":
    main(*sys.argv)

Repudiation answered 21/9, 2008 at 11:31 Comment(1)

Looks wrong... This code actually does follow the redirects (by calling the original handler, thus issuing an HTTP request), and then raises an exception – Biogenesis 18/3, 2011 at 12:40
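Following up on that comment, a variant that raises without first following the redirect might look like this in Python 3. The key difference is that the parent handler is never called. RedirectError, RaisingRedirectHandler and open_or_raise are hypothetical names chosen for this sketch.

```python
import urllib.request


class RedirectError(Exception):
    """Raised when the server answers with a redirect."""

    def __init__(self, code, location):
        super().__init__("Redirect %d to %s" % (code, location))
        self.code = code
        self.location = location


class RaisingRedirectHandler(urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Do not call the parent implementation: that call is what
        # issues the follow-up request to the new location.
        fp.close()
        raise RedirectError(code, headers.get("Location"))

    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302
    http_error_308 = http_error_302


def open_or_raise(url, timeout=10):
    """Open url, raising RedirectError instead of following a redirect."""
    opener = urllib.request.build_opener(RaisingRedirectHandler())
    return opener.open(url, timeout=timeout)
```

Callers can catch RedirectError and inspect its code and location attributes to decide what to do next.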
