How do I prevent Python's urllib(2) from following a redirect

Asked 16/2, 2009 at 20:29 Answered 31/7, 2012 at 16:33

I am currently trying to log into a site using Python; however, the site seems to send a cookie and a redirect in the same response. Python follows that redirect, which prevents me from reading the cookie sent by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect?

Irreconcilable answered 16/2, 2009 at 20:29 Comment(3)

Duplicate: #110998 – Rimskykorsakov 16/2, 2009 at 20:56

a similar question: #9891315 – Janssen 28/3, 2012 at 11:28

For readers who don't care about using urllib specifically: requests supports this "out of the box" #110998 – Launder 17/5, 2022 at 10:17

You could do a couple of things:

Build your own HTTPRedirectHandler that intercepts each redirect.
Create an instance of HTTPCookieProcessor and install an opener built with it, so that you have access to the cookiejar.

This is a quick example that shows both:

import urllib2

#redirect_handler = urllib2.HTTPRedirectHandler()

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

# Cookies set by any response, including the redirecting one, land in this processor's cookiejar.
cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")
print response.read()

print cookieprocessor.cookiejar
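
On Python 3, urllib2 was merged into urllib.request and cookielib became http.cookiejar, so the same two ideas might look roughly like the sketch below. The names LoggingRedirectHandler and open_with_redirect_log are made up for illustration, and the handler records each hop in a list rather than printing. Treat it as a sketch under those assumptions, not a drop-in replacement.

```python
import http.cookiejar
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record every redirect, then let urllib follow it as usual."""

    def __init__(self):
        super().__init__()
        self.hops = []  # one (status code, Location header) pair per redirect

    def http_error_302(self, req, fp, code, msg, headers):
        # This is the place to inspect headers before the redirect is followed.
        self.hops.append((code, headers.get("Location")))
        return super().http_error_302(req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302


def open_with_redirect_log(url, data=None):
    """Hypothetical helper: open url, returning (response, hops, cookie jar)."""
    jar = http.cookiejar.CookieJar()
    redirects = LoggingRedirectHandler()
    opener = urllib.request.build_opener(
        redirects, urllib.request.HTTPCookieProcessor(jar))
    response = opener.open(url, data)
    return response, redirects.hops, jar
```

Calling open_with_redirect_log(login_url) returns the final response, the list of (status, Location) pairs seen along the way, and the cookie jar, which should include any cookie set by the redirecting page itself.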

Morell answered 16/2, 2009 at 21:13 Comment(4)

You don't seem to be using redirect_handler = urllib2.HTTPRedirectHandler() in the example at all. Were you going to show a second example? – Skirl 16/8, 2011 at 21:13

You are correct, I'm not using the redirect_handler. Instead, I created my own redirect handler. I will edit to remove. – Morell 23/8, 2011 at 4:38

Why is it you do not need to instantiate the MyHTTPRedirectHandler, but rather pass the class into the build_opener() method? – Habit 9/1, 2012 at 20:10

From the documentation: handlers can be either instances of BaseHandler, or subclasses of BaseHandler (in which case it must be possible to call the constructor without any parameters). Since MyHTTPRedirectHandler doesn't have a constructor with any arguments, I can pass it in as is. – Morell 12/1, 2012 at 1:43

If all you need is to stop redirection, there is a simple way to do it. For example, I only want to get the cookies, and for better performance I don't want to be redirected to any other page. I also want the response code to stay 3xx. Let's use 302 as an example.

import cookielib
import urllib2

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # only add this line to stop 302 redirection.
        if code == 302: return response

        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

This way, the response never even reaches urllib2.HTTPRedirectHandler.http_error_302().

An even more common case is that we simply want to stop all redirection:

class NoRedirection(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        return response

    https_response = http_response

It is normally used this way:

import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
    # The redirect was not followed, so its target is still in the headers.
    redirection_target = response.headers['Location']
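
For anyone on Python 3, here is a rough port of the NoRedirection idea, assuming the urllib.request and http.cookiejar modules that replaced urllib2 and cookielib. fetch_without_redirect is a hypothetical helper name, not part of any library.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Hand every response back untouched, so 3xx replies are never followed."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def fetch_without_redirect(url, form=None):
    """Hypothetical helper: return (response, Location header or None, cookie jar)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        NoRedirection, urllib.request.HTTPCookieProcessor(jar))
    # A non-empty form dict is sent as a POST body; otherwise a GET is issued.
    data = urllib.parse.urlencode(form).encode("ascii") if form else None
    response = opener.open(url, data)
    return response, response.headers.get("Location"), jar
```

Because the default error processor is replaced wholesale, 4xx and 5xx responses are also returned instead of raising HTTPError, so check response.status yourself.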

Fluidize answered 31/7, 2012 at 16:33 Comment(9)

Just what I needed, and very concise class NoRedirection() - you don't even have to store code, msg, hdrs -- Thanks Alan. – Lapidate 20/9, 2013 at 15:7

You are right! And I removed the line as you suggested. Thanks Xtof. – Fluidize 24/9, 2013 at 2:26

Is it possible to use this approach to get hold of the actual redirect URL? – Brunner 10/7, 2015 at 5:33

@Malvin9000 If you want to get the target of the redirection, then yes, just read response.headers['Location'], you will get it:) – Fluidize 10/7, 2015 at 6:10

@Malvin9000 Not literally using read, you can assign it to a new variable or directly print it out. Let me update the answer so you can see. – Fluidize 10/7, 2015 at 6:16

@AlanDuan Thanks a lot for the edit update, much appreciated. When I print redirection_target I see the URL I'm inserting in opener.open() instead of the new URL that appears in my browser when I cut-and-paste the original URL. Not sure what I'm doing wrong... – Brunner 10/7, 2015 at 6:27

@Malvin9000 Most probably it redirects to itself. This happens when the URL supports both GET and POST: when you POST data that is not accepted, it redirects back to itself using GET. To see exactly what happens, you can use the developer tools in Chrome or Firefox to trace every step (open them via CTRL+SHIFT+I in Chrome, then select the Network tab). – Fluidize 10/7, 2015 at 6:34

@AlanDuan This post is pretty much exactly what I'm trying to accomplish, same HTTP header data, etc, trying to get that value of location — but maybe it's not possible using raw requests. – Brunner 10/7, 2015 at 6:44

Let us continue this discussion in chat. – Fluidize 10/7, 2015 at 6:54

urllib2.urlopen calls build_opener() which uses this list of handler classes:

handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
            HTTPDefaultErrorHandler, HTTPRedirectHandler,
            FTPHandler, FileHandler, HTTPErrorProcessor]

You could try calling urllib2.build_opener(handlers) yourself with a list that omits HTTPRedirectHandler, then call the open() method on the result to open your URL. If you really dislike redirects, you could even call urllib2.install_opener(opener) to install your own non-redirecting opener globally.
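
Since build_opener() puts the default handlers back unless you pass a replacement for them, one way to really leave HTTPRedirectHandler out is to assemble an OpenerDirector by hand. A minimal sketch, assuming Python 3's urllib.request (the Python 2 urllib2 classes have the same names); build_opener_without_redirects is a hypothetical helper:

```python
import urllib.request


def build_opener_without_redirects():
    """Assemble an opener by hand, leaving out HTTPRedirectHandler."""
    handler_classes = [
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPDefaultErrorHandler,
        urllib.request.HTTPErrorProcessor,
    ]
    # HTTPSHandler only exists when Python was built with SSL support.
    https_handler = getattr(urllib.request, "HTTPSHandler", None)
    if https_handler is not None:
        handler_classes.append(https_handler)

    opener = urllib.request.OpenerDirector()
    for handler_class in handler_classes:
        opener.add_handler(handler_class())
    return opener
```

With no redirect handler registered, a 3xx reply falls through to HTTPDefaultErrorHandler, so opener.open() raises HTTPError; the status is in its code attribute and the Location header in its headers.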

It sounds like your real problem is that urllib2 isn't doing cookies the way you'd like. See also How to use Python to login to a webpage and retrieve cookies for later usage?

Infanta answered 16/2, 2009 at 20:38 Comment(1)

"You could try calling urllib2.build_opener(handlers) yourself with a list that omits HTTPRedirectHandler, then call the open() method on the result to open your URL." Well, the docs for urllib2.build_opener() say this: "Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor." It looks like omitting HTTPRedirectHandler won't work... – Wife 1/4, 2011 at 17:57

This question was asked before here.

EDIT: If you have to deal with quirky web applications you should probably try out mechanize. It's a great library that simulates a web browser. You can control redirecting, cookies, page refreshes... If the website doesn't rely [heavily] on JavaScript, you'll get along very nicely with mechanize.

Nieshanieto answered 16/2, 2009 at 20:46 Comment(0)
