I am currently trying to log into a site using Python; however, the site seems to send a cookie and a redirect in the same response. Python follows that redirect, which prevents me from reading the cookie sent by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect?
You could do a couple of things:
- Build your own HTTPRedirectHandler that intercepts each redirect
- Create an instance of HTTPCookieProcessor and install an opener that uses it, so that you have access to the cookie jar.
Here is a quick example that shows both:
import urllib2
#redirect_handler = urllib2.HTTPRedirectHandler()
class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Called before the redirect is followed: inspect or modify cookies here
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
    # Handle every redirect status the same way
    http_error_301 = http_error_303 = http_error_307 = http_error_302
cookieprocessor = urllib2.HTTPCookieProcessor()
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)
response = urllib2.urlopen("WHEREVER")
print response.read()
print cookieprocessor.cookiejar
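For readers on Python 3, where urllib2 became urllib.request, here is a minimal sketch of the same idea. The local server, the /login and /home paths and the session cookie are hypothetical stand-ins for the real site, and the seen list is an illustrative addition that records each redirect's headers before the base class follows the redirect.

```python
import http.server
import threading
import urllib.request


# Throwaway local server standing in for the login page (hypothetical endpoints).
class LoginDemo(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/login":
            # Cookie and redirect in the same response, as in the question.
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"home page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), LoginDemo)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE = "http://127.0.0.1:%d" % server.server_port


class MyHTTPRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record each 3xx response's headers, then follow the redirect as usual."""

    def __init__(self):
        super().__init__()
        self.seen = []  # (status, Location, Set-Cookie) for each redirect

    def http_error_302(self, req, fp, code, msg, headers):
        # Runs before the redirect is followed, so the 3xx headers are still available.
        self.seen.append((code, headers.get("Location"), headers.get("Set-Cookie")))
        return super().http_error_302(req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302


cookie_processor = urllib.request.HTTPCookieProcessor()  # creates its own CookieJar
redirect_handler = MyHTTPRedirectHandler()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore proxy env vars for this local demo
    redirect_handler,
    cookie_processor,
)

response = opener.open(BASE + "/login")
body = response.read()
print(response.geturl(), body)
print(redirect_handler.seen)
print([(c.name, c.value) for c in cookie_processor.cookiejar])
```

Because HTTPCookieProcessor processes each response before the 302 is handed to the redirect handler, the cookie from the redirect response should already be in the jar even though the redirect is followed.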
Why is redirect_handler = urllib2.HTTPRedirectHandler() in the example at all? Were you going to show a second example? – Skirl
Why don't you instantiate MyHTTPRedirectHandler instead of passing the class into the build_opener() method? – Habit
If all you need is to stop redirection, there is a simple way to do it. For example, I only want to get the cookies, and for better performance I don't want to be redirected to any other page. I also want the status code to stay 3xx; let's use 302 as an example.
import urllib2
import cookielib
class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):
    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()
        # The only addition to the stock processor: stop at 302 instead of redirecting.
        if code == 302:
            return response
        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response
    https_response = http_response
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)
This way, you don't even need to touch urllib2.HTTPRedirectHandler.http_error_302().
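A Python 3 sketch of the same processor (urllib2 became urllib.request, cookielib became http.cookiejar). The local server and its /login and /missing paths are hypothetical; the point is that the 302 comes back untouched while other error codes such as 404 still raise HTTPError.

```python
import http.cookiejar
import http.server
import threading
import urllib.error
import urllib.request


# Throwaway local server: /login sets a cookie and redirects, anything else is 404.
class LoginDemo(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), LoginDemo)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE = "http://127.0.0.1:%d" % server.server_port


class MyHTTPErrorProcessor(urllib.request.HTTPErrorProcessor):
    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()
        # Only this check differs from the stock processor: hand 302s back as-is.
        if code == 302:
            return response
        if not (200 <= code < 300):
            response = self.parent.error("http", request, response, code, msg, hdrs)
        return response

    https_response = http_response


cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore proxy env vars for this local demo
    urllib.request.HTTPCookieProcessor(cj),
    MyHTTPErrorProcessor,  # a subclass, so build_opener drops the default processor
)

response = opener.open(BASE + "/login")
print(response.code, response.headers["Location"], [(c.name, c.value) for c in cj])

error_code = None
try:
    opener.open(BASE + "/missing")
except urllib.error.HTTPError as err:
    error_code = err.code  # non-2xx codes other than 302 still raise
print(error_code)
```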
A more common case, though, is that we simply want to stop all redirection (as the question asks):
class NoRedirection(urllib2.HTTPErrorProcessor):
    def http_response(self, request, response):
        # Return every response as-is, so no 3xx is ever redirected
        return response
    https_response = http_response
Then use it like this:
import urllib
import urllib2
import cookielib
cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
    redirection_target = response.headers['Location']
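The same NoRedirection approach in Python 3 might look like the sketch below. The local server, the /login path and the username form field are hypothetical; note that in Python 3 the POST body must be bytes, hence the .encode().

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request


# Throwaway local server; the /login form and session cookie are hypothetical.
class LoginDemo(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the posted form body
        self.send_response(302)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), LoginDemo)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE = "http://127.0.0.1:%d" % server.server_port


class NoRedirection(urllib.request.HTTPErrorProcessor):
    def http_response(self, request, response):
        return response  # never escalate non-2xx codes: no redirect, no HTTPError

    https_response = http_response


cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore proxy env vars for this local demo
    NoRedirection,
    urllib.request.HTTPCookieProcessor(cj),
)

data = {"username": "demo"}  # hypothetical form field
response = opener.open(BASE + "/login", urllib.parse.urlencode(data).encode())
redirection_target = None
if response.code == 302:
    redirection_target = response.headers["Location"]
print(response.code, redirection_target, [(c.name, c.value) for c in cj])
```

Unlike MyHTTPErrorProcessor, this returns every response unchanged, so 4xx and 5xx responses no longer raise HTTPError either; check response.code yourself.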
In class NoRedirection() you don't even have to store code, msg, hdrs. Thanks, Alan. – Lapidate
In redirection_target I see the URL I'm passing to opener.open(), not the new URL that appears in my browser when I paste the original URL. Not sure what I'm doing wrong... – Brunner
… location, but maybe it's not possible using raw requests. – Brunner
urllib2.urlopen calls build_opener(), which uses this list of handler classes:
handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor]
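You could build the opener yourself from a list that omits HTTPRedirectHandler, then call the open() method on the result to open your URL. Note that passing such a list to urllib2.build_opener() is not enough: build_opener(*handlers) takes the handlers as separate arguments and always adds every default handler (HTTPRedirectHandler included) unless you pass a subclass of it. Instead, create a urllib2.OpenerDirector() and call its add_handler() method with an instance of each handler you want. If you really dislike redirects, you could even call urllib2.install_opener() with your own non-redirecting opener.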
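A Python 3 sketch of building the opener by hand. The helper name build_non_redirecting_opener, the local server and the /login path are hypothetical. With no redirect handler registered, nothing claims the 302, so HTTPDefaultErrorHandler raises it as an HTTPError whose code and headers you can still read.

```python
import http.cookiejar
import http.server
import threading
import urllib.error
import urllib.request


# Throwaway local server; the /login path and session cookie are hypothetical.
class LoginDemo(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Location", "/home")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), LoginDemo)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE = "http://127.0.0.1:%d" % server.server_port


def build_non_redirecting_opener(*extra_handlers):
    """Hypothetical helper: assemble an opener by hand, leaving out HTTPRedirectHandler."""
    opener = urllib.request.OpenerDirector()
    for handler in (
        urllib.request.ProxyHandler({}),  # no proxies for this local demo
        urllib.request.UnknownHandler(),
        urllib.request.HTTPHandler(),  # add HTTPSHandler() too for https:// URLs
        urllib.request.HTTPDefaultErrorHandler(),
        urllib.request.HTTPErrorProcessor(),
        *extra_handlers,
    ):
        opener.add_handler(handler)
    return opener


# build_opener() re-adds every default handler you don't replace with a subclass,
# so it always ends up with an HTTPRedirectHandler:
stock = urllib.request.build_opener(urllib.request.HTTPHandler())
print(any(isinstance(h, urllib.request.HTTPRedirectHandler) for h in stock.handlers))  # True

jar = http.cookiejar.CookieJar()
opener = build_non_redirecting_opener(urllib.request.HTTPCookieProcessor(jar))
status = location = None
try:
    opener.open(BASE + "/login")
except urllib.error.HTTPError as err:
    # No handler claims the 302, so HTTPDefaultErrorHandler raises it instead.
    status, location = err.code, err.headers["Location"]
print(status, location, [(c.name, c.value) for c in jar])
```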
It sounds like your real problem is that urllib2 isn't handling cookies the way you'd like. See also "How to use Python to login to a webpage and retrieve cookies for later usage?"
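If the underlying goal is just to keep the login cookie, a Python 3 sketch like the following suggests that stopping the redirect may not be necessary at all. The local server and its /login and /home paths are hypothetical; /home simply echoes back the Cookie header it receives. With an HTTPCookieProcessor installed, the Set-Cookie header on the 302 is stored in the jar before the redirect is followed, and the cookie is sent along with the redirected request.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


# Throwaway local server: /login sets a cookie and redirects, /home echoes the Cookie header.
class LoginDemo(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/login":
            body = b""
            self.send_response(302)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            self.send_header("Location", "/home")
        else:
            body = self.headers.get("Cookie", "").encode()
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), LoginDemo)
threading.Thread(target=server.serve_forever, daemon=True).start()
BASE = "http://127.0.0.1:%d" % server.server_port

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # ignore proxy env vars for this local demo
    urllib.request.HTTPCookieProcessor(jar),  # default redirect handling stays on
)
with opener.open(BASE + "/login") as response:
    final_url, echoed = response.geturl(), response.read()
print(final_url, echoed, [(c.name, c.value) for c in jar])
```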
… HTTPRedirectHandler won't work... – Wife
This question was asked before here.
EDIT: If you have to deal with quirky web applications, you should probably try out mechanize. It's a great library that simulates a web browser: you can control redirects, cookies, page refreshes, and more. If the website doesn't rely [heavily] on JavaScript, you'll get along very nicely with mechanize.
requests supports this "out of the box" #110998 – Launder