How to get HTTP return code from python urllib's urlopen?
U

2

5

I have the following code:

f = urllib.urlopen(url)
html = f.read()

I would like to know the HTTP status code (HTTP 200, 404 etc) that comes from opening the url above.

Does anybody know how this can be done?

P.S. I use Python 2.5.

Thanks!!!

Unto answered 10/2, 2013 at 9:1 Comment(1)
What is an HTML return code? Do you mean the HTTP status? - Wail
A
12

You can use the .getcode() method of the object returned by urlopen() (available in Python 2.6 and later):

import urllib

response = urllib.urlopen('http://www.stackoverflow.com/')
code = response.getcode()  # integer HTTP status, e.g. 200
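
For readers on Python 3, where urlopen lives in urllib.request, the same idea might be sketched roughly as below. This is not part of the original answer: fetch_status is a hypothetical helper name and the timeout value is an arbitrary assumption. In Python 3, urllib.request.urlopen raises urllib.error.HTTPError for error statuses such as 404, so the sketch reads the code from the exception in that case.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url` (hypothetical helper).

    Assumes an http:// or https:// URL; `timeout` is an arbitrary default.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # For HTTP(S) URLs the response exposes the numeric status;
            # response.getcode() returns the same value.
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses (4xx/5xx) are raised as HTTPError;
        # the status code is available on the exception.
        return err.code
```

Calling fetch_status('http://www.stackoverflow.com/') should then return an integer status, whether the request succeeded or the server answered with an error.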
Alda answered 10/2, 2013 at 9:6 Comment(1)
Maybe it's because I use Python 2.5, but I get the following error message: AttributeError: addinfourl instance has no attribute 'getcode' - Unto
N
3

getcode() was only added in Python 2.6. As far as I know, there is no way to get the status code from the response object itself in 2.5, but FancyURLopener provides a set of methods that get called on certain error codes, so you could potentially use one of them to save the status code somewhere. I subclassed it to tell me when a 404 occurred:

import urllib

class TellMeAbout404s(urllib.FancyURLopener):
    def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
        print("==== Got a 404")

opener = TellMeAbout404s()
f = opener.open("http://www.google.com/sofbewfwl")
print(f.info())

info() provides the HTTP headers but not the status code.

Necessitous answered 10/2, 2013 at 9:59 Comment(0)
