How to get the HTTP status code through urllib?
In Python, how do I use urllib to see if a website is 404 or 200?
The getcode() method (added in Python 2.6) returns the HTTP status code that was sent with the response, or None if the URL is not an HTTP URL:
>>> import urllib
>>> a = urllib.urlopen('http://www.google.com/asdfsf')
>>> a.getcode()
404
>>> a = urllib.urlopen('http://www.google.com/')
>>> a.getcode()
200
In Python 3.4, if there is a 404, urllib.request.urlopen raises a urllib.error.HTTPError. – Athos
Doesn't work in Python 2.7. If the HTTP response is 400, an exception is thrown. – Dorothi
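In Python 3, urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses instead of returning them, so getcode() alone no longer reports a 404. A minimal sketch of a getcode()-style helper that returns the status either way, assuming you only need the code; status_code is a hypothetical name, not part of urllib:

```python
import urllib.error
import urllib.request


def status_code(url, timeout=10):
    """Return the HTTP status code for url, or None for non-HTTP URLs.

    Hypothetical helper that mimics Python 2's urllib.urlopen(url).getcode().
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # getcode() is None for non-HTTP schemes such as data: or file:
            return resp.getcode()
    except urllib.error.HTTPError as e:
        # 4xx/5xx responses are raised, but the exception carries the code
        return e.code
```

For example, status_code('http://www.google.com/asdfsf') returns whatever status that server currently sends (a 404 in the answer above) instead of raising. Failures that never produce an HTTP response, such as a refused connection, still raise urllib.error.URLError.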
You can use urllib2 as well:
import urllib2

req = urllib2.Request('http://www.python.org/fish.html')
try:
    resp = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    if e.code == 404:
        print 'Not found'  # do something...
    else:
        print 'HTTP error', e.code  # ...
except urllib2.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    print 'URL error:', e.reason  # ...
else:
    # 200
    body = resp.read()
Note that HTTPError is a subclass of URLError that also stores the HTTP status code, which is why the HTTPError clause has to come before the URLError clause.
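Because HTTPError is a subclass of URLError, whichever matching except clause comes first wins. A small offline sketch using hand-built exceptions; describe and describe_wrong_order are hypothetical helpers, while the HTTPError(url, code, msg, hdrs, fp) constructor is urllib's own:

```python
import urllib.error


def describe(exc):
    """Classify a urllib exception; HTTPError is tested first on purpose."""
    try:
        raise exc
    except urllib.error.HTTPError as e:
        # The server answered; e.code is the HTTP status (404, 500, ...)
        return "HTTP error %d" % e.code
    except urllib.error.URLError as e:
        # No usable HTTP response (DNS failure, refused connection, ...)
        return "URL error: %s" % e.reason


def describe_wrong_order(exc):
    """Same clauses swapped: the URLError clause now swallows HTTPError too."""
    try:
        raise exc
    except urllib.error.URLError as e:
        return "URL error: %s" % e.reason
    except urllib.error.HTTPError as e:  # never reached for an HTTPError
        return "HTTP error %d" % e.code


not_found = urllib.error.HTTPError("http://example.com/x", 404, "Not Found", None, None)
refused = urllib.error.URLError("connection refused")
print(describe(not_found))              # HTTP error 404
print(describe(refused))                # URL error: connection refused
print(describe_wrong_order(not_found))  # URL error: Not Found
```

In the wrong order the status code is lost: HTTPError.reason only returns the message text ("Not Found").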
Is the second else a mistake? – Weichsel
@NadavB The exception object e will look like a response object. That is, it's file-like and you can read the payload from it. – Tetroxide
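As that comment says, the HTTPError you catch doubles as a response, so the server's error page can be read from it. A sketch assuming you want both the status and the body; fetch and its opener parameter are hypothetical, and opener exists only so a fake can be swapped in for testing:

```python
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen, timeout=10):
    """Return (status, body) for url, reading the body even for 4xx/5xx."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.getcode(), resp.read()
    except urllib.error.HTTPError as e:
        # The exception is file-like: read() returns the error page payload
        body = e.read()
        e.close()
        return e.code, body
```

With the default opener, fetch('http://www.python.org/fish.html') would return the status together with whatever error page the server sends back.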
For Python 3:
import urllib.request
import urllib.error

url = 'http://www.google.com/asdfsf'
try:
    conn = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    # Return code error (e.g. 404, 501, ...)
    # ...
    print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    # ...
    print('URLError: {}'.format(e.reason))
else:
    # 200
    # ...
    print('good')
For URLError, print(e.reason) could be used. – Sines
What about http.client.HTTPException? – Kendy
How can I check for a 301 or 302? – Rhizome
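urlopen follows redirects on its own, so by default you only see the final page's status. One way to observe the 301 or 302 itself is a redirect handler whose redirect_request() returns None: urllib then stops following and raises the 3xx as an HTTPError. A sketch; NoRedirect and redirect_status are hypothetical names, while HTTPRedirectHandler.redirect_request and build_opener are standard urllib.request APIs:

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the 3xx
        # then falls through to the default handler and is raised.
        return None


def redirect_status(url, timeout=10):
    """Return (status, Location header) without following redirects."""
    # Passing a subclass of HTTPRedirectHandler replaces the default one
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.getcode(), None
    except urllib.error.HTTPError as e:
        # For a redirect this is the 3xx itself, not the target page
        return e.code, e.headers.get("Location")
```

If you only need to know whether a redirect happened, comparing resp.geturl() with the requested URL after a normal urlopen is a lighter alternative.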
import urllib2

try:
    fileHandle = urllib2.urlopen('http://www.python.org/fish.html')
    data = fileHandle.read()
    fileHandle.close()
except urllib2.URLError as e:
    print 'you got an error with the code', e
TIMEX is interested in grabbing the HTTP status code (200, 404, 500, etc.), not a generic error thrown by urllib2. – Inearth
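A single except URLError clause, as in the urllib2 answer, can still report the status: HTTPError instances carry a code attribute and plain URLError instances do not. A Python 3 sketch assuming None is an acceptable "no status" value; error_code and fetch_or_report are hypothetical helpers:

```python
import urllib.error
import urllib.request


def error_code(exc):
    """Return the HTTP status carried by a urllib exception, or None.

    Only urllib.error.HTTPError has a .code attribute; a plain URLError
    (DNS failure, refused connection, ...) has no status to report.
    """
    return getattr(exc, "code", None)


def fetch_or_report(url):
    """Return the body of url, or print the status code and return None."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except urllib.error.URLError as e:
        # One clause covers HTTPError too, since it subclasses URLError
        print("you got an error with the code", error_code(e))
        return None


print(error_code(urllib.error.HTTPError("http://example.com/", 500, "Server Error", None, None)))  # 500
print(error_code(urllib.error.URLError("connection refused")))  # None
```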
© 2022 - 2024 — McMap. All rights reserved.
from urllib.request import urlopen. – Normalcy