Python 3: error handling for urllib requests

Asked 18/8, 2012 at 23:50 Answered 20/10, 2017 at 13:16

from difflib import *
import http.client
import socket
import urllib.request, urllib.parse, urllib.error
from urllib.parse import unquote
import time
import pdb

try:
    file2 = urllib.request.Request('site goes here')
    file2.add_header("User-Agent", 'Opera/9.61 (Windows NT 5.1; U; en) Presto/2.1.1')
    ResponseData = urllib.request.urlopen(file2).read().decode("utf8", 'ignore')
except urllib.error.URLError as e: print('http'); ResponseData = ''
except socket.error as e: ResponseData = ''
except socket.timeout as e: ResponseData = ''
except UnicodeEncodeError as e: ResponseData = ''
except http.client.BadStatusLine as e: ResponseData = ''
except http.client.IncompleteRead as e: ResponseData = ''
except urllib.error.HTTPError as e: ResponseData = ''

Hi, when I run the code above on a page that contains errors such as 'Microsoft VBScript runtime error', the request fails and raises urllib.error.URLError, even though the page contains plenty of other HTML. How can I get ALL the HTML from the page and not just the exception? I would like to keep my current code as much as possible. Thanks.

Sternmost asked 18/8, 2012 at 23:50

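Thank you, I have solved the problem: the body of the error page can be read from the exception itself. This works because the exception raised here is a urllib.error.HTTPError, which is file-like. A plain URLError has no read() method, so catch HTTPError explicitly:

except urllib.error.HTTPError as e: ResponseData = e.read().decode("utf8", 'ignore')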
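As a rough sketch of the same idea in function form, the error page body can be returned like a normal response. The name fetch_html, the opener parameter, and the timeout default are illustrative assumptions and not part of the original code; opener exists only so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_html(url, opener=urllib.request.urlopen, timeout=10):
    """Return the page body as text, including the body of HTTP error pages.

    `opener` and `timeout` are hypothetical parameters added for illustration.
    """
    request = urllib.request.Request(url)
    request.add_header("User-Agent", "Opera/9.61 (Windows NT 5.1; U; en) Presto/2.1.1")
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf8", "ignore")
    except urllib.error.HTTPError as e:
        # HTTPError is file-like: the server still sent a body (e.g. a 500 page).
        return e.read().decode("utf8", "ignore")
    except urllib.error.URLError:
        # No HTTP response was received at all (DNS failure, refused connection).
        return ""
```

The HTTPError clause comes first on purpose, since HTTPError is a subclass of URLError.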

Sternmost answered 19/8, 2012 at 11:44

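URLError has a 'reason' attribute, so you can use:

except urllib.error.URLError as e: ResponseData = str(e.reason)

(For an HTTPError this is the status phrase, for example 'Forbidden'. For a plain URLError it can be a string or another exception instance, hence the str() call.)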
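A small sketch of how 'reason' might be turned into a message. The helper name describe_error is made up for illustration, and the exceptions in the usage example are constructed by hand rather than coming from a real request:

```python
import urllib.error


def describe_error(error):
    """Build a short message from a URLError or HTTPError (hypothetical helper)."""
    if isinstance(error, urllib.error.HTTPError):
        # HTTPError carries both a numeric status code and a reason phrase.
        return "HTTP {}: {}".format(error.code, error.reason)
    # For a plain URLError, reason may be a string or an exception instance.
    return "URL error: {}".format(error.reason)
```

For a 403 response with the phrase 'Forbidden', this would give 'HTTP 403: Forbidden'.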

You should also take care to catch a subclass before its superclass. HTTPError is a subclass of URLError, so in your example the HTTPError clause must come before the URLError clause. Otherwise the HTTPError clause is never reached, because the URLError clause catches the exception first.
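To illustrate the ordering, here is a minimal sketch that raises an exception and reports which clause handled it. The function name classify is hypothetical, and the clauses are ordered from most specific to least specific:

```python
import urllib.error


def classify(error):
    """Raise `error` and return the name of the except clause that caught it."""
    try:
        raise error
    except urllib.error.HTTPError:
        # Most specific first: HTTPError is a subclass of URLError.
        return "HTTPError"
    except urllib.error.URLError:
        # URLError is itself a subclass of OSError (Python 3.3 and later).
        return "URLError"
    except OSError:
        return "OSError"
```

If the URLError clause were listed first, an HTTPError would be reported as "URLError" and the HTTPError clause would be dead code.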

Yseulta answered 20/10, 2017 at 13:16
