How do I catch a 404 error in urllib? (python 3)
Asked Answered
U

2

9

I've been reading tens of examples for similar issues, but I can't get any of the solutions I've seen or their variants to run. I'm screen scraping, and I just want to ignore 404 errors (skip the pages). I get

'AttributeError: 'module' object has no attribute 'HTTPError'.

I've tried 'URLError' as well. I've seen the near identical syntax accepted as working answers. Any ideas? Here's what I've got:

import urllib
import datetime
from bs4 import BeautifulSoup

class EarningsAnnouncement:
    def __init__(self, Company, Ticker, EPSEst, AnnouncementDate, AnnouncementTime):
        self.Company = Company
        self.Ticker = Ticker
        self.EPSEst = EPSEst
        self.AnnouncementDate = AnnouncementDate
        self.AnnouncementTime = AnnouncementTime

webBaseStr = 'http://biz.yahoo.com/research/earncal/'
earningsAnnouncements = []
dayVar = datetime.date.today()
for dte in range(1, 30):
    currDay = str(dayVar.day)
    currMonth = str(dayVar.month)
    currYear = str(dayVar.year)
    if (len(currDay)==1): currDay = '0' + currDay
    if (len(currMonth)==1): currMonth = '0' + currMonth
    dateStr = currYear + currMonth + currDay
    webString = webBaseStr + dateStr + '.html'
    try:
        #with urllib.request.urlopen(webString) as url: page = url.read()
        page = urllib.request.urlopen(webString).read()
        soup = BeautifulSoup(page)
        tbls = soup.findAll('table')
        tbl6= tbls[6]
        rows = tbl6.findAll('tr')
        rows = rows[2:len(rows)-1]
        for earn in rows:
            earningsAnnouncements.append(EarningsAnnouncement(earn.contents[0], earn.contents[1],
            earn.contents[3], dateStr, earn.contents[3]))
    except urllib.HTTPError as err:
        if err.code == 404:
            continue
        else:
            raise

    dayVar += datetime.timedelta(days=1)
Utricle answered 20/7, 2013 at 20:14 Comment(0)
D
18

It looks like for urllib (not urllib2) that the exception is urllib.error.HTTPError, not urllib.HTTPError. See the documentation for more information.

Dis answered 20/7, 2013 at 20:22 Comment(2)
Thanks Kyle... but, that yields the error 'module' object has no attribute 'error'Utricle
could you update your answer for urllib3 and python3.x ?Emden
H
1

Do this :

import urllib.error# import 
except urllib.error.URLError as e:# use 'urllib.error.URLError' and not 'urllib.HTTPError'
        print ('Error code: ', e.code)# or what ever u want 
        return e.code
Huntingdonshire answered 6/10, 2022 at 21:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.