How to get round the HTTP Error 403: Forbidden with urllib.request using Python 3

Hi. Not every time, but sometimes when trying to access the LSE page's code I get the ever-annoying HTTP Error 403: Forbidden message.

Does anyone know how I can overcome this issue using only standard Python modules (so sadly no Beautiful Soup)?

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url) # Open the URL
data = infile.read().decode('ISO-8859-1') # Read the content as string decoded with ISO-8859-1

print(data) # Print the data to the screen

However, every now and then this is the error I am shown:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden


Process exited with code: 1

Link to a list of all the modules that are okay: https://docs.python.org/3.4/py-modindex.html

Many thanks in advance.

Alverson answered 17/3, 2017 at 16:59 Comment(2)
Just wondering, did you find a solution to this? - Arsenical
check this #16627727 - Kara

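This is probably due to mod_security or a similar server-side filter that rejects urllib's default User-Agent (Python-urllib/3.x). You need to send a browser-like User-Agent header so the request looks like it comes from a browser, not from Python urllib.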

Here, I corrected your code:

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"

# Open the URL with a browser-like User-Agent, not urllib's default
page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1')  # Decode the bytes read above as ISO-8859-1

print(data) # Print the data to the screen
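
To see why the server may treat the original request differently, it can help to look at the User-Agent urllib sends by default. The sketch below reads it from a fresh opener and builds a second opener with a browser-like value. The helper names default_user_agent and browser_opener are made up for this illustration, and 'Mozilla/5.0' is just the value used in the code above, not something the site is known to require.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # A fresh opener carries a single default header: ('User-agent', 'Python-urllib/x.y')
    return dict(opener.addheaders)["User-agent"]


def browser_opener(user_agent="Mozilla/5.0"):
    """Build an opener (hypothetical helper) that sends user_agent on every request."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

If changing every call site is inconvenient, urllib.request.install_opener(browser_opener()) makes later urllib.request.urlopen(url) calls, like the one in the question, use the new header.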

Next, you can use BeautifulSoup to scrape the HTML.
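
Since the question is limited to the standard library, html.parser could stand in for BeautifulSoup. This is only a rough sketch: it assumes the values of interest sit in td cells inside tr rows, which has not been checked against the actual LSE page, and the names CellTextParser and table_rows are invented for the example.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without <td> cells, e.g. header rows
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return a list of rows, each a list of cell texts (hypothetical helper)."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling table_rows(data) on the string from the code above would then give one list per table row, provided the page really uses plain table markup.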

Kara answered 3/3, 2018 at 15:23 Comment(0)

It seems you are being rate limited. Try adding a sleep and retrying. For example:

import urllib.error
import urllib.request
from time import sleep

LSE_URL = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
WAIT_PERIOD = 15  # seconds to wait between retries


def stock_data_reader():
    stock_data = get_stock_data()
    while not stock_data:
        sleep(WAIT_PERIOD)  # sleep for a while until the next retry
        stock_data = get_stock_data()

    print(stock_data)  # do something with stock data


def get_stock_data():
    """Return the page as a string, or None if the server answers with an HTTP error."""
    try:
        infile = urllib.request.urlopen(LSE_URL)  # Open the URL
    except urllib.error.HTTPError as http_err:
        print("Error: %s" % http_err)
        return None
    else:
        # Read the content as a string decoded with ISO-8859-1
        return infile.read().decode('ISO-8859-1')


stock_data_reader()
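
The loop above retries forever. If the 403 turns out not to be rate limiting, a capped number of attempts with a growing wait may be kinder to the server. A minimal sketch, where the name retry and the parameters attempts, wait and backoff are made up for illustration and the default values are arbitrary:

```python
import time


def retry(fetch, attempts=5, wait=15, backoff=2, sleep=time.sleep):
    """Call fetch() until it returns something other than None.

    Returns that result, or None once all attempts have failed.
    The wait between attempts is multiplied by backoff each time.
    """
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:  # no point sleeping after the last attempt
            sleep(wait)
            wait *= backoff
    return None
```

With the code above this would be called as retry(get_stock_data), and the caller checks for None afterwards.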
Arsenical answered 17/3, 2017 at 17:28 Comment(5)
Many thanks! Although, is there any way of doing this without using excepts? I am not 100% sure I am allowed to use that. - Alverson
Nope, confirmed: not allowed to use excepts. Sorry, is there any other way of doing this? - Alverson
Can you use the requests library (docs.python-requests.org/en/master) instead of urllib? I've not encountered a 403 error using it. - Arsenical
Thank you for your comment, but unfortunately the only modules we are allowed to use are those listed at docs.python.org/3.4/py-modindex.html, which the requests library is not part of :/ - Alverson
I'm out of ideas unfortunately. Can you call CLI tools like curl by any chance? - Arsenical
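
On the no-except constraint from the comments: one standard-library option, offered as a sketch rather than a tested fix for this site, is an opener whose default error handler returns the error response instead of raising HTTPError, so the status code can be checked with a plain if. The names ReturnErrorResponse, build_no_raise_opener and fetch_status_and_text are invented for the example. Network-level failures (urllib.error.URLError, for instance when the host cannot be reached) would still raise.

```python
import urllib.request


class ReturnErrorResponse(urllib.request.HTTPDefaultErrorHandler):
    """Hypothetical handler: return HTTP error responses instead of raising."""

    def http_error_default(self, req, fp, code, msg, hdrs):
        # fp is the response object; returning it hands it back to the caller
        return fp


def build_no_raise_opener(user_agent="Mozilla/5.0"):
    # Passing a subclass makes build_opener use it in place of the default handler
    opener = urllib.request.build_opener(ReturnErrorResponse)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_status_and_text(url, encoding="ISO-8859-1"):
    """Return (status code, decoded body) for successful and error responses alike."""
    with build_no_raise_opener().open(url) as response:
        return response.status, response.read().decode(encoding)
```

A caller would then write status, data = fetch_status_and_text(url) and retry or give up when status is not 200.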
