Python requests - 403 forbidden - despite setting `User-Agent` headers
import requests
import webbrowser
from bs4 import BeautifulSoup

url = 'https://www.gamefaqs.com'
#headers={'User-Agent': 'Mozilla/5.0'}    
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}


response = requests.get(url, headers)

response.status_code returns 403. I can browse the website in Firefox/Chrome, so it seems to be a coding error.

I can't figure out what mistake I'm making.

Thank you.

Premeditation answered 13/7, 2017 at 16:30 Comment(0)

This works if you make the request through a Session object.

import requests

session = requests.Session()
response = session.get('https://www.gamefaqs.com', headers={'User-Agent': 'Mozilla/5.0'})

print(response.status_code)

Output:

200
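
One reason a Session can make the difference: it keeps a cookie jar and automatically echoes stored cookies back on later requests. A minimal local sketch (no network call; the cookie is set by hand here to simulate a server's Set-Cookie):

```python
import requests

# A Session keeps a cookie jar across requests.
session = requests.Session()

# Simulate a cookie the server would have set on a first response.
session.cookies.set('seen', '1', domain='www.gamefaqs.com')

# Preparing a later request through the same session merges the cookie in.
request = requests.Request('GET', 'https://www.gamefaqs.com/boards')
prepared = session.prepare_request(request)
print(prepared.headers.get('Cookie'))  # seen=1
```

A plain requests.get() builds a fresh request each time, so cookies set by an earlier response are never sent back.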
Stipendiary answered 13/7, 2017 at 16:39 Comment(6)
Thanks. What exactly is going on with the Session object that is making the difference? I've never had to make a Session object to scrape a site.Premeditation
@Premeditation The main thing about Session objects is their cookie handling. For all you know, the site may be setting cookies and expecting them to be echoed back as a defence against scraping, which is probably against its policy.Stipendiary
Cookies. I see. Thank you.Premeditation
I've tried this for another website and it doesn't fix the issue, I still get a 403.Eczema
Same here, I'd like to learn if you've found a solution? @EczemaAddressee
It was a while ago, I can't remember @talha06. SorryEczema

Using the keyword argument works for me. In the question's code, requests.get(url, headers) passes the dict as the second positional argument, which is params, so no User-Agent header is ever sent:

import requests
headers={'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.gamefaqs.com', headers=headers)
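
The difference can be shown locally with a PreparedRequest, without hitting the site: passed positionally, the dict is encoded into the query string rather than the headers.

```python
import requests

# Positional: the dict becomes requests.get's second parameter, `params`,
# and is urlencoded into the query string -- not sent as a header.
bad = requests.Request('GET', 'https://www.gamefaqs.com',
                       params={'User-Agent': 'Mozilla/5.0'}).prepare()
print(bad.url)                         # ...?User-Agent=Mozilla%2F5.0
print(bad.headers.get('User-Agent'))   # None -- no User-Agent header at all

# Keyword: the dict ends up in the request headers where it belongs.
good = requests.Request('GET', 'https://www.gamefaqs.com',
                        headers={'User-Agent': 'Mozilla/5.0'}).prepare()
print(good.headers.get('User-Agent'))  # Mozilla/5.0
```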
Ravo answered 7/6, 2018 at 11:25 Comment(0)

Try using a Session.

import requests

url = 'https://www.gamefaqs.com'
session = requests.Session()
response = session.get(url, headers={'user-agent': 'Mozilla/5.0'})
print(response.status_code)

If the request still returns 403 Forbidden (even with a Session object and a User-Agent header), you may need to send more headers:

headers = {
    'user-agent':"Mozilla/5.0 ...",
    'accept': 'text/html,application...',
    'referer': 'https://...',
}
r = session.get(url, headers=headers)

In Chrome, the request headers can be found under Network > Headers > Request Headers in the Developer Tools (press F12 to toggle them).

The reason is that some websites check for a User-Agent, or for the presence of specific headers, before accepting a request.
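
Rather than repeating the dict on every call, the browser-copied headers can be set once as session defaults. A sketch with placeholder values; copy the real ones from your browser's Network tab:

```python
import requests

session = requests.Session()
# Hypothetical browser-like defaults -- substitute values copied from DevTools.
session.headers.update({
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html,application/xhtml+xml',
    'Accept-Language': 'en-US,en;q=0.9',
})

# Every request prepared by this session now carries these headers.
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.gamefaqs.com'))
print(prepared.headers['User-Agent'])  # Mozilla/5.0
```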

Archaism answered 21/8, 2021 at 14:37 Comment(0)