Python requests - 403 forbidden - despite setting `User-Agent` headers
import requests
import webbrowser
from bs4 import BeautifulSoup

url = 'https://www.gamefaqs.com'
#headers={'User-Agent': 'Mozilla/5.0'}    
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}


response = requests.get(url, headers)

response.status_code returns 403. I can browse the website in Firefox/Chrome, so it seems to be a coding error.

I can't figure out what mistake I'm making.

Thank you.

Premeditation answered 13/7, 2017 at 16:30 Comment(0)

This works if you make the request through a Session object.

import requests

session = requests.Session()
response = session.get('https://www.gamefaqs.com', headers={'User-Agent': 'Mozilla/5.0'})

print(response.status_code)

Output:

200
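
One reason a Session can make the difference: it keeps a cookie jar and automatically echoes stored cookies back on later requests. A minimal local sketch (no network call; the cookie is set by hand here to simulate a server's Set-Cookie):

```python
import requests

# A Session keeps a cookie jar across requests.
session = requests.Session()

# Simulate a cookie the server would have set on a first response.
session.cookies.set('seen', '1', domain='www.gamefaqs.com')

# Preparing a later request through the same session merges the cookie in.
request = requests.Request('GET', 'https://www.gamefaqs.com/boards')
prepared = session.prepare_request(request)
print(prepared.headers.get('Cookie'))  # seen=1
```

A plain requests.get() builds a fresh request each time, so cookies set by an earlier response are never sent back.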
Stipendiary answered 13/7, 2017 at 16:39 Comment(6)
Thanks. What exactly is going on with the Session object that is making the difference? I've never had to make a Session object to scrape a site.Premeditation
@Premeditation The main thing about Session objects is their cookie handling. For all you know, the site may be setting cookies and expecting them to be echoed back as a defence against scraping, which is probably against its policy.Stipendiary
Cookies. I see. Thank you.Premeditation
I've tried this for another website and it doesn't fix the issue, I still get a 403.Eczema
Same here, I'd like to learn if you've found a solution? @EczemaAddressee
It was a while ago, I can't remember @talha06. SorryEczema

Using the keyword argument works for me. In the question's code, requests.get(url, headers) passes the dict as the second positional argument, which is params, so no User-Agent header is ever sent:

import requests
headers={'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.gamefaqs.com', headers=headers)
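
The difference can be shown locally with a PreparedRequest, without hitting the site: passed positionally, the dict is encoded into the query string rather than the headers.

```python
import requests

# Positional: the dict becomes requests.get's second parameter, `params`,
# and is urlencoded into the query string -- not sent as a header.
bad = requests.Request('GET', 'https://www.gamefaqs.com',
                       params={'User-Agent': 'Mozilla/5.0'}).prepare()
print(bad.url)                         # ...?User-Agent=Mozilla%2F5.0
print(bad.headers.get('User-Agent'))   # None -- no User-Agent header at all

# Keyword: the dict ends up in the request headers where it belongs.
good = requests.Request('GET', 'https://www.gamefaqs.com',
                        headers={'User-Agent': 'Mozilla/5.0'}).prepare()
print(good.headers.get('User-Agent'))  # Mozilla/5.0
```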
Ravo answered 7/6, 2018 at 11:25 Comment(0)

Try using a Session.

import requests

url = 'https://www.gamefaqs.com'
session = requests.Session()
response = session.get(url, headers={'user-agent': 'Mozilla/5.0'})
print(response.status_code)

If the request still returns 403 Forbidden (even with a Session object and a User-Agent header), you may need to send more headers:

headers = {
    'user-agent':"Mozilla/5.0 ...",
    'accept': 'text/html,application...',
    'referer': 'https://...',
}
r = session.get(url, headers=headers)

In Chrome, the request headers can be found under Network > Headers > Request Headers in the Developer Tools (press F12 to toggle them).

The reason is that some websites check for a User-Agent, or for the presence of specific headers, before accepting a request.
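
Rather than repeating the dict on every call, the browser-copied headers can be set once as session defaults. A sketch with placeholder values; copy the real ones from your browser's Network tab:

```python
import requests

session = requests.Session()
# Hypothetical browser-like defaults -- substitute values copied from DevTools.
session.headers.update({
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html,application/xhtml+xml',
    'Accept-Language': 'en-US,en;q=0.9',
})

# Every request prepared by this session now carries these headers.
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.gamefaqs.com'))
print(prepared.headers['User-Agent'])  # Mozilla/5.0
```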

Archaism answered 21/8, 2021 at 14:37 Comment(0)