The readable way to send a request with many query parameters would be to pass URL parameters as a dictionary:
params = {
'q': 'minecraft', # search query
'gl': 'us', # country where to search from
'hl': 'en', # language
}
requests.get('URL', params=params)
But, in order to get the actual response (output/text/data) that you see in the browser you need to send additional headers, more specifically user-agent which is needed to act as a "real" user visit when bot or browser sends a fake user-agent
string to announce themselves as a different client.
The reason that your request might be blocked is that the default requests
user agent is python-requests
and websites understand that. Check what's your user agent.
You can read more about it in the blog post I wrote about how to reduce the chance of being blocked while web scraping.
Pass user-agent
:
headers = {
'User-agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}
requests.get('URL', headers=headers)
Code and example in the online IDE:
from bs4 import BeautifulSoup
import requests, lxml
headers = {
'User-agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
}
params = {
'q': 'minecraft',
'gl': 'us',
'hl': 'en',
}
html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')
for result in soup.select('.tF2Cxc'):
title = result.select_one('.DKV0Md').text
link = result.select_one('.yuRUbf a')['href']
print(title, link, sep='\n')
Alternatively, you can achieve the same thing by using Google Organic API from SerpApi. It's a paid API with a free plan.
The difference is that you don't have to create it from scratch and maintain it.
Code to integrate:
import os
from serpapi import GoogleSearch
params = {
"engine": "google",
"q": "tesla",
"hl": "en",
"gl": "us",
"api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
for result in results["organic_results"]:
print(result['title'])
print(result['link'])
Disclaimer, I work for SerpApi.