Make sure you send a user-agent in the request headers, otherwise the response will eventually come back empty because Google will block the requests. To find yours, look up "what is my user-agent".
headers = {
  "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
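To see what actually goes on the wire, here is a minimal sketch that builds the same request with Ruby's standard library only (Net::HTTP instead of HTTParty, which is a swap made purely for illustration). It does not send anything; it just shows the query string and the header attached to the request object.

```ruby
require 'net/http'
require 'uri'

headers = {
  "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Build the search URL; URI.encode_www_form turns the hash into "q=...&num=..."
uri = URI("https://www.google.com/search")
uri.query = URI.encode_www_form(q: "stackoverflow", num: "100")

# Build (but do not send) a GET request that carries the custom headers
request = Net::HTTP::Get.new(uri, headers)

puts uri                   # the full URL including the query string
puts request["User-Agent"] # the header that would be sent with the request
```

Without the custom header, Net::HTTP would identify itself with its own default user-agent, which is the kind of request that tends to get blocked.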
Code and example in the online IDE:
require 'nokogiri'
require 'httparty'
require 'json'

headers = {
  "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  q: "stackoverflow", # search query
  num: "100"          # number of results to request
}

response = HTTParty.get("https://www.google.com/search",
                        query: params,
                        headers: headers)
doc = Nokogiri::HTML(response.body)

# Each organic result sits in a .tF2Cxc container. These class names come from
# Google's markup and may change over time.
data = doc.css(".tF2Cxc").map do |result|
  title = result.at_css(".DKV0Md")&.text
  link = result.at_css(".yuRUbf a")&.attr("href")
  displayed_link = result.at_css(".tjvcx")&.text
  snippet = result.at_css(".VwiC3b")&.text
  # puts "#{title}#{snippet}#{link}#{displayed_link}\n\n"

  {
    title: title,
    link: link,
    displayed_link: displayed_link,
    snippet: snippet,
  }.compact # drop keys whose value is nil
end

puts JSON.pretty_generate(data)
--------
=begin
[
  {
    "title": "Stack for Stack Overflow - Apps on Google Play",
    "link": "https://play.google.com/store/apps/details?id=me.tylerbwong.stack&hl=en_US&gl=US",
    "displayed_link": "https://play.google.com › store › apps › details",
    "snippet": "Stack is powered by Stack Overflow and other Stack Exchange sites. Search and filter through questions to find the exact answer you're looking for!"
  }
  ...
]
=end
Alternatively, you can use the Google Organic Results API from SerpApi. It's a paid API with a free plan.
The main difference is that there's no need to figure out how to scrape specific parts of the page. All that's left to do is iterate over structured JSON.
require 'google_search_results'
require 'json'

params = {
  api_key: ENV["API_KEY"],
  engine: "google",
  q: "stackoverflow",
  hl: "en",
  num: "100"
}

search = GoogleSearch.new(params)
hash_results = search.get_hash

data = hash_results[:organic_results].map do |result|
  title = result[:title]
  link = result[:link]
  displayed_link = result[:displayed_link]
  snippet = result[:snippet]

  {
    title: title,
    link: link,
    displayed_link: displayed_link,
    snippet: snippet
  }.compact
end

puts JSON.pretty_generate(data)
-------------
=begin
[
  {
    "title": "Stack Overflow - Home | Facebook",
    "link": "https://www.facebook.com/officialstackoverflow/",
    "displayed_link": "https://www.facebook.com › Pages › Interest",
    "snippet": "Stack Overflow. 519455 likes · 587 talking about this. We are the world's programmer community."
  }
  ...
]
=end
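To make "iterate over structured JSON" concrete without an API key, here is a rough sketch that parses a hand-written sample string and picks out the same fields. The sample is an assumption modelled on the output above, not a real API response: the "position" field is a hypothetical extra, and the snippet is left out on purpose to show a missing key.

```ruby
require 'json'

# Hypothetical sample response, trimmed to a single organic result
raw = <<~JSON
  {
    "organic_results": [
      {
        "position": 1,
        "title": "Stack Overflow - Home | Facebook",
        "link": "https://www.facebook.com/officialstackoverflow/",
        "displayed_link": "https://www.facebook.com › Pages › Interest"
      }
    ]
  }
JSON

# symbolize_names gives symbol keys, matching the hash access used above
parsed = JSON.parse(raw, symbolize_names: true)

# Hash#slice (Ruby 2.5+) keeps only the listed keys and skips absent ones,
# so a missing :snippet never shows up as nil
rows = parsed.fetch(:organic_results, []).map do |result|
  result.slice(:title, :link, :displayed_link, :snippet)
end

puts JSON.pretty_generate(rows)
```

Using fetch with an empty-array default is one way to avoid a NoMethodError when a response has no organic results at all.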
Disclaimer, I work for SerpApi.
at finds the first occurrence of something as a Node, and search finds all occurrences, returning a NodeSet. NodeSet is like an array of Nodes, so you can iterate over it. – Dight