Get request to Google Search
A

4

14

I'm trying to get the HTML of Google search results by sending a GET request, for example to:

https://www.google.ru/?q=1111

In a browser everything works, but when I fetch the page with curl, or open "View source" in the browser, I get only some JavaScript code and no search results. Is that some kind of protection? What can I do?

Ariew answered 26/3, 2017 at 20:46 Comment(2)
You can trick the system by removing the curl User-Agent.Winstonwinstonn
Search results are at google.ru/search?q=1111. google.ru/?q=1111 is a Google Search homepage with 1111 in search box at the center. Please try to view page source of google.ru/search?q=1111.Geyser
F
0

You can load it in the browser and then scrape the results via JavaScript.

Or you can use the Google API, but it seems to require payment if you make more than 100 requests per day.

Faceless answered 27/3, 2017 at 10:3 Comment(3)
Your method will get blocked pretty quickly. Google will present a "we want to make sure you're not a robot ..." screen with a captcha you must solve in order to continue searching.Convenient
@BrianSmith, yes, of course it will. But only once for all pages.Faceless
@Visitant Once per query only (before the results come; after that there is no captcha when you click through the pages). Every query. All as I said.Faceless
C
11

You now have to use the Google Search API to make your GET requests.

All other methods have been blocked.

Convenient answered 26/3, 2017 at 20:59 Comment(2)
The problem is that it must be made against a specific website.Winstonwinstonn
Note: This has a cost above x requests.Caddell
G
4

The page from your question is the Google Search page with the input field.

Screenshot of https://www.google.ru/?q=1111

The search results page is this one:

https://www.google.ru/search?q=1111
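As a minimal sketch of the difference between the two URLs above, the results URL can be built by putting the query on the /search path rather than on the homepage path. The function name `build_search_url` and its `host` parameter are hypothetical, chosen only for illustration; the block builds a string and sends nothing.

```python
from urllib.parse import urlencode


def build_search_url(query, host="www.google.ru"):
    """Return the results-page URL (/search?q=...), not the homepage (/?q=...)."""
    # urlencode escapes the query, e.g. spaces become '+'.
    return "https://" + host + "/search?" + urlencode({"q": query})


print(build_search_url("1111"))
```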

Rotate proxies and user agents, and delay similar requests, to get the HTML from Google Search results pages with fewer bans.
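A rough sketch of the user-agent rotation and delay part, assuming Python's standard library. The user-agent strings, the delay range, and the helper name `plan_requests` are all hypothetical placeholders, and proxy rotation is left out. The block only prepares requests; nothing is sent.

```python
import itertools
import random
import urllib.parse
import urllib.request

# Hypothetical placeholder values; substitute real browser strings.
USER_AGENTS = [
    "ExampleBrowser/1.0 (hypothetical)",
    "ExampleBrowser/2.0 (hypothetical)",
]


def plan_requests(queries, user_agents=USER_AGENTS,
                  min_delay=2.0, max_delay=5.0, rng=random):
    """Pair each query with a prepared Request and a pause to take before sending it."""
    agents = itertools.cycle(user_agents)  # round-robin over the user agents
    plan = []
    for query in queries:
        url = "https://www.google.ru/search?" + urllib.parse.urlencode({"q": query})
        request = urllib.request.Request(url, headers={"User-Agent": next(agents)})
        # A random pause so that requests are not evenly spaced.
        plan.append((request, rng.uniform(min_delay, max_delay)))
    return plan
```

To actually send a planned request, one would sleep for the paired delay with `time.sleep` and then call `urllib.request.urlopen(request)`. Proxies could be added through `urllib.request.ProxyHandler`, which is not shown here.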

Or use SerpApi to access HTML and the extracted data from it. It has a free trial.

curl -s 'https://serpapi.com/search?q=coffee'

Output

{
  // Omitted

  "organic_results": [
    {
      "position": 1,
      "title": "Coffee - Wikipedia",
      "link": "https://en.wikipedia.org/wiki/Coffee",
      "displayed_link": "en.wikipedia.org › wiki › Coffee",
      "snippet": "Coffee is a brewed drink prepared from roasted coffee beans, the seeds of berries from certain Coffea species. When coffee berries turn from green to bright red ...",
      "sitelinks": {
        "expanded": [
          {
            "title": "History",
            "link": "https://en.wikipedia.org/wiki/History_of_coffee",
            "snippet": "The history of coffee dates back to the 15th century, and possibly ..."
          },
          {
            "title": "International Coffee Day",
            "link": "https://en.wikipedia.org/wiki/International_Coffee_Day",
            "snippet": "International Coffee Day (1 October) is an occasion that is ..."
          },
          {
            "title": "List of coffee drinks",
            "link": "https://en.wikipedia.org/wiki/List_of_coffee_drinks",
            "snippet": "Milk coffee - Nitro cold brew coffee - List of coffee dishes - ..."
          },
          {
            "title": "Portal:Coffee",
            "link": "https://en.wikipedia.org/wiki/Portal:Coffee",
            "snippet": "Coffee is a brewed drink prepared from roasted coffee beans, the ..."
          },
          {
            "title": "Coffee bean",
            "link": "https://en.wikipedia.org/wiki/Coffee_bean",
            "snippet": "A coffee bean is a seed of the Coffea plant and the source for ..."
          },
          {
            "title": "Geisha",
            "link": "https://en.wikipedia.org/wiki/Geisha_(coffee)",
            "snippet": "Geisha coffee, sometimes referred to as Gesha coffee, is a type of ..."
          }
        ],
        "list": [
          {
            "date": "Color‎: ‎Black, dark brown, light brown, beige"
          }
        ]
      },
      "rich_snippet": {
        "bottom": {
          "detected_extensions": {
            "introduced_th_century": 15
          },
          "extensions": [
            "Introduced‎: ‎15th century",
            "Color‎: ‎Black, dark brown, light brown, beige"
          ]
        }
      },
      "cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:U6oJMnF-eeUJ:https://en.wikipedia.org/wiki/Coffee+&cd=2&hl=sv&ct=clnk&gl=se",
      "related_pages_link": "https://www.google.se/search?gl=se&hl=sv&q=related:https://en.wikipedia.org/wiki/Coffee+coffee&sa=X&ved=2ahUKEwjJ9p2p_KXuAhVlRN8KHf22D8wQHzABegQIAhAJ"
    }
  ],

  // ...
}
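As a sketch of how output shaped like the above might be consumed, the snippet below pulls the position, title, and link out of each entry in `organic_results`. It assumes the real response is valid JSON (the `//` placeholders above would not parse), and the `SAMPLE` string and the helper name `extract_results` are made up for illustration.

```python
import json

# Trimmed, hand-written sample using field names from the output above.
SAMPLE = """
{
  "organic_results": [
    {
      "position": 1,
      "title": "Coffee - Wikipedia",
      "link": "https://en.wikipedia.org/wiki/Coffee"
    }
  ]
}
"""


def extract_results(payload):
    """Return (position, title, link) tuples from a JSON response string."""
    data = json.loads(payload)
    return [
        (item.get("position"), item.get("title"), item.get("link"))
        for item in data.get("organic_results", [])
    ]


print(extract_results(SAMPLE))
```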

Disclaimer: I work at SerpApi.

Geyser answered 18/1, 2021 at 17:13 Comment(1)
reported for commercial advertisementSquirm
V
2

To add a bit more detail to the answers, as they are not correct and do not even address your problem.

First of all, it's perfectly legal to scrape Google as long as you do not harm their service through it (DoS-like).
Also, the other methods have not been blocked; it's just not that simple.

The speed depends on your methods; it does not have to be very slow.
You can scrape tens of thousands of keyword pages in a minute if needed.

You will find a better answer to the topic here: Is it ok to scrape data from Google results?

Your problem with curl does indeed come from protection: Google does not allow automated access, and it has a very sophisticated set of detection algorithms.
They range from simple user agent checks (that's what stopped you directly) up to artificial intelligence that tries to detect unusual or related queries.
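To illustrate the simple user agent check mentioned above, here is a small sketch using Python's urllib in place of curl (an assumption made for illustration; the question itself uses curl). By default an automated client announces itself in the User-Agent header, which is the first thing such a check can look at. The helper names are hypothetical, and no request is sent.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that a default urllib opener would send."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request.
    return dict(opener.addheaders).get("User-agent")


def request_with_user_agent(url, user_agent):
    """Prepare (but do not send) a request carrying an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


print(default_user_agent())  # starts with "Python-urllib/"
```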

Visitant answered 31/3, 2017 at 15:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.