I'd like to be able to scrape the "about N results" number for an arbitrary Google Search term. Google is fairly resistant to scrapers, so while scraping might be an option with a bit of work, I'm specifically asking whether there's a better way of doing this. Perhaps there's a preexisting API provided by Google that would fulfill this need?
I would not attempt scraping, as it most likely has legal ramifications; I would use the Google Custom Search API instead. You'll need an API key as well as a CX id (the id of a custom search engine that you set up in your Google account).
Once you have access to the API and your CX id, you can submit queries to the cse.list method and get the number you're looking for in the response under totalResults.
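As a minimal sketch of that request in Python, using only the standard library: it assumes the JSON API's usual endpoint (https://www.googleapis.com/customsearch/v1 with key, cx and q parameters) and reads the estimate from searchInformation.totalResults, which the API returns as a string. The function names are illustrative, not part of any Google client library, and you would pass in your own API key and CX id.

```python
import json
import urllib.parse
import urllib.request

# Endpoint behind the cse.list method of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, cx, query):
    """Build the cse.list request URL for one query (hypothetical helper)."""
    params = {"key": api_key, "cx": cx, "q": query}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_total_results(response):
    """Read the estimated hit count from a parsed cse.list response.

    The API reports totalResults as a string, so it is converted to int.
    """
    return int(response["searchInformation"]["totalResults"])


def fetch_total_results(api_key, cx, query):
    """Call the API (needs network access and valid credentials)."""
    url = build_search_url(api_key, cx, query)
    with urllib.request.urlopen(url) as reply:
        return extract_total_results(json.load(reply))
```

Calling fetch_total_results(your_key, your_cx, "some search term") should then return the estimate as an integer. The same value also appears under queries.request[0].totalResults in the response.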
When setting up and customizing your custom search engine, you'll have to define the sites you want to search. Fortunately, you can add wildcards like *.com, *.net, etc., or follow the instructions on this page to search the entire web: https://support.google.com/customsearch/answer/2631040?hl=en
I've included all the links you'll need to get moving on this below. Try out the cse.list method explorer once you have a CX id. It will give you real-time response data that you can check out and play around with.
Google Custom Search API
https://developers.google.com/custom-search/
This is the method/endpoint you'll want to use:
https://developers.google.com/custom-search/json-api/v1/reference/cse/list
cse.list method explorer:
https://developers.google.com/apis-explorer/#p/customsearch/v1/search.cse.list
Set up and manage your custom search engine
https://cse.google.com/cse/manage/all
Note: Results may vary a bit depending on how you have your search engine configured. I have a test engine set up to search the entire web with emphasis on *.com and *.net domains, and I'm getting a larger number than what Google shows in the "About N results" line. I'm not sure if you need that exact number, but Google describes it as "About", so it can't be an entirely accurate number anyway. The point is that with CSE you have a lot of control over the configuration, and you should be able to get very close.
Assuming that's your custom search API, have you tried conditionally removing the totalResults property from the JSON response body?
You can achieve that by performing a check on the query parameter (let's say q):
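// Assumes `response` holds the raw JSON text of the API reply.
if (q === "your string") {
  var keyName = "totalResults";
  var resp = JSON.parse(response);
  // <APIkey> is a placeholder for the key under queries in your response.
  delete resp.queries["<APIkey>"][keyName];
}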
NOTE: The structure to locate the keyName: totalResults has been derived from here
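For comparison, here is a rough Python sketch of the same idea. It assumes the response has already been parsed into a dict and follows the layout documented for cse.list, where searchInformation holds one totalResults field and each key under queries (such as request or nextPage) maps to a list of objects that may carry their own. The function name strip_total_results is made up for this example.

```python
import copy


def strip_total_results(response):
    """Return a copy of a parsed response without any totalResults fields."""
    cleaned = copy.deepcopy(response)  # leave the caller's dict untouched
    # Top-level estimate.
    cleaned.get("searchInformation", {}).pop("totalResults", None)
    # Per-query estimates: every key under "queries" maps to a list of objects.
    for entries in cleaned.get("queries", {}).values():
        for entry in entries:
            entry.pop("totalResults", None)
    return cleaned
```

You would call it only when the query parameter matches the string you care about, mirroring the conditional check above.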