What are the alternatives now that the Google web search API has been deprecated? [closed]
P

10

346

Google Web Search API has been deprecated and replaced with Custom Search API (see http://code.google.com/apis/websearch/).

I wanted to search the whole web but it looks like with the new API only custom sites can be searched.

Is there a way to search the whole web programmatically? I was able to query the old API using JSON from a Java program.

Phelan answered 2/11, 2010 at 23:18 Comment(3)
I've been using an alternative google search api. It is super easy to use.Koser
There is also SerpApi. That's a solid solution for Google search and other engines.Discography
serphouse.com is a great solution to get data using API.Stoller
A
41

You could send the queries the way a browser does and then parse the returned HTML. That is what I have always done, even for things like YouTube.

Aeolian answered 2/11, 2010 at 23:21 Comment(33)
I really need a proper API call, as I intend to make many calls.Phelan
I'm told that Google's terms of service forbid spidering...Earvin
From the TOS: "You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers)..."Ectoblast
Shabby. I wouldn't do this on any large scale. Maybe if the program is for personal use...Horatius
@Aeolian Read the tooltip on the "downvote" button; that's why. Also, because the suggestion isn't allowed by google's TOS.Parasang
@Hugo the answer is useful because it does what was asked and I AM STILL getting downvoted for an answer that was accepted, that works, that is useful, and it is the asker's responsibility to decide on Google's TOS, not mine.Aeolian
@Aeolian Whether the answer is useful or not is subjective. I did not find it useful, having the same question as the OP, since it's neither a clean solution nor something that the TOS allows.Parasang
@Hugo no, it isn't subjective, or at least not to the degree you suggest. It is useful if it answers the question in a viable way. TOS violations are something to be weighed, but not something that makes an answer wholly useless.Aeolian
"Violate the terms of service with a service provider" is never good advice. Parsing webpages is something that breaks from one day to the next without warning. This is awful advice - that's the reason it was downvoted more than it was upvoted.Parasang
I don't recall telling them to break the TOS, I gave them a valid answer that was accepted as the best and it is their choice to do what they want with that information.Aeolian
Yes, it breaks the terms of service, but personally I wouldn't worry about that. Google can handle a little bit of scraping; after all, they have made a fortune scraping other people's sites.Darren
Come on people. Don't be so naive. Google cannot force that ToS down your throat. In order to violate a ToS you must first agree with it (in writing, or by clicking a button like 'Yes, I accept the terms'). Think about this: I put a ToS on my web page saying that every person who visits that page has to give me $10000. Can I enforce this ToS on my visitors? Will they have to pay me immediately?Otranto
@Altar they can still block your IP ;) Ever seen a captcha in Google search? Some people have.Leatherneck
@Altar This is simply untrue. If your program is running on a dedicated server, it certainly has a static IP. Besides, having a dynamic address still means that you have to reconnect manually to obtain a new one.Leatherneck
@Altar Just saying "come on" doesn't magically dispel all barriers. You have to stay within the limits of the law.Junji
@WGH Most routers today have an option to retrieve a new IP at midnight.Otranto
@Altar You are right, but ultimately you could infringe a law like en.wikipedia.org/wiki/Sui_generis_database_right or, in Germany, dejure.org/gesetze/StGB/303b.html. So it depends on your country and its laws, and of course on the laws of the country where Google is located. But in the end it's much easier for Google to ban IPs. And of course you could reconnect and obtain a new IP as often as you want, but Google might use geo databases to block your region more often than others (e.g. if you search 10 times in 5 minutes).Stairhead
No, you can't enforce a ToS against random web surfers. However, creating a program to scrape a web page shows clear intent and the skill required to do so would put you in a higher class of "reasonable person". You might not lose a criminal lawsuit but probably would lose a civil lawsuit. IANAL. Ref: Aaron Swartz.Adaxial
-1 @Zimm3r, you said you provided a "valid answer", but I disagree. I don't consider it a valid answer when it requires the use of a web service, while specifically breaking their T.O.S. Your solution cannot be used without violating Google's Terms of Use, therefore is not really a valid answer in my opinion. It's like someone telling you they need money for groceries, and you suggesting they rob the bank. Sure, technically it is an option, but not one that is likely to work.Sievers
The usefulness of the answer does not mean 'always applicable'. The Google Terms of Service could change - they have already after all. If you need a small amount of files, you are not hurting big G.Barque
Scraping the webpage has these disadvantages: (1) Google doesn't like it - you might face IP ban, captchas and other obstacles. (2) The HTML code of the webpage changes frequently - you will end up fixing your code again and again in your long-term projects. (3) The API can possibly give you more metadata about the search results than the webpage. I downvoted this answer. But I'm not any kind of law nazi. This approach is simply not good for the reasons above.Faeroese
@ændrük that part about automated means has been gone from their TOS since March 2012.Assimilative
@Assimilative it still breaks the terms: "don’t interfere with our Services or try to access them using a method other than the interface and the instructions that we provide."Spruce
BTW: Google is adamant about preventing scraping, but not for the reasons you think. It is not because it might cost bandwidth, which is cheap. It is because one of Google's most valuable assets is its query log, one of the most potent insights into the collective consciousness. Being polluted by mechanized queries would make it worthless, so they are investing all their efforts in deterring scraping done in a way that pollutes that data set.Gambrel
@AndreFigueiredo don't [...] access [our Services] using a method other than the interface and the instructions that we provide => a web crawler is using the interface and the instructions that they provide. It just does so by automated means instead of manually, so a web crawler is absolutely compliant with these ToS (at least, with this sentence you quoted).Treasury
@Treasury that's a fair point. I'm not savvy about laws and web crawlers, but my guess is that bots and raw HTTP requests would not be compliant with an accepted interface, versus Selenium for example :P. And the instructions they provide to access their services would not include automated requests, i.e. scraping. Correct me if I'm wrong.Spruce
That said, they have changed their entire TOS. The new version says: we reasonably [??] believe that your conduct causes harm or liability [??] to a user, third party, or Google - for example, by [...] scraping content that doesn’t belong to you. I honestly don't know exactly what it means for our case here. We are doing no harm :PSpruce
@mopsyd while you are not compelled to "agree with" (whatever that means) the ToS, you are compelled to comply insofar as Google as a private entity can choose not to provide service to you, and obviously they are likely to do so if you are violating their ToS. Further, they will be able to recoup damages in a civil setting. "Opting out" doesn't make sense; no one is forcing you to use their services. And declaring that they can "suck it" definitely doesn't do anything for you. 😂Diptych
@jungle_mole Google is not using your services so your hypothetical terms to them don't matter. So they are not breaking your terms. And even if they somehow were, you still wouldn't be justified in breaking theirs; that's not how contracts work. It doesn't really matter anyway because you are using their services in this case and you definitely have no particular right guaranteeing you access since as a private entity they have no obligation to serve you in the first place.Diptych
@Ezikiel being able and doing are entirely different concepts. If you want to take the pedantic stance, you can say someone has a rule somewhere about the thing. You can also take a practical stance that weighs the risk of a company retaliating or cutting off service and the likelihood that they care enough about a trivial infraction to waste time and money on an aggressive civil action (they don’t, unless your abuse is egregious), and decide whether or not tangential concerns like a ToS matter to your use case. I am certain that to one prone to pedantry and condescending emojis it probably does.Gifferd
@EzekielVictor as a matter of fact, they are using my services as a "targetable ad-watcher" or a human form of clicker bot. We have a barter: from my side, time and cognitive function; from their side, a search window on my desktop. But it's they who are setting the rules.. Nope, I have my guesses too. Since they are closed for discussion, I'll just do it my way, and if they don't agree, they are free to refuse to continue acquiring my services, as you said: no obligations. Anyway, it's a valid answer. If this were LawOverflow, the answer could be considered arguable. (Sorry, but I would be unjustified from whose point of view?)Erythro
@jungle_mole, when you refer to "free to refuse to continue acquiring" your services, you're referring to being IP banned. This thread has jumped the shark.Diptych
@EzekielVictor yep, that's what I'm saying: any sanctions they see fit and are able to impose. All the more so since they have not strayed from this "warpath" since forever, so why succumb? When the counterparty feels OK with its moral right to take advantage of and utilize me, my hands are untied to make use of our reciprocity in any manner I see fit. They don't disclose their ways, nor do they negotiate them, so why should I? Every party seeks maximum benefit, but one tries with all its might to confine the other. Preemptively, mind you. What's left is to try and exploit the usurper; it won't starve.Erythro
E
503

Yes, Google Custom Search has now replaced the old Search API, but you can still use Google Custom Search to search the entire web, although the steps are not obvious from the Custom Search setup.

To create a Google Custom Search engine that searches the entire web:

  1. From the Google Custom Search homepage ( http://www.google.com/cse/ ), click Create a Custom Search Engine.
  2. Type a name and description for your search engine.
  3. Under Define your search engine, in the Sites to Search box, enter at least one valid URL (For now, just put www.anyurl.com to get past this screen. More on this later ).
  4. Select the CSE edition you want and accept the Terms of Service, then click Next. Select the layout option you want, and then click Next.
  5. Click any of the links under the Next steps section to navigate to your Control panel.
  6. In the left-hand menu, under Control Panel, click Basics.
  7. In the Search Preferences section, select Search the entire web but emphasize included sites.
  8. Click Save Changes.
  9. In the left-hand menu, under Control Panel, click Sites.
  10. Delete the site you entered during the initial setup process.

Now your custom search engine will search the entire web.
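Once the engine exists, it can be queried over HTTP and returns JSON. The following is a minimal sketch, assuming the Custom Search JSON API endpoint `https://www.googleapis.com/customsearch/v1` with the `key`, `cx`, `q` and `start` query parameters and an `items` list in the response; check these against Google's current documentation before relying on them. The function names are hypothetical, chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Custom Search JSON API; verify against the docs.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query, start=1):
    """Return the request URL for one page of results (hypothetical helper)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_results(response):
    """Pull (title, link) pairs out of an already decoded JSON response."""
    return [(item["title"], item["link"]) for item in response.get("items", [])]


def search(api_key, engine_id, query, start=1):
    """Perform the HTTP request; needs network access and valid credentials."""
    url = build_search_url(api_key, engine_id, query, start)
    with urllib.request.urlopen(url) as reply:
        return extract_results(json.load(reply))
```

A call would then look like `search("YOUR_API_KEY", "YOUR_ENGINE_ID", "suunto ambit watch")`, where the API key comes from the Google developer console and the engine ID from the Custom Search control panel. Each call counts as one query against the daily quota.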

Pricing

  • Google Custom Search gives you 100 queries per day for free.
  • After that you pay $5 per 1000 queries.
  • There is a maximum of 10,000 queries per day.

Source: https://developers.google.com/custom-search/json-api/v1/overview#Pricing
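To make the pricing concrete, here is a small sketch that estimates the cost of one day's usage from the three figures above. It assumes billing is pro rata per query beyond the free 100 and that the 10,000 cap counts all queries in a day; neither detail is stated in the answer, so treat both as assumptions. The function name is hypothetical.

```python
FREE_QUERIES_PER_DAY = 100      # free tier quoted above
PRICE_PER_1000_USD = 5.0        # price quoted above
MAX_QUERIES_PER_DAY = 10_000    # daily cap quoted above


def daily_cost_usd(queries):
    """Estimate one day's cost, assuming pro-rata billing past the free tier."""
    if queries < 0:
        raise ValueError("queries must be non-negative")
    if queries > MAX_QUERIES_PER_DAY:
        raise ValueError("exceeds the daily maximum of 10,000 queries")
    billable = max(0, queries - FREE_QUERIES_PER_DAY)
    return billable * PRICE_PER_1000_USD / 1000
```

Under these assumptions, 1,100 queries in a day would cost $5.00 and a full 10,000-query day would cost $49.50.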


  • The search quality is much lower than normal Google search (no synonyms, "intelligence" etc.)
  • It seems that Google is even planning to shut down this service completely.
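Commenters on this answer report two further limits: each request returns at most 10 results, and a single search query yields at most 100 results in total. A minimal sketch of the paging arithmetic, assuming a 1-based `start` offset (an assumption about the API, as is the helper name):

```python
RESULTS_PER_REQUEST = 10      # per-request limit reported in the comments
MAX_RESULTS_PER_QUERY = 100   # per-query limit reported in the comments


def page_starts(wanted):
    """Return the 1-based start offsets needed to fetch `wanted` results."""
    capped = min(wanted, MAX_RESULTS_PER_QUERY)
    return list(range(1, capped + 1, RESULTS_PER_REQUEST))
```

Every offset in the returned list is a separate request, so fetching all 100 results for one search term uses 10 queries of the daily quota.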
Exine answered 26/6, 2012 at 11:23 Comment(27)
Thanks for this. Hopefully this is a valid procedure and not a loophole waiting to be plugged by Google!Konopka
Confirmed to be working. Results are slightly different than a live search though. Any ideas on that? Bing's API has the same problem.Whipperin
Thank you! This is possibly the only answer on the Internet that addressed my question. It's mind boggling why Google would end direct API support for their core service.Tabshey
But how do I use it with JSON?Antheridium
The results are a little different because of personalized and local search results.Bayern
Well, that's great, but the thing I hesitate over is: IS IT PAID??Backhouse
@Deepanshu You only get 100 queries per day for free (docs).Bott
This is why Google claims that the search results are different: support.google.com/customsearch/answer/141877?hl=en Mainly: using specified sites (does not apply here), and no social, personalized, or real-time results.Callis
Any chance you can update this question to reflect the new layout? Can't seem to find half the stuff in your question.Blackleg
Rippo -- I haven't been back in a while... but even if they've changed the layout the methodology is probably still sound: Create a search engine to search a specific site PLUS the entire web. Then delete that specific site. What you're left with should be a generic web search. They may have closed the loophole after all... but if it's still doable, this general advice may help. Good luck.Exine
And.. if they have closed the loophole and now force you to search at least 'one' site. You might try creating a URL/site with zero content. Just a blank index.html page. The results should then be the same as a generic web search. 'Just a thought...Exine
I tried it but it doesn't work now. I asked to look in the entire web for suunto ambit watch, but I got no results (I searched in the public URL that I got)Sympathin
Note this only works for the free version support.google.com/customsearch/answer/2631040Ex
@Callis It does not only miss social/live/etc data. It does not allow a search based on synonyms and it is completely missing intelligence. e.g. "john doe northpole" will not return a result if "john doe" is now living at the "southpole" and has changed this information on his website or removed the word "northpole" or he or you made a typo like "nortpole". In my eyes the custom search is nearly useless.Stairhead
WARNING: we did development using the free version, but to upgrade to the paid version (to do more than 100 searches), google forces you to turn off the "search the entire web but emphasize included sites"Adaxial
@BryanLarsen, It's still possible to use the old API that doesn't have the paltry 100/day limit right?Repugnance
@Bangkokian, Why is there a hard limit of 10k queries/day? Assuming you can pay, How do you get above 10k queries/day then? Do you create multiple keys?Repugnance
I'm not sure how it was before, but now you have to set up a billing account regardless of whether you use the free or paid tier. Bummer.Racial
entireweb.com has discontinued the service as seen here entireweb.com/servicesEstrange
This still works.Kamerad
"On April 1, 2017, Google will discontinue sales of the Google Site Search. All new purchases and renewals must take place before this date. The product will be completely shut down by April 1, 2018."Saccharate
Google custom search for the entire web works, but it won't give you more than 100 results per search query even if you are a paying customer.Klug
The Google Custom Search homepage ( google.com/cse ) always returns 500 err... Is anyone facing the same problem?Lazarolazaruk
It's worth adding that besides such a low limit it also permits only 10 results per queryRamunni
After we create the custom search engine, how do we invoke the API ?Compositor
@TinaLee the correct URL is cse.google.com/cseNauplius
There exists a third-party API called SerpApi. It has a Google Search Engine API which returns a raw JSON. It has a free plan of 100 searches/month (to test out). There're plans of $50/$130/$250/enterprise for 5.000/15.000/30.000/100.000+ searches per month accordingly, with a throughput of 20% of plan searches per hour. It has been regularly updated (for new Google layouts) for the last 5 years, and has 10 API wrappers. Playground to mess aroundKingpin
D
59

Google Custom Search (as advocated in the top-rated answers) works well but is very expensive compared to its competitors (below) or to other Google APIs. It has a small free tier (100 queries/day) and a very high price of $5 per 1000 queries.

They offer the option to upgrade to Site Search, which has slightly better prices, but that is meant for searching one site (your own), so it is really something quite different - not an upgrade.

The main alternatives seem to be:

Bing Search API
https://datamarket.azure.com/dataset/5BA839F1-12CE-4CCE-BF57-A49D98D29A44
It has a free tier of 5,000 queries/month, prices starting at 5 queries per penny, and no hard limit.

UPDATE: At the end of 2016 this API was shut down in favour of its Azure counterpart, "Cognitive Services Bing Search API":
https://azure.microsoft.com/en-us/services/cognitive-services/search/

See here for a pricing chart, which starts at US$3/m for 1,000 transactions. Unless I'm missing something it is quite expensive.

Yahoo BOSS Search API
UPDATE: Was discontinued on March 31, 2016. http://developer.yahoo.com/boss/search/
Prices started at about 12 queries/penny for whole-web searches.
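The prices above are quoted in different units, which makes them hard to compare. A short sketch that converts "queries per penny" into US dollars per 1,000 queries, using only the figures quoted in this answer (all of them historical starting prices; the helper name is hypothetical):

```python
def usd_per_1000(queries_per_penny):
    """Convert a 'queries per penny' price into US dollars per 1,000 queries."""
    pennies = 1000 / queries_per_penny
    return pennies / 100


# Starting prices as quoted in this answer, normalised to USD per 1,000 queries.
prices = {
    "Google Custom Search": 5.00,
    "Bing Search API (DataMarket)": usd_per_1000(5),
    "Yahoo BOSS": usd_per_1000(12),
}

for name, price in prices.items():
    print(f"{name}: ${price:.2f} per 1,000 queries")
```

On these figures Bing came to $2.00 and Yahoo BOSS to roughly $0.83 per 1,000 queries, against $5.00 for Google Custom Search.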

And some I haven't heard of before:

http://www.gigablast.com/searchfeed.html

http://www.faroo.com/hp/api/api.html

http://www.commoncrawl.org/

http://www.entireweb.com/search_api/implementation/
[discontinued - as pointed out below]

There is a bit of discussion of some of these on this SO post.
[got closed for being off-topic and is now gone]

Decussate answered 19/3, 2014 at 1:20 Comment(7)
Bing Search API version 5 now allows up to 1,000 transactions per month across all Bing Search APIs (Web, Images, Video, News Search) - microsoft.com/cognitive-services/en-us/pricing . I put together some samples - mvark.blogspot.in/2016/06/…Foredo
entireweb.com has discontinued the service as seen here entireweb.com/servicesEstrange
On Dec 15, 2016, the Bing Web Search API will move under Cognitive Services by Azure Marketplace (azure.microsoft.com/en-us/services/cognitive-services/search), which requires phone + credit card verification for a subscription (even a free one).Jana
From Bing API: "DataMarket and Data Services are being retired and will stop accepting new orders after 12/31/2016. Existing subscriptions will be retired and cancelled starting 3/31/2017. Please reach out to your service provider for options if you want to continue service."Gladstone
Thanks for pointing out the change - I've updated answer accordingly.Decussate
Looks like Bing's moved their service again - now it's on the Azure Marketplace learn.microsoft.com/en-us/bing/search-apis/bing-web-search/…Spirograph
There's also a third-party API from SerpApi which has Google, Bing, Yahoo and 20+ more search engine APIs. It's also a paid API (with a free plan) but maintained/updated on regular basis for the last 5 years. Roadmap. PlaygroundKingpin
D
26

There is an option at the bottom of the Custom Search Control Panel: under "Sites to search", you can choose "Search the entire web but emphasize included sites".

Custom Search Control Panel - Sites to search

Doralia answered 4/8, 2013 at 14:13 Comment(5)
does it still work for you?Sympathin
Yep, it still works.Ex
Google forces you to turn that option off when you upgrade to paid search. And free has a limit of 100 searches.Adaxial
@Yishu, Why does the page https://support.google.com/customsearch/answer/141877?hl=en state "You cannot configure Google Site Search to search the entire web"?Repugnance
@Pacerier, I have no idea about it. Maybe the policy has changed?Doralia
O
14

Faroo has a free Web Search API

Ornithomancy answered 18/11, 2012 at 13:24 Comment(7)
Their results seem limited, but it is a good starting point.Mayhap
@Jack, Not heard of this before. Where do they get their search results from?Repugnance
Possible deal breaker for Faroo is that your API key is restricted to the IP address you specify during registration.Rasorial
Are these guys still operational? I've requested API keys and heard nothing.Steric
Page has a "Coming Soon" banner now...Disequilibrium
Now redirects to seekstorm.com which is a paid for serviceByblow
There's a SerpApi that offers real-time raw JSON results from 26+ search engines including Google. Has 10 API wrappers. Note that it's not a crawler.Kingpin
P
7

I have just come across this from Common Crawl.

http://www.commoncrawl.org/

Might be the answer we are all looking for!!

Phelan answered 2/2, 2012 at 16:39 Comment(5)
It has a limited index, refreshed about once a year. And in the end it is quite expensive, as you have to plug into Amazon S3.Trigonous
@GuillaumeLebourgeois, Expensive? I don't think that's true. It's a nonprofit. The entire 102 TB of data is free for download.Repugnance
The cost is for connecting to AWS where you can access this. If you are a student, you are eligible for their free tier, but there could still be transfer costs etc; and if you are not in the free tier, there are running costs.Moderation
Looks like common crawl is updated monthly nowOrnithomancy
At least currently (February 2022) the data can be downloaded from S3 for free. HTTP links can be found on the Common Crawl website.Pondweed
L
4

There's a note on top of the docs:

Note: The Google Web Search API has been officially deprecated as of November 1, 2010. It will continue to work as per our deprecation policy, but the number of requests you may make per day will be limited. Therefore, we encourage you to move to the new Custom Search API.

The deprecation policy says that they will continue to run the API for 3 years. So if you already have an application that uses the old API, you don't have to rush to change things just yet. If you're writing a new application, use the Custom Search API. See my answer here for how to do this in Python, but the idea's the same for any language.

Lamrouex answered 2/1, 2011 at 21:21 Comment(3)
And it's not free.... "$5 per 1000 queries"... very much not free!Sn
This answer is now obsolete as the three years are up and 2014/09/29 has passed.Kilocycle
The Custom Search API does not cover all websites - it's for the user's own websites.Sympathin
M
3

There's a free Java API called JFreeWebSearch which uses the already mentioned Faroo: http://www.ke.tu-darmstadt.de/resources/jfreewebsearch

Mischievous answered 25/1, 2013 at 8:29 Comment(1)
There's also a google-search-results-java which is a SerpApi wrapper for Java.Kingpin
L
1

Gigablast offers a cheap web search API: http://www.gigablast.com/searchfeed.html

Leptophyllous answered 7/6, 2013 at 3:10 Comment(0)
S
0

You can create an "everywhere" custom search engine right from the Google Custom Search homepage ( http://www.google.com/cse/ ). Just click 'advanced' while adding a new engine. There you can provide a Schema.org site type. 'Thing' is the most generic type, which covers the whole web.

Stempien answered 5/8, 2014 at 8:54 Comment(1)
I didn't get it. Does it work for you?Sympathin

© 2022 - 2024 — McMap. All rights reserved.