Google Search from a Python App

I'm trying to run a Google search query from a Python app. Is there a Python interface out there that would let me do this? If there isn't, does anyone know which Google API would enable me to do this? Thanks.

Brittnee answered 1/11, 2009 at 16:21

There's a simple example here (peculiarly missing some quotes;-). Most of what you'll see on the web is Python interfaces to the old, discontinued SOAP API -- the example I'm pointing to uses the newer and supported AJAX API, and that's definitely the one you want!-)

Edit: here's a more complete Python 2.6 example with all the needed quotes &c;-)...:

#!/usr/bin/python
import json
import urllib

def showsome(searchfor):
  query = urllib.urlencode({'q': searchfor})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.urlopen(url)
  search_results = search_response.read()
  results = json.loads(search_results)
  data = results['responseData']
  print 'Total results: %s' % data['cursor']['estimatedResultCount']
  hits = data['results']
  print 'Top %d hits:' % len(hits)
  for h in hits: print ' ', h['url']
  print 'For more results, see %s' % data['cursor']['moreResultsUrl']

showsome('ermanno olmi')
Morganite answered 1/11, 2009 at 16:30
Tried this on my local Linux machine, and then Google thought I was a bot: every search from my browser got a CAPTCHA! I shouldn't have tried this at work; just a heads-up for anyone using this. Add a User-Agent and Referer header to make it look more like a genuine request! -- Flanders
Unfortunately the Google Web Search API on which this relies was deprecated in November 2010. The Custom Search API is supposed to replace it, but requires you to configure a list of URLs to search across -- not the entire web. -- Bremser
As of today (2014.06.10), this is working ... on my IPython/Python 2.7.6. -- Littlefield
As of March 2016, this doesn't work. Google responds with the following: {"responseData": null, "responseDetails": "The Google Web Search API is no longer available. Please migrate to the Google Custom Search API (developers.google.com/custom-search)", "responseStatus": 403} -- Maximilien
As mentioned above, this is a deprecated API that no longer works. Also, Google now serves everything over HTTPS, so the plain http:// URL is obsolete on its own. The same goes for John La Rooy's answer below. -- Adkinson
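The migration path the comments point to is the Custom Search JSON API, which requires an API key and a search engine ID (cx) from your own Google account. A minimal Python 3 sketch, assuming the documented v1 endpoint and response fields -- the key and cx values below are placeholders, not working credentials:

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = 'https://www.googleapis.com/customsearch/v1'

def build_search_url(api_key, cx, searchfor):
    # key and cx come from your own Google Cloud / Programmable Search setup
    params = urllib.parse.urlencode({'key': api_key, 'cx': cx, 'q': searchfor})
    return '%s?%s' % (API_ENDPOINT, params)

def showsome(api_key, cx, searchfor):
    url = build_search_url(api_key, cx, searchfor)
    with urllib.request.urlopen(url) as response:
        results = json.loads(response.read().decode('utf8'))
    print('Total results: %s' % results['searchInformation']['totalResults'])
    for item in results.get('items', []):
        print(' ', item['link'])

# showsome('YOUR_API_KEY', 'YOUR_CX_ID', 'ermanno olmi')
```

Unlike the old AJAX API, every request must be authenticated, and free usage is quota-limited.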

Here is Alex's answer ported to Python 3

#!/usr/bin/python3
import json
import urllib.request, urllib.parse

def showsome(searchfor):
  query = urllib.parse.urlencode({'q': searchfor})
  url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
  search_response = urllib.request.urlopen(url)
  search_results = search_response.read().decode("utf8")
  results = json.loads(search_results)
  data = results['responseData']
  print('Total results: %s' % data['cursor']['estimatedResultCount'])
  hits = data['results']
  print('Top %d hits:' % len(hits))
  for h in hits: print(' ', h['url'])
  print('For more results, see %s' % data['cursor']['moreResultsUrl'])

showsome('ermanno olmi')
Deserved answered 1/11, 2009 at 19:9
What would be the advantage of using Python 3 over Alex's answer? -- Birkenhead
@Phill, not sure what you mean by "advantage". If your project uses Python 2, you use Alex's answer; if the project uses Python 3, you can use this answer. Unfortunately it's not really practical to write this function so that the same code works with both versions of Python. -- Deserved
I guess my question is: why use Python 3 over Python 2? What are the benefits? I'm new to Python, coming from a PHP background. Are things more formalised? -- Birkenhead
@Phill, Python 3 is a cleaner, more consistent design than Python 2, but it is not fully backwards compatible. Typically the changes required to port code are quite small, as you can see here; however, a number of third-party libraries and frameworks still don't support Python 3, so many people are still using Python 2. -- Deserved
Is there a way to get more than 4 hits? -- Overflow
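For what it's worth, the two ports differ mainly in where the urllib functions live; a conventional try/except import shim (not from either answer, just the usual pattern) lets one script find them under both interpreters:

```python
try:
    # Python 3 locations
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:
    # Python 2 locations
    from urllib import urlencode, urlopen

# single-argument print(x) parses the same in both versions; the
# multi-argument print calls in showsome() would still need reworking
query = urlencode({'q': 'ermanno olmi'})
```

The .decode("utf8") call and the multi-argument prints would still need version-specific handling, so the shim only gets you part of the way.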

Here's my approach to this: http://breakingcode.wordpress.com/2010/06/29/google-search-python/

A couple of code examples:

    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from google import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)

    # Get the first 20 hits for "Mariposa botnet" in Google Spain
    from google import search
    for url in search('Mariposa botnet', tld='es', lang='es', stop=20):
        print(url)

Note that this code does NOT use the Google API, and is still working to date (January 2012).

Carmichael answered 10/1, 2012 at 10:57
Hi Mario, I have tried your script and it's fabulous. I am facing just one issue: even when I use .COM as the TLD, I get the results that come up on .CO.IN. Can you please help? -- Devonna
Note this can break at any time, since it is not using an official API but scraping the Google results page -- e.g. if Google changes the way the results are returned. -- Clientage
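If you scrape anyway, the earlier advice about sending browser-like headers can be sketched as follows in Python 3. The header values are illustrative only, and none of this prevents Google from blocking the request or changing the page layout:

```python
import urllib.parse
import urllib.request

def build_request(query):
    # Google tends to answer bare scripted requests with HTTP 403, so
    # send browser-like headers (values are illustrative, not magic)
    url = 'https://www.google.com/search?' + urllib.parse.urlencode({'q': query})
    return urllib.request.Request(url, headers={
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
        'Referer': 'https://www.google.com/',
    })

def fetch_results_page(query):
    with urllib.request.urlopen(build_request(query)) as resp:
        return resp.read().decode('utf-8', errors='replace')
```

Rate-limit your requests as well; as noted above, too many in a row can get your IP CAPTCHA'd.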

I am new to Python and was investigating how to do this. None of the provided examples worked properly for me: some are blocked by Google if you make many (even a few) requests, and some are outdated. Parsing the Google search HTML (adding a header to the request) will work until Google changes the HTML structure again. You can use the same logic to search in any other search engine, by looking into its HTML (view-source).

import urllib2

def getgoogleurl(search, siteurl=False):
    if siteurl == False:
        return 'http://www.google.com/search?q=' + urllib2.quote(search)
    else:
        return 'http://www.google.com/search?q=site:' + urllib2.quote(siteurl) + '%20' + urllib2.quote(search)

def getgooglelinks(search, siteurl=False):
    # Google returns 403 without a User-Agent header
    headers = {'User-agent': 'Mozilla/11.0'}
    req = urllib2.Request(getgoogleurl(search, siteurl), None, headers)
    site = urllib2.urlopen(req)
    data = site.read()
    site.close()

    # no BeautifulSoup because the Google HTML is generated with JavaScript
    start = data.find('<div id="res">')
    end = data.find('<div id="foot">')
    if data[start:end] == '':
        # error, no links to find
        return False
    else:
        links = []
        data = data[start:end]
        start = 0
        end = 0
        while start > -1 and end > -1:
            # get only results from the provided site
            if siteurl == False:
                start = data.find('<a href="/url?q=')
            else:
                start = data.find('<a href="/url?q=' + str(siteurl))
            data = data[start + len('<a href="/url?q='):]
            end = data.find('&amp;sa=U&amp;ei=')
            if start > -1 and end > -1:
                link = urllib2.unquote(data[0:end])
                data = data[end:len(data)]
                if link.find('http') == 0:
                    links.append(link)
        return links

Usage:

links = getgooglelinks('python','http://www.stackoverflow.com/')
for link in links:
       print link

(Edit 1: Adding a parameter to narrow the google search to a specific site)

(Edit 2: When I added this answer I was coding a Python script to search subtitles. I recently uploaded it to Github: Subseek)

Nonesuch answered 7/2, 2013 at 5:23
I'm interested in why none of the examples worked for you, especially the bit about BeautifulSoup not working because the HTML is generated by JavaScript... I've tried mine just now and it's working: breakingcode.wordpress.com/2010/06/29/google-search-python -- Comptometer
In my case I wasn't able to use BeautifulSoup. I tested it, and it seems that Google was generating the HTML response with JavaScript blocks, so I didn't find a way to get the links with the BS class. I only found the links in the response using the "find" function. -- Nonesuch
Maybe the URL to Google is pointing to the newer API that uses JavaScript instead of the legacy API that used bare HTML. I believe adding "&btnG=Google+Search" to your queries causes it to use the HTML API, or at least that's the only difference I see. -- Comptometer
@Comptometer Thanks for the tip. I will try it using that parameter. Maybe it's faster that way? -- Nonesuch

© 2022 - 2024 — McMap. All rights reserved.