Google Trends crawler: code 429 error
I am new to Python and am using the unofficial pytrends API to crawl Google Trends. I have 2,000+ keywords in a list called DNA and am trying to pull interest-over-time data for each of them. When I run this code it fails with "Google returned a response with code 429", even though I added time.sleep(1). Can anyone help me with this problem?

Below is my code:

#DNA holds the 2000+ keywords
from pytrends.request import TrendReq
import pandas as pd
import xlsxwriter
import time

pytrends = TrendReq(hl='en-US', tz=360)
Data = pd.DataFrame()

#Google Trends crawler
for i in range(len(DNA)):
    time.sleep(1)
    kw_list = [DNA[i]]
    pytrends.build_payload(kw_list, cat=0, timeframe='today 5-y', geo='', gprop='')
    df = pd.DataFrame(pytrends.interest_over_time())

    #Take the date index from the first keyword's results
    if i == 0:
        Googledate = pd.DataFrame(pytrends.interest_over_time())
        Data['Date'] = Googledate.index
        Data.set_index('Date', inplace=True)

    #Results: one column per keyword
    if df.empty:
        Data[DNA[i]] = ""
    else:
        df.index.name = 'Date'
        df.reset_index(inplace=True)
        Data[DNA[i]] = df.loc[:, DNA[i]]

Data
Jackhammer answered 27/11, 2017 at 2:50 Comment(2)
Google generally doesn't like to be crawled by "unofficial APIs". – Shearwater
@KlausD. Any ideas to solve this problem then? I can't seem to find an official API for Google Trends. Thanks. – Jackhammer

HTTP/1.1 429 Too Many Requests
Content-Type: text/html
Retry-After: 3600

Too Many Requests

There is no official API for Google Trends. Google has probably placed a limit on the number of requests coming from the same IP address. Your options are:

  1. Slow down until you figure out the limit (see the sketch below).
  2. Run the crawler on several servers so the requests appear to come from different IP addresses.
  3. Stop trying to crawl Google for data they don't want to share.
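
For option 1, here is a minimal sketch of what "slowing down" could look like: pace the requests and back off exponentially whenever Google starts refusing them. It assumes pytrends raises pytrends.exceptions.ResponseError when a request is rejected (that is where the "code 429" message in the question comes from) and that DNA is the keyword list from the question; the delay values are guesses you would need to tune against the real limit.

from pytrends.request import TrendReq
from pytrends.exceptions import ResponseError
import pandas as pd
import time

pytrends = TrendReq(hl='en-US', tz=360)

def fetch_interest(keyword, max_retries=5, base_delay=60):
    """Fetch interest-over-time for one keyword, backing off on rejected requests."""
    for attempt in range(max_retries):
        try:
            pytrends.build_payload([keyword], cat=0, timeframe='today 5-y', geo='', gprop='')
            return pytrends.interest_over_time()
        except ResponseError:
            wait = base_delay * (2 ** attempt)  # 60s, 120s, 240s, ... (guesses, tune as needed)
            print(f'Request for "{keyword}" was rejected, sleeping {wait}s before retrying')
            time.sleep(wait)
    return pd.DataFrame()  # give up on this keyword after max_retries attempts

results = {}
for keyword in DNA:  # DNA is the 2000+ keyword list from the question
    df = fetch_interest(keyword)
    if not df.empty:
        results[keyword] = df[keyword]
    time.sleep(60)  # roughly one request per minute between keywords

Data = pd.DataFrame(results)  # columns are aligned on the shared date index

If the rejections keep coming even at one request per minute, the IP is probably being throttled more aggressively and options 2 or 3 are the only way out.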
Imagine answered 27/11, 2017 at 10:19 Comment(6)
Umm. This is for my individual research, so it would be difficult to use different IP addresses. What do you suggest for figuring out the limit? I'm very new to Python and the code above took me two days to write.... :D – Jackhammer
Send one every minute. Do that for an hour and see what happens. – Imagine
@DalmTo Thank you! That would be time.sleep(60), right? – Jackhammer
time.sleep(60) # Delay for 1 minute (60 seconds). – Imagine
This is years after this question was posted, but I'm dropping this here in case it's useful to anybody. I wouldn't take a simple time.sleep(60) approach; 60s is usually a pretty long sleep time in scraping terms. I would look at an exponential sleep-and-retry solution to keep your scrape time down where possible. – Forgo
Google recommends using exponential backoff; most of the client libraries have it built in already. – Imagine
