Error 429 when invoking the Reddit API from Google App Engine
I have been running a cron job on Google App Engine for over a month without any issues. Among other things, the job uses urllib2 to retrieve a JSON response from Reddit and from a few other sites. About two weeks ago I started seeing errors when calling Reddit, but not when calling the other sites. The error I am receiving is HTTP error 429.

I have tried executing the same code outside of Google App Engine and have no issues there. I also tried using urlfetch, but received the same error.

You can reproduce the error in the App Engine interactive shell with the following code.

import urllib2
data = urllib2.urlopen('http://www.reddit.com/r/Music/.json', timeout=60)

Edit: Not sure why it always fails for me and not someone else. This is the error that I receive:

>>> import urllib2
>>> data = urllib2.urlopen('http://www.reddit.com/r/Music/.json', timeout=60)
Traceback (most recent call last):
  File "/base/data/home/apps/s~shell-27/1.356011914885973647/shell.py", line 267, in get
    exec compiled in statement_module.__dict__
  File "<string>", line 1, in <module>
  File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 429: Unknown

Similar code runs outside of App Engine with no problem:

print urllib2.urlopen('http://www.reddit.com/r/Music/.json').read()

At first I thought it was a timeout problem, since the code was originally working. But because the failure is not a timeout error but this strange HTTPError code, I'm not sure. Any ideas?

Infusive answered 22/1, 2012 at 18:29 Comment(6)
Using the interactive shell and the code you've provided works for me. - Incommensurable
It just means you are making too many requests. There is nothing you can do as a programmer. To avoid these errors, you can generally put a sleep between requests. tools.ietf.org/html/draft-nottingham-http-new-status-02#page-4 - Previous
@user947240: follow shadyabhi's advice. - Magdala
That might help with multiple calls, but it fails on the first call to Reddit. The Reddit api site only states that you can't make more than one call to the same url within 30 secs. Do they consider all google app engine calls from multiple apps to be from the same source and are therefore blocking any call from any app running on google app engine? - Infusive
I'm getting this problem as well. It seems to be intermittent -- it will work for a while and then break. When I try again, it usually works again for a while before it breaks again. Did you end up solving this? - Jointless
Sort of, I added retry logic and a 31 sec sleep. It appears to usually work after one or two tries, but not always. I emailed reddit's rate limit email address, but didn't hear back. - Infusive
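
For anyone wanting to try the retry-and-sleep workaround from the last comment, here is a rough sketch of what it might look like. The function name fetch_with_retry and its parameters are made up for illustration, the 31 second delay is taken from the comment, and the import fallback is only there so the snippet also runs on Python 3. As noted above, this helped only some of the time.

```python
import time

try:
    import urllib2  # Python 2, as used in the question
    from urllib2 import HTTPError
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback (assumption)
    from urllib.error import HTTPError


def fetch_with_retry(url, fetch=None, attempts=3, delay=31, sleep=time.sleep):
    """Hypothetical helper: retry a fetch when the server answers 429.

    `fetch` and `sleep` are injectable so the logic can be exercised
    without touching the network or actually waiting.
    """
    if fetch is None:
        fetch = lambda u: urllib2.urlopen(u, timeout=60).read()
    for attempt in range(attempts):
        try:
            return fetch(url)
        except HTTPError as err:
            # Re-raise anything that is not a rate-limit response,
            # and give up once the last attempt has failed.
            if err.code != 429 or attempt == attempts - 1:
                raise
            sleep(delay)


# Usage (makes a real request):
# data = fetch_with_retry('http://www.reddit.com/r/Music/.json')
```

Waiting slightly longer than 30 seconds follows from the limit mentioned in the comments (one call to the same URL per 30 seconds); whether that is the limit actually being hit here is not confirmed.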
F
14

Reddit rate limits the API pretty severely for default user agents, such as the one Python's urllib2 sends. You need to set a unique user agent that includes your Reddit username, like this:

User-Agent: super happy flair bot by /u/spladug

More info about the Reddit API is here: https://github.com/reddit/reddit/wiki/API
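
A minimal sketch of sending that header with urllib2, assuming the same URL as in the question. The helper name build_request is made up for illustration, and the user agent string is just the example above; substitute your own bot description and username. The import fallback is only there so the snippet also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback (assumption)


def build_request(url, user_agent):
    """Hypothetical helper: build a request carrying a custom User-Agent."""
    return urllib2.Request(url, headers={'User-Agent': user_agent})


req = build_request('http://www.reddit.com/r/Music/.json',
                    'super happy flair bot by /u/spladug')

# Sending it (makes a real request):
# data = urllib2.urlopen(req, timeout=60).read()
```

Passing a Request object to urlopen instead of a bare URL string is what lets you override the default "Python-urllib" user agent.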

Fetid answered 6/9, 2012 at 12:19 Comment(1)
This should be the accepted answer. Straight out of the documentation in that link: "Many default User-Agents (like "Python/urllib" or "Java") are drastically limited to encourage unique and descriptive user-agent strings." - Skidway
U
0

It's possible that Reddit is counting calls by IP address, which would mean that other applications on GAE that share your IP might already be exhausting the quota.

This might get better if you use Reddit API keys (I don't know if they issue them) or if they agree to rate limit API calls based on the app header.

Unnerve answered 6/5, 2012 at 9:25 Comment(2)
Sudhir, this is just speculation. - Exhortation
Yup... it was. I don't think the answer is phrased as a certainty. - Unnerve
