How to limit rate of requests to web services in Python?

7

I'm working on a Python library that interfaces with a web service API. Like many web services I've encountered, this one requires limiting the rate of requests. I would like to provide an optional parameter, limit, to the class instantiation that, if provided, will hold outgoing requests until the number of seconds specified passes.

I understand that the general scenario is the following: an instance of the class makes a request via a method. When it does, the method emits some signal that sets a lock variable somewhere, and begins a countdown timer for the number of seconds in limit. (In all likelihood, the lock is the countdown timer itself.) If another request is made within this time frame, it must be queued until the countdown timer reaches zero and the lock is disengaged; at this point, the oldest request on the queue is sent, and the countdown timer is reset and the lock is re-engaged.
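
To make that concrete, here's the rough shape I'm picturing (all names are placeholders, and I'm not sure a plain lock gives strict first-in-first-out ordering):

import threading
import time

class Client:
    """Sketch of the scheme described above; names are placeholders."""
    def __init__(self, limit=0):
        self.limit = limit             # seconds to hold between outgoing requests
        self._lock = threading.Lock()  # concurrent callers wait their turn here
        self._last = float('-inf')

    def request(self, payload):
        with self._lock:
            wait = self.limit - (time.monotonic() - self._last)
            if wait > 0:
                time.sleep(wait)       # the "countdown timer"
            self._last = time.monotonic()
        return self._send(payload)     # the actual web service call

    def _send(self, payload):
        ...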

Is this a case for threading? Is there another approach I'm not seeing?

Should the countdown timer and lock be instance variables, or should they belong to the class, such that all instances of the class hold requests?

Also, is it generally a bad idea to provide rate-limiting functionality within a library? I reason that since the countdown is zero seconds by default, developers can still use the library and provide their own rate-limiting schemes. Given that any developer using the service will need to rate-limit requests anyway, however, I figure it would be a convenience for the library to provide a means of rate limiting.

Regardless of whether the rate-limiting scheme ends up in the library, I'll want to write an application using it, so suggested techniques will come in handy.

Chute answered 30/12, 2008 at 19:30 Comment(0)
41

Don't reinvent the wheel unless it's called for. Check out the excellent ratelimit library. It's perfect if you just want to rate-limit your calls to a REST API for whatever reason and get on with your life.

from datetime import timedelta
from ratelimit import limits, sleep_and_retry
import requests

@sleep_and_retry  # if over the limit, sleep until a call is allowed, then retry
@limits(calls=1, period=timedelta(seconds=60).total_seconds())  # at most 1 call per 60 s
def get_foobar():
    response = requests.get('https://httpbin.org/get')
    response.raise_for_status()
    return response.json()

This will block the thread if more than one request per minute is issued.

Bootee answered 21/5, 2018 at 2:49 Comment(1)
Thank you so very much, @vidstige. I have been beating my head against the wall to implement a solution to a 60-per-minute rate limit. – Horseshoes
15

This works out better with a queue and a dispatcher.

You split your processing into two sides: source and dispatch. These can be separate threads (or separate processes if that's easier).

The Source side creates and enqueues requests at whatever rate it likes.

The Dispatch side does this (see the sketch after the list):

  1. Get the request start time, s.

  2. Dequeue a request and process it through the remote service.

  3. Get the current time, t, and sleep for rate - (t - s) seconds.
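
A minimal threading sketch of that loop, assuming a fixed rate in seconds and using print as a stand-in for the real remote call:

import queue
import threading
import time

RATE = 2.0  # assumed: minimum seconds between request starts

def dispatch(q):
    while True:
        s = time.monotonic()       # 1. request start time
        request = q.get()          # 2. dequeue (blocks until one arrives)
        if request is None:        #    sentinel: shut down
            break
        print("sending", request)  #    stand-in for the remote service call
        t = time.monotonic()       # 3. sleep off whatever remains of the period
        if RATE - (t - s) > 0:
            time.sleep(RATE - (t - s))

q = queue.Queue()
worker = threading.Thread(target=dispatch, args=(q,))
worker.start()
for i in range(5):                 # the Source side enqueues at any rate it likes
    q.put(i)
q.put(None)
worker.join()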

If you want to run the Source side connected directly to the remote service, you can do that, and bypass rate limiting. This is good for internal testing with a mock version of the remote service.

The hard part about this is creating some representation for each request that you can enqueue. Since the Python Queue will handle almost anything, you don't have to do much.

If you're using multiprocessing, your request objects will have to be picklable so they can be sent through a pipe.

Zipnick answered 30/12, 2008 at 20:0 Comment(0)
2

Queuing may be overly complicated. A simpler solution is to give your class a variable for the time the service was last called. Whenever the service has been called (!1), record the time; whenever a call is about to be made (!2), set waitTime to delay - (now - lastCallTime), where delay is the minimum allowable time between requests, and if that number is positive, sleep for that long before making the call. The disadvantage/advantage of this approach is that it treats the web service requests as synchronous. The advantage is that it is absurdly simple and easy to implement.

  • (!1): Recording the time should happen right after receiving a response from the service, inside the wrapper (probably at the bottom of the wrapper).
  • (!2): The wait should happen when the Python wrapper around the web service is called, at the top of the wrapper.
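
A minimal sketch of that idea (the class and all names are illustrative):

import time

class ServiceWrapper:
    def __init__(self, delay):
        self.delay = delay                   # minimum allowable time between requests
        self.last_call_time = float('-inf')  # so the first call never waits

    def call(self, make_request):
        # (!2) top of the wrapper: sleep off whatever remains of the delay
        wait_time = self.delay - (time.monotonic() - self.last_call_time)
        if wait_time > 0:
            time.sleep(wait_time)
        try:
            return make_request()
        finally:
            # (!1) bottom of the wrapper: record when this call completed
            self.last_call_time = time.monotonic()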

S.Lott's solution is more elegant, of course.

Scyphus answered 30/12, 2008 at 20:23 Comment(0)
2

The docs of the yfinance package show a nice, concise way to do rate limiting and response caching at the same time; that's handy because during development and debugging I often end up issuing the same requests over and over again.

from requests import Session
from requests_cache import CacheMixin, SQLiteCache
from requests_ratelimiter import LimiterMixin, MemoryQueueBucket
from pyrate_limiter import Duration, RequestRate, Limiter

class CachedLimiterSession(CacheMixin, LimiterMixin, Session):
    """A requests Session that both caches responses and rate-limits requests."""

session = CachedLimiterSession(
    limiter=Limiter(RequestRate(2, Duration.SECOND*5)),  # max 2 requests per 5 seconds
    bucket_class=MemoryQueueBucket,
    backend=SQLiteCache("yfinance.cache"),
)
response = session.get('https://httpbin.org/get')  # go through the session, not requests
response.raise_for_status()
data = response.json()

Instep answered 25/8, 2023 at 6:9 Comment(0)
1

Your rate-limiting scheme should be heavily influenced by the calling conventions of the underlying code (synchronous or async), as well as by the scope (thread, process, machine, cluster?) at which the rate limiting will operate.

I would suggest keeping all the variables within the instance, so you can easily implement multiple periods/rates of control.
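
As an illustration, here is a sketch of an instance-level limiter that enforces several (calls, period) rules at once; everything in it is made up for the example:

import time

class MultiRateLimiter:
    def __init__(self, rules):
        self.rules = rules  # e.g. [(10, 1.0), (100, 60.0)]: 10/sec and 100/min
        self.history = []   # timestamps of past calls, oldest first

    def wait(self):
        """Block until one more call is allowed under every rule."""
        while True:
            now = time.monotonic()
            longest = max(period for _, period in self.rules)
            self.history = [t for t in self.history if now - t < longest]
            delays = []
            for calls, period in self.rules:
                recent = [t for t in self.history if now - t < period]
                if len(recent) >= calls:
                    # the earliest in-window call must age out before we proceed
                    delays.append(recent[0] + period - now)
            if not delays:
                break
            time.sleep(max(delays))
        self.history.append(time.monotonic())

# Usage: call limiter.wait() immediately before each outgoing request.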

Lastly, it sounds like you want to be a middleware component. Don't try to be an application and introduce threads on your own. Just block/sleep if you are synchronous, and use the framework's dispatching mechanism if you are being called by an async framework.

Cobby answered 30/12, 2008 at 20:5 Comment(0)
1

If your library is designed to be synchronous, then I'd recommend leaving out the limit enforcement (although you could track rates and at least help the caller decide how to honor limits).

I use Twisted to interface with pretty much everything nowadays. It makes that sort of thing easy with a model that separates request submission from response handling. If you don't want your API users to have to use Twisted, you'd at least be better off understanding their API for deferred execution.

For example, I have a Twitter interface that pushes a rather absurd number of requests through on behalf of XMPP users. I don't rate-limit it, but I did have to do a bit of work to keep all of the requests from happening at the same time.
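
A tiny hypothetical sketch of spacing requests out with Twisted deferreds instead of blocking (the two-second gap and all names are made up):

from twisted.internet import defer, reactor, task

def send_request(i):
    print("request", i)  # stand-in for the real web service call

@defer.inlineCallbacks
def run():
    for i in range(5):
        send_request(i)
        # wait 2 seconds without blocking the reactor
        yield task.deferLater(reactor, 2.0, lambda: None)
    reactor.stop()

run()
reactor.run()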

Alta answered 30/12, 2008 at 20:43 Comment(0)
0

Add a 2-second pause between requests using time.sleep(), like this:

import time
import requests

for i in range(10):
    requests.get('http://example.com')
    time.sleep(2)  # wait 2 seconds before the next request
Motorman answered 30/12, 2008 at 23:25 Comment(3)
Bad assumption. It waits 2 seconds, but that will be 2 seconds between the end of one request and the start of the next. Usually you want 2 seconds between the start of one and the start of the next. – Zipnick
Well, waiting 2s between the end of one and the start of another may be safer if the limits are based on the actual time between calls. The real issue is that this solution waits more than 2s between requests, since the computation between requests may take time. – Scyphus
Terribly naive solution; how does it work if you want to use multiple threads or an async loop? – Swartz
