I'm trying to implement some throttles on our REST API. A typical approach is after a certain threshold to block the request (with 403
or 429
response). However, I've seen one api that adds a delay to the response instead.
As you make calls to the API, we will be looking at your average calls per second (c/s) over the previous five-minute period. Here's what will happen:
over 3c/s and we add a 2 second delay
over 5c/s and we add a 4 second delay
over 7c/s and we add a 5 second delay
From the client's perspective, I see this being better than getting back an error. The worst that can happen is that you'll slow down.
I am wondering how this can be achieved without negatively impacting the app server. i.e. To add those delays, the server needs to keep the request open, causing it to keep more and more request processors busy, meaning it has less capacity for new requests coming in.
What's the best way to accomplish this? (i.e. is this something that can be done on the web server / load balancer so that the application server is not negatively affected? Is there some kind of a throttling layer that can be added for this purpose?)
We're using Django/Tastypie, but the question is more on the architecture/conceptual level.