Background
Echo Nest have a rate limited API. A given application (identified in requests using an API key) can make up to 120 REST calls a minute. The service response includes an estimate of the total number of calls made in the last minute; repeated abuse of the API (exceeding the limit) may cause the API key to be revoked.
When used from a single machine (a web server providing a service to clients) it is easy to control access - the server has full knowledge of the history of requests and can regulate itself correctly.
But I am working on a program where distributed, independent clients make requests in parallel.
In such a case it is much less clear what an optimal solution would be. And in general the problem appears to be undecidable - if over 120 clients, all with no previous history, make an initial request at the same time, then the rate will be exceeded.
But since this is a personal project, and client use is expected to be sporadic (bursty), and my projects have never been hugely successful, that is not expected to be a huge problem. A more likely problem is that there are times when a smaller number of clients want to make many requests as quickly as possible (for example, a client may need, exceptionally, to make several thousand requests when starting for the first time - it is possible two clients would start at around the same time, so they must cooperate to share the available bandwidth).
Given all the above, what are suitable algorithms for the clients so that they rate-limit appropriately? Note that limited cooperation is possible because the API returns the total number of requests in the last minute for all clients.
Current Solution
My current solution (when the question was written - a better approach is given as an answer) is quite simple. Each client has a record of the time the last call was made and the number of calls made in the last minute, as reported by the API, on that call.
If the number of calls is less than 60 (half the limit) the client does not throttle. This allows for fast bursts of small numbers of requests.
Otherwise (ie when there are more previous requests) the client calculates the limiting rate it would need to work at (ie period = 60 / (120 - number of previous requests)
) and then waits until the gap between the previous call and the current time exceeds that period (in seconds; 60 seconds in a minute; 120 max requests per minute). This effectively throttles the rate so that, if it were acting alone, it would not exceed the limit.
But the above has problems. If you think it through carefully you'll see that for large numbers of requests a single client oscillates and does not reach maximum throughput (this is partly because of the "initial burst" which will suddenly "fall outside the window" and partly because the algorithm does not make full use of its history). And multiple clients will cooperate to an extent, but I doubt that it is optimal.
Better Solutions
I can imagine a better solution that uses the full local history of the client and models other clients with, say, a Hidden Markov Model. So each client would use the API report to model the other (unknown) clients and adjust its rate accordingly.
I can also imagine an algorithm for a single client that progressively transitions from unlimited behaviour for small bursts to optimal, limited behaviour for many requests without introducing oscillations.
Do such approaches exist? Can anyone provide an implementation or reference? Can anyone think of better heuristics?
I imagine this is a known problem somewhere. In what field? Queuing theory?
I also guess (see comments earlier) that there is no optimal solution and that there may be some lore / tradition / accepted heuristic that works well in practice. I would love to know what... At the moment I am struggling to identify a similar problem in known network protocols (I imagine Perlman would have some beautiful solution if so).
I am also interested (to a lesser degree, for future reference if the program becomes popular) in a solution that requires a central server to aid collaboration.
Disclaimer
This question is not intended to be criticism of Echo Nest at all; their service and conditions of use are great. But the more I think about how best to use this, the more complex/interesting it becomes...
Also, each client has a local cache used to avoid repeating calls.
N
requests, and receives permission to makeK <= N
requests over the nextt
seconds. Then if desired the server could prioritise clients according to how many requests they want to make and what for. – Scribe