How can gunicorn handle hundreds of thousands of requests per second for django?
From the docs - How many workers,

DO NOT scale the number of workers to the number of clients you expect to have. Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.

Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with.

From threads,

The number of worker threads for handling requests.

Run each worker with the specified number of threads.

A positive integer generally in the 2-4 x $(NUM_CORES) range. You’ll want to vary this a bit to find the best for your particular application’s workload.
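As a quick sanity check, the two rules of thumb quoted above can be computed directly. This is just a sketch of the docs' starting-point formulas; the right values for a real app still depend on workload:

```python
import multiprocessing

def suggested_workers(num_cores):
    # Docs' starting point: (2 x $num_cores) + 1 worker processes
    return (2 * num_cores) + 1

def suggested_thread_range(num_cores):
    # Docs' suggestion: 2-4 x $(NUM_CORES) threads per worker
    return (2 * num_cores, 4 * num_cores)

cores = multiprocessing.cpu_count()
print(suggested_workers(cores))       # e.g. 5 on a dual-core machine
print(suggested_thread_range(cores))  # e.g. (4, 8) on a dual-core machine
```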

Now the question is: what number of threads and workers can serve hundreds or thousands of requests per second?

Let's say I have a dual-core machine and I set 5 workers and 8 threads. Does that mean I can serve 40 concurrent requests?

If I am going to serve hundreds or thousands of requests per second, will I need a hundred cores?

This line is very hard to understand:

Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.

Dyspepsia answered 9/11, 2021 at 11:16 Comment(0)

Now the question is what no of threads and workers can serve hundreds or thousands of requests per second? Let's say I have a dual-core machine and I set 5 workers and 8 threads. And I can serve 40 concurrent requests?

Yes, with 5 worker processes, each with 8 threads, 40 concurrent requests can be served. How quickly they'll be served on a dual-core box is another question.

If I am going to serve hundreds or thousands of requests, I'll need a hundred cores?

Not quite. Requests per second is not the same as "concurrent requests".

If each request takes exactly 1 millisecond to handle, then a single worker can serve 1000 requests per second (RPS). If each request takes 10 milliseconds, a single worker manages 100 RPS.

If some requests take 10 milliseconds, others take, say, up to 5 seconds, then you'll need more than one concurrent worker, so the one request that takes 5 seconds does not "hog" all of your serving capability.
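The arithmetic above can be captured in a back-of-envelope model: throughput ceiling is concurrent slots divided by average latency. This is only a rough sketch; it ignores queuing, CPU contention, and latency variance:

```python
def max_rps(concurrent_slots, avg_latency_s):
    """Rough throughput ceiling: slots / latency.

    Ignores queuing, CPU contention, and latency variance."""
    return concurrent_slots / avg_latency_s

print(max_rps(1, 0.001))   # 1 slot, 1 ms/request  -> 1000.0 RPS
print(max_rps(1, 0.010))   # 1 slot, 10 ms/request -> 100.0 RPS
print(max_rps(40, 0.010))  # 5 workers x 8 threads = 40 slots -> 4000.0 RPS
```

This is why "4-12 workers" can serve thousands of RPS: what matters is how fast each slot turns over, not how many requests are in flight at one instant.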

Gingili answered 9/11, 2021 at 11:22 Comment(4)
Thanks for your explanation. "If some requests take 10 milliseconds, others take, say, up to 5 seconds, then you'll need more than one concurrent worker" => I'm using the gthread worker. If I have 5 workers as in the example, can I handle 5 five-second requests at the same time? Or only 1, since the others are blocked? – Dyspepsia
Without looking closer into it, I'd imagine each request will likely take up a thread each. – Gingili
Let's say a single request takes 180 seconds to complete; then how many workers should I have? – Gravitation
@SyedMuhammadAsad There's no simple answer. If you have one worker, you can service one such request in 3 minutes. If you have two workers and two such requests come in simultaneously, both workers will be tied up for 3 minutes. Long requests would be better handled by a background worker. – Gingili

I know that this question is a bit old but, reading through the only answer to date and the comments, it doesn't seem to cover all aspects.

The question, the answer and the comments seem to only consider a synchronous app. In this case, yes, there's a hard limit on the maximum number of concurrent requests you can handle, since every request will be blocking. This case has been covered in sufficient detail so I won't focus on it.

However, if you're writing an asynchronous app, then things can be quite different and the governing factor is whether your application is CPU-bound or IO-bound.

If CPU-bound, async behaviour is not much different from sync operation: since each thread is effectively blocked on the CPU, the app won't be able to take on any more concurrent requests.

If IO-bound, then each worker will take on additional requests while the previous ones are waiting for IO operations to complete. So, in this way, you get much greater concurrency in processing requests with the same number of workers and threads than you can with a sync app. This is how you can efficiently achieve hundreds or thousands of requests per second.
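The IO-bound case can be illustrated with a toy asyncio sketch (not gunicorn-specific; the simulated handler and timings are made up for illustration). One hundred "requests" that each wait 100 ms on IO complete in roughly 100 ms total, not 10 seconds, because the waits overlap:

```python
import asyncio
import time

async def handle(i):
    # Simulated IO wait (e.g. a database query or upstream HTTP call)
    await asyncio.sleep(0.1)
    return i

async def main():
    start = time.perf_counter()
    # 100 "requests" whose IO waits overlap on a single thread
    await asyncio.gather(*(handle(i) for i in range(100)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # roughly 0.1s, not 10s, because the waits overlap
```

A sync worker model would need 100 threads to get the same overlap; here one event loop does it.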

Note however that even in an async app, there is a limit on maximum concurrency that comes from Python, which I think is 40 if I remember correctly.

Zanazander answered 16/7, 2024 at 20:13 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.