How does gunicorn distribute requests across sync workers?
I am using gunicorn to run a simple HTTP server¹ using, e.g., 8 sync workers (processes). For practical reasons I am interested in knowing how gunicorn distributes incoming requests between these workers.

Assume that all requests take the same time to complete.

Is the assignment random? Round-robin? Resource-based?

The command I use to run the server:

gunicorn --workers 8 --bind 0.0.0.0:8000 main:app

¹ I'm using FastAPI but I believe this is not relevant for this question.

Aplite answered 13/12, 2020 at 16:45 Comment(0)

Gunicorn does not distribute requests.

Each worker is spawned with the same LISTENERS (e.g. gunicorn.sock.TCPSocket) in Arbiter.spawn_worker(), and calls listener.accept() on its own.

Which worker gets a given connection is decided inside the blocking OS call to the socket's accept() method: whichever worker the kernel happens to wake up receives the client connection. That is an OS implementation detail which, empirically, is neither round-robin nor resource-based.
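
A rough sketch of that pre-fork accept pattern (not gunicorn's actual code; the names NUM_WORKERS and worker_loop are purely illustrative): the parent opens one listening socket, forks the workers, and each worker blocks on accept() for that same socket. The parent never routes anything.

import os
import socket

NUM_WORKERS = 4  # illustrative worker count

def worker_loop(listener: socket.socket) -> None:
    while True:
        # Blocks until the kernel hands THIS process a connection.
        conn, _addr = listener.accept()
        with conn:
            conn.recv(1024)  # read (and ignore) the request
            body = f"handled by pid {os.getpid()}\n".encode()
            conn.sendall(
                b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n%s" % (len(body), body)
            )

def main() -> None:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", 8000))
    listener.listen(128)

    for _ in range(NUM_WORKERS):
        if os.fork() == 0:         # child: inherits the listening socket
            worker_loop(listener)  # never returns

    os.wait()  # parent: just waits; it never sees a client connection

if __name__ == "__main__":
    main()

Hitting this with curl a few times shows different PIDs answering, in whatever order the kernel chooses.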

Reference from the docs

From https://docs.gunicorn.org/en/stable/design.html:

Gunicorn is based on the pre-fork worker model. ... The master never knows anything about individual clients. All requests and responses are handled completely by worker processes.

Gunicorn relies on the operating system to provide all of the load balancing when handling requests.

Other reading

Mceachern answered 25/12, 2022 at 9:36 Comment(0)

In my case (also with FastAPI), I found that it starts out roughly round-robin, and then degrades to something much dumber once all workers are busy.

Example:

  • you send 100 requests at the same time
  • the first 8 are distributed across the 8 sync workers
  • the remaining 92 then all get assigned to whichever of those first 8 workers frees up first
  • only once ALL (or most) workers are free again are new requests spread across them in a more balanced way

I am trying to fix that inefficient behavior for the 92 requests mentioned above. No success thus far.
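
A minimal sketch of the kind of experiment I mean, so others can reproduce it (the /pid endpoint, the 0.2 s sleep and the hey invocation are illustrative choices, not anything prescribed by gunicorn or FastAPI): each response reports the PID of the worker that served it, so the distribution can be counted after firing many concurrent requests.

# main.py
import asyncio
import os

from fastapi import FastAPI

app = FastAPI()

@app.get("/pid")
async def pid():
    await asyncio.sleep(0.2)  # simulate some work so the requests overlap
    return {"worker_pid": os.getpid()}

Run it with, e.g.:

gunicorn main:app --workers 8 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

then fire concurrent requests and count the PIDs in the responses, for example with hey:

hey -n 100 -c 100 http://127.0.0.1:8000/pid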

Hopefully someone else can add their insights.

Mohun answered 23/12, 2022 at 18:27 Comment(8)
Why is it "inefficient behaviour" to assign to whichever worker is free?Mceachern
It's ok if the 1st request is assigned to the free worker, but then this worker shouldn't be considered free anymore. However, in my case, the 2nd, ..., 91st AND 92nd are all assigned to that worker as well.Mohun
SyncWorker only accepts a client connection when it is free. Requests are not specifically assigned to a possibly unavailable worker in a pool (see my answer). If a single worker handles all those requests, then each of those requests only comes in after the prior one has been handled. It is likely not an equal distribution, but the worker is certainly free. I'm not sure why you think it shouldn't be "considered free". The "load balancing" reading linked in my answer shows an attempt to achieve an "equal distribution", but there are clear performance penalties due to the significant overhead.Mceachern
Thank you @Mceachern for providing those details. I am running experiments and am seeing the behavior that I describe, i.e. that a worker gets assigned all those requests BEFORE it has handled the first ones. Since you mention that this shouldn't happen (correct?), then maybe something else (perhaps a bug in my code) is going on. Will investigate again.Mohun
I have to ask, are you sure you are using SyncWorker?Mceachern
I am using a FastAPI-specific worker class as follows: gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:80. I will add a minimal, reproducible example, but it may take some time before I get to create and upload itMohun
Well, that's not SyncWorker, which this question is originally about. UvicornWorker does loop.create_server(), which calls sock.accept() on each event loop tick (by design). There's no "fix" for it at the moment — the feature request Support a concurrency ceiling without returning 503, like gunicorn's worker_connections (encode/uvicorn#865) was rejected in Dec 2020.Mceachern
@Mohun Were you able to solve the problem? I am facing the same issue and because of it my p99 latency is high. Also, can you tell me which experiment you ran to see the 92 requests getting assigned to the first worker?Slipperwort
