Gunicorn Worker Class
Gunicorn has a worker_class setting. Some possible values are:
- sync
- gthread
- gevent
Definitions from Luis Sena's nice blog:
sync: This is the default worker class. Each process handles one request at a time, and you can use the -w parameter to set the number of workers.
The recommendation for the number of workers is 2–4 x $(NUM_CORES), although it will depend on how your application works.
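As a rough illustration of that rule of thumb, the range can be computed from the core count. This is only a sketch: `worker_range` is a hypothetical helper, not part of Gunicorn, and the right value still depends on the application.

```python
import os


def worker_range(num_cores):
    """Return the (low, high) worker counts from the 2-4 x cores rule of thumb."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return 2 * num_cores, 4 * num_cores


# os.cpu_count() can return None, so fall back to a single core in that case.
cores = os.cpu_count() or 1
low, high = worker_range(cores)
print(f"{cores} cores -> between {low} and {high} workers")
```

For a 4-core machine this gives 8 to 16 workers.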
gthread: If you use gthread, Gunicorn will allow each worker to have multiple threads. In this case, the Python application is loaded once per worker, and each of the threads spawned by the same worker shares the same memory space.
Those threads will be at the mercy of the GIL, but it’s still useful for when you have some I/O blocking happening. It will allow you to handle more concurrency without increasing your memory too much.
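The point about blocking I/O can be seen with plain Python threads, independent of Gunicorn. In this minimal sketch, `time.sleep` stands in for a blocking network or database call (an assumption for illustration); because sleeping releases the GIL, the four simulated requests overlap instead of running back to back.

```python
import threading
import time


def fake_io_request(results, index, delay):
    # time.sleep releases the GIL, like a blocking socket or disk call would.
    time.sleep(delay)
    results[index] = f"response {index}"


def run_threads(num_threads=4, delay=0.2):
    """Run the simulated requests in threads and return (results, elapsed seconds)."""
    results = [None] * num_threads
    threads = [
        threading.Thread(target=fake_io_request, args=(results, i, delay))
        for i in range(num_threads)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


results, elapsed = run_threads()
# Total time is close to one delay, not four delays added together.
print(results, f"{elapsed:.2f}s")
```

CPU-bound work in the same threads would not overlap this way, since only one thread can hold the GIL at a time.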
gevent: Eventlet and gevent make use of “green threads” or “pseudo threads” and are based on greenlet.
In practice, if your application work is mainly I/O bound, it will allow it to scale to potentially thousands of concurrent requests on a single process.
Even with the rise of async frameworks (FastAPI, Sanic, etc.), this is still relevant today since it allows you to optimize for I/O without the extra code complexity.
The way they manage to do it is by “monkey patching” your code, mainly replacing blocking parts with compatible cooperative counterparts from the gevent package.
It uses epoll or kqueue or libevent for highly scalable non-blocking I/O. Coroutines ensure that the developer uses a blocking style of programming that is similar to threading, but provide the benefits of non-blocking I/O.
This is usually the most efficient way to run your Django/Flask/etc. web application, since most of the time the bulk of the latency comes from I/O-related work.
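The cooperative switching in that definition can be illustrated with a toy scheduler built on plain Python generators. This is not how gevent is implemented (gevent uses greenlets and an event loop); the names `handle_request` and `run_all` are hypothetical, and each `yield` simply marks a point where real code would be waiting on I/O. Everything runs in one process and one OS thread.

```python
from collections import deque


def handle_request(name, io_steps, log):
    """A toy 'pseudo thread': each yield marks a point where real code waits on I/O."""
    for step in range(io_steps):
        log.append(f"{name}: waiting on I/O step {step}")
        yield  # hand control back to the scheduler instead of blocking
    log.append(f"{name}: done")


def run_all(tasks):
    """Round-robin scheduler: resume each task until it yields or finishes."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, do not reschedule it
        queue.append(task)


log = []
run_all([handle_request("A", 2, log), handle_request("B", 1, log)])
print("\n".join(log))
```

The log shows the two tasks interleaving: whenever one reaches a wait point, the scheduler resumes the other.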
Workers Value
When using gevent workers, the thread count is not configured. The documentation for the threads setting says that threads are only relevant for gthread workers, so with gevent we only have workers. As far as I know, each worker is a separate operating system process, and the worker count is 8 to 16 for a 4-core machine.
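For reference, a setup like the one described might look like the following Gunicorn config file. This is a sketch under assumptions: `CORES` is a hypothetical constant for the 4-core machine, the worker count is just the low end of the 8 to 16 range, and `worker_connections` is shown at Gunicorn's documented default.

```python
# gunicorn.conf.py (sketch; values are assumptions for a 4-core machine)

CORES = 4  # hypothetical: hard-coded here for illustration

worker_class = "gevent"    # requires the gevent package to be installed
workers = 2 * CORES        # 8 worker processes, low end of the 8-16 range
worker_connections = 1000  # max simultaneous clients per worker

# The threads setting is deliberately left unset, since it is only
# relevant for the gthread worker class.
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py myapp:app`, where `myapp:app` is a placeholder for the actual application.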
Where does the performance gain come from?
So are there really no threads being executed? If there are no threads, how do gevent workers gain performance? There should be some pseudo threads that are executed concurrently: while one pseudo thread is waiting on I/O, this should be detected and another pseudo thread should be executed. So where is this other pseudo thread? Is it in the other worker processes, or does Gunicorn create pseudo threads within a worker process?