I'm using FastAPI with non-async
endpoints, running under gunicorn with multiple workers of the uvicorn.workers.UvicornWorker
class, as suggested here. Lately, I've noticed high latency in some of our endpoints during the busier parts of the day. When I investigated, I found that concurrency in our app doesn't work the way we expect.
Let's say I have this FastAPI application (main.py) with the following endpoint:
import logging
import os
import time

from fastapi import FastAPI

app = FastAPI()
logger = logging.getLogger()

@app.get("/")
def root():
    logger.info(f"Running on {os.getpid()}")
    time.sleep(3600)  # simulate a long blocking call
    return {"message": "Hello World"}
and I run it with gunicorn using the following command:
gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
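For reference, the five concurrent requests can be fired from a shell like this (assuming the bind address from the command above; the trailing `&` backgrounds each curl so they run in parallel):

```shell
# send five requests concurrently to the gunicorn bind address above
for i in 1 2 3 4 5; do
  curl -s http://0.0.0.0:8000/ &
done
wait  # block until all five background requests return
```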
When I send five requests to the server, all but the last one land on the same worker instead of being spread across all the workers in parallel:
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 642
If I turn the endpoint into async
, every request is handled by a different worker (and the last one is held until a worker frees up).
I know that for non-async endpoints, FastAPI runs the handler in an AnyIO worker thread, and the default limit is 40 threads. When I lower this limit to 2 threads, for example using the suggestion here, only the first two requests are handled while the rest wait (even though I still have 4 workers!).
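The behaviour with a limit of 2 can be reproduced with a plain stdlib thread pool (a sketch by analogy only: AnyIO's capacity limiter behaves much like `max_workers` here, but the handler name and timings below are illustrative, not FastAPI internals):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_handler(i):
    # stands in for a blocking endpoint like the one above
    time.sleep(0.2)
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:  # like an AnyIO limit of 2
    results = list(pool.map(blocking_handler, range(5)))
elapsed = time.monotonic() - start

print(results)           # [0, 1, 2, 3, 4]
print(elapsed >= 0.6)    # True: 5 jobs over 2 threads -> 3 rounds of 0.2s
```

With only two threads available, the third, fourth, and fifth jobs queue up behind the first two, which is exactly the waiting I'm seeing per worker.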
That's bad for two reasons: we're not using all of our resources, and the requests that do run concurrently on the same worker suffer from Python's threading problems due to the GIL.
Is there a way to overcome these problems without switching to async
endpoints?