I'm using FastAPI with non-async
endpoints, running under gunicorn with multiple workers of the uvicorn.workers.UvicornWorker
class, as suggested here. Lately, I've noticed high latency in some of our endpoints during the busier parts of the day. When I investigated, I found that concurrency in our app doesn't work the way we expect.
Let's say I have this FastAPI application (main.py) with the following endpoint:
import logging
import os
import time

from fastapi import FastAPI

app = FastAPI()
logger = logging.getLogger()

@app.get("/")
def root():
    logger.info(f"Running on {os.getpid()}")
    time.sleep(3600)  # simulate a long blocking call
    return {"message": "Hello World"}
and I run it with gunicorn using the following command:
gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
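For reference, the five concurrent requests can be fired from a shell like this (assuming the bind address from the command above; the trailing `&` backgrounds each curl so they run in parallel):

```shell
# send five requests concurrently to the gunicorn bind address above
for i in 1 2 3 4 5; do
  curl -s http://0.0.0.0:8000/ &
done
wait  # block until all five background requests return
```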
When I send five requests to the server, all but the last one land on the same worker instead of being spread across all the workers in parallel:
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 642
If I turn the endpoint into async
, every request is handled by a different worker (and the last one is held until a worker frees up).
I know that for non-async endpoints, FastAPI runs the handler in an AnyIO worker thread, and the default limit is 40 threads. When I lower this limit to 2 threads, for example using the suggestion here, only the first two requests are handled while the rest wait (even though I still have 4 workers!).
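The behaviour with a limit of 2 can be reproduced with a plain stdlib thread pool (a sketch by analogy only: AnyIO's capacity limiter behaves much like `max_workers` here, but the handler name and timings below are illustrative, not FastAPI internals):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_handler(i):
    # stands in for a blocking endpoint like the one above
    time.sleep(0.2)
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:  # like an AnyIO limit of 2
    results = list(pool.map(blocking_handler, range(5)))
elapsed = time.monotonic() - start

print(results)           # [0, 1, 2, 3, 4]
print(elapsed >= 0.6)    # True: 5 jobs over 2 threads -> 3 rounds of 0.2s
```

With only two threads available, the third, fourth, and fifth jobs queue up behind the first two, which is exactly the waiting I'm seeing per worker.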
That's bad for two reasons: we're not using all of our resources, and the requests that do run concurrently on the same worker suffer from Python's threading problems due to the GIL.
Is there a way to overcome these problems without switching to async
endpoints?