Is using FastAPI with sync threads a good practice for a high scale application?
I'm using FastAPI with non-async endpoints, running under gunicorn with multiple workers of the uvicorn.workers.UvicornWorker class, as suggested here. Lately, I've noticed high latency on some of our endpoints at the times of day when our application is busier than usual. I started investigating and found that concurrency in our app doesn't work the way we expect.

Let's say I have this FastAPI application (main.py) with the following endpoint

import logging
import os
import time

from fastapi import FastAPI

logging.basicConfig(level=logging.INFO)  # matches the INFO:root log format below

app = FastAPI()
logger = logging.getLogger()

@app.get("/")
def root():
    logger.info(f"Running on {os.getpid()}")
    time.sleep(3600)
    return {"message": "Hello World"}

and I run gunicorn with the following cmd:

gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000

When I send five requests to the server, all but the last are handled by the same worker instead of being spread across all the workers:

INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 643
INFO:root:Running on 642
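For anyone trying to reproduce this, the five requests were fired concurrently with a small stdlib script along these lines (the URL and request count here are just the dev-server values from the command above, not anything special):

```python
# Reproduction sketch: fire n GET requests in parallel using only the
# standard library, so they arrive at the server at roughly the same time.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch_concurrently(url, n=5):
    """Send n GET requests at the same time; return their status codes."""
    def hit(_):
        with urllib.request.urlopen(url) as resp:
            return resp.status
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(hit, range(n)))

# e.g. fetch_concurrently("http://localhost:8000/")
```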

If I turn the endpoint into async, every request is handled by a different worker (and the fifth one is held until a worker frees up). I know that for non-async endpoints, FastAPI runs the handlers in AnyIO threads, with a default limit of 40 threads. When I lower this limit to 2 threads, for example, using the suggestion here, only the first two requests are handled while the rest wait (even though I still have 4 workers!).
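For completeness, this is roughly how the limit was lowered: AnyIO exposes its default capacity limiter via anyio.to_thread.current_default_thread_limiter(), and its total_tokens can be changed from an async startup hook (a sketch, assuming anyio >= 3; the value 2 is just the experiment from above):

```python
from fastapi import FastAPI
import anyio

app = FastAPI()

@app.on_event("startup")
async def set_sync_thread_limit():
    # AnyIO's default capacity limiter caps how many sync endpoints
    # can run concurrently per worker process (default: 40 tokens).
    limiter = anyio.to_thread.current_default_thread_limiter()
    limiter.total_tokens = 2
```

Note the hook must be async: the limiter is tied to the running event loop, so it has to be fetched from within the loop rather than from a worker thread.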

That's bad on both counts: we're not using all of our resources, and we're hitting Python threading problems (the GIL) inside a single worker.

Is there a way to overcome those problems without turning to async endpoints?

Plaster answered 5/12, 2022 at 8:39 Comment(3)
While this answer may not answer your question, it might give you a different perspective/solution to the problem you are faced with. – Argentic
Are you sending your requests concurrently to the server or in series? – Objectivism
Can you please elaborate on how you are sending the requests? I've tried reproducing your issue, but when sending a few requests to the same 4 workers / 2 threads setup, requests are dispatched among 3 workers and handled in batches of 6 (I'm not sure what the 4th worker does; it is probably left available so that it can handle incoming requests). – Martinez
