I am load-testing a Gunicorn server (Uvicorn workers with FastAPI) on an AWS EC2 machine that I SSH into with port forwarding (ssh -L 8000:localhost:8000), so that all requests to port 8000 on my local machine are routed to the EC2 machine.
I am using k6 on my local machine to generate artificial traffic (a load test) against the Gunicorn server on the EC2 instance. With ONLY 500-800 VUs, upwards of 46% of requests consistently fail, yet the EC2 machine's CPU usage never goes past 30% on any of the 8 cores (per htop). The instance is a c5a.2xlarge (4 cores / 8 threads).
Here's how I am launching Gunicorn from the terminal (because of the config, Gunicorn starts with 4 workers):
$ gunicorn api.main:app --worker-class uvicorn.workers.UvicornWorker --user dockerd --capture-output --keep-alive 0 --bind 0.0.0.0:8000
The configuration file I am using is from tiangolo's uvicorn-gunicorn-docker.
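For context, that image derives the worker count from the CPU count. A simplified sketch of the logic in its gunicorn_conf.py (approximate; the real file also honors environment variables such as WORKERS_PER_CORE, MAX_WORKERS, and WEB_CONCURRENCY, which is presumably why I end up with 4 workers rather than one per vCPU):

```python
# Simplified sketch of tiangolo's uvicorn-gunicorn-docker worker-count logic.
# Assumption: default of one worker per core, overridable via env vars in the
# real config file.
import multiprocessing

workers_per_core = 1.0  # image default (WORKERS_PER_CORE can override)
cores = multiprocessing.cpu_count()
default_web_concurrency = workers_per_core * cores
# Always run at least 2 workers, truncating the product to an int.
web_concurrency = max(int(default_web_concurrency), 2)

print(f"workers = {web_concurrency}")
```

On an 8-vCPU machine these defaults would give 8 workers, so a 4-worker launch implies one of those overrides is in effect.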
This is a FastAPI app serving a scikit-learn model, with no database calls or other I/O. So this is a completely CPU-bound app.
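To illustrate what CPU-bound means for concurrency here: each prediction holds a worker for its full duration, so throughput per worker is capped at roughly 1/latency requests per second. A stdlib-only sketch (the busy loop below is a stand-in I wrote for the scikit-learn model.predict() call, not the actual app code):

```python
import time

def fake_predict(n_iterations: int = 200_000) -> float:
    """Stand-in for model.predict(): pure CPU work that holds the
    GIL/worker thread for its entire duration."""
    total = 0.0
    for i in range(n_iterations):
        total += (i % 7) * 0.5
    return total

start = time.perf_counter()
result = fake_predict()
elapsed = time.perf_counter() - start

# While this runs, a sync (def) FastAPI endpoint occupies one threadpool
# thread; an async endpoint calling it directly would block that worker's
# event loop. Either way, one worker serves at most ~1/elapsed requests/s.
print(f"one prediction took {elapsed:.4f}s")
```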
I am happy to provide more information as required.
Where and what changes do I need to make in Uvicorn or Gunicorn to serve a large number of requests with as low a failure rate as possible, while using the machine's resources to the fullest (or to the extent needed)?
I checked the load average (via the uptime command), but the % of requests that fail is still more or less the same. Is that an indication of disk latency? – Acetum