How many concurrent requests does a single Flask process receive?

I'm building an app with Flask, but I don't know much about WSGI or its HTTP foundation, Werkzeug. When I start serving a Flask application with Gunicorn and 4 worker processes, does this mean that I can handle 4 concurrent requests?

I do mean concurrent requests, and not requests per second or anything else.

Certitude answered 7/6, 2012 at 19:12

When running the development server, which is what you get by running app.run(), you get a single synchronous process, which means at most one request is being processed at a time.

By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run() development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync worker type by default.

It is important to note that Gunicorn also includes asynchronous workers, namely eventlet and gevent (and also tornado, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class flag, what you get is Gunicorn managing a number of async processes, each of which manages its own concurrency. These processes don't use threads, but instead coroutines. Basically, within each process, still only one thing can be happening at a time (one thread), but objects can be 'paused' when they are waiting on external processes to finish (think database queries or waiting on network I/O).
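This 'pausing' behaviour can be illustrated with the standard library's asyncio. (gevent and eventlet achieve a similar effect with greenlets and monkey-patching; this is just a sketch of the coroutine idea, not of Gunicorn's internals.)

```python
import asyncio
import time

async def handle_request(i):
    # Stands in for a request that spends most of its time waiting on I/O
    # (e.g. a database query). await yields control to the event loop,
    # so the other "requests" make progress in the same single thread.
    await asyncio.sleep(0.2)
    return f"response {i}"

async def main():
    start = time.perf_counter()
    # 10 concurrent "requests" handled by one thread
    results = await asyncio.gather(*(handle_request(i) for i in range(10)))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} responses in {elapsed:.2f}s")  # ~0.2s, not ~2s
    return elapsed

elapsed = asyncio.run(main())
```

Because the waits overlap, ten 0.2-second "requests" complete in roughly 0.2 seconds total rather than 2 seconds.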

This means, if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how many workers is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and notes on how gevent works on its intro page.
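For reference, a worker class can be selected with the --worker-class flag or via a config file. A minimal gunicorn.conf.py sketch (the myapp:app module name is hypothetical, and gevent must be installed separately):

```python
# gunicorn.conf.py -- run with: gunicorn -c gunicorn.conf.py myapp:app
workers = 4                # number of worker processes
worker_class = "gevent"    # async worker; the default is "sync"
worker_connections = 1000  # max simultaneous clients per async worker
```

With the sync worker this setup caps out at 4 concurrent requests; with gevent, each of the 4 workers can juggle up to worker_connections I/O-bound requests.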

Distend answered 18/12, 2012 at 8:36
Gunicorn now supports "real" threads since version 19. See this and this. – Insociable
How does one keep track of which resources get shared (and how) and which are completely separate between threads/processes? For example, how would I handle a situation where I want to share a huge data structure between several processes handled by Gunicorn and used in the Flask handlers? – Spurry
What you are asking, @Johsm, is like asking how to share data between different processes within the operating system. The answer to that can answer your question: you have to use external storage, since processes don't share their memory with other processes. Gunicorn is only here to utilize multiprocessing CPU architectures; it doesn't handle those issues. – Czech
What about Eve? Does this hold for Eve as well? – Secateurs
The Flask development server uses threads by default since v1.0 (github.com/pallets/flask/pull/2529). – Tsana
@Tsana yes it does. I included an answer explaining how it's possible. – Wasting
I am facing the same issue. For example, if I set -w 10 my app only handles 10 parallel requests at a time. But in production my app may need to handle 1000 requests at a time. How can I handle 1000 requests in parallel? The gevent worker connection is not a solution for this because it is not parallel. – Wireworm
Is it possible to dynamically increase the number of workers in Gunicorn while it is still running, through the Flask API? @Ryan Artecona – Theis
Would the GIL impact the threads of a Flask app? – Corona

Currently there is a far simpler solution than the ones already provided. When running your application, you just have to pass the threaded=True parameter to the app.run() call, like:

app.run(host="your.host", port=4321, threaded=True)

Another option, per the Werkzeug docs, is to use the processes parameter, which takes a number > 1 indicating the maximum number of concurrent processes to handle requests with:

  • threaded – should the process handle each request in a separate thread?
  • processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.

Something like:

app.run(host="your.host", port=4321, processes=3)  # up to 3 processes

More info on the run() method here, and in the blog post that led me to this solution and the API references.


Note: the Flask docs on the run() method indicate that using it in a production environment is discouraged because (quote): "While lightweight and easy to use, Flask's built-in server is not suitable for production as it doesn't scale well."

However, they do point to their Deployment Options page for the recommended ways to do this when going for production.

Wasting answered 6/6, 2018 at 1:5
@Coffee_fan you are right. Even on the latest 1.1.x they discourage that, and instead suggest checking their page on Deployment Options when going for production. Including your valuable observation in the answer :) – Wasting
According to my understanding, threaded=True should be used in development mode only. – Formosa
@Formosa yeah, ideally. Even Flask suggests so, and indicates (see the last part of my answer) that the run() method, threaded or not, should not be used for production. They link to their Deployment Options, also linked in my answer and in a comment above, for better practices when deploying. – Wasting

Flask will process one request per thread at the same time. If you have 2 processes with 4 threads each, that's 8 concurrent requests.

Flask doesn't spawn or manage threads or processes. That's the responsibility of the WSGI server (e.g. Gunicorn).
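With Gunicorn, that process/thread combination corresponds to the gthread worker; the settings below are real Gunicorn options, and the arithmetic is simply workers × threads. A sketch:

```python
# gunicorn.conf.py -- equivalent to: gunicorn -w 2 --threads 4 myapp:app
workers = 2   # worker processes
threads = 4   # threads per process (setting this selects the "gthread" worker)
max_concurrent_requests = workers * threads  # 2 * 4 = 8
```

Each of the 8 threads runs your Flask handlers independently, so 8 requests can be in flight at once.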

Violone answered 8/6, 2012 at 5:31

No, you can definitely handle more than that.

It's important to remember that deep down, assuming you are running a single-core machine, the CPU really only runs one instruction* at a time.

Namely, the CPU can only execute a very limited set of instructions, and it can't execute more than one instruction per clock tick (many instructions even take more than 1 tick).

Therefore, most concurrency we talk about in computer science is software concurrency. In other words, there are layers of software implementation that abstract the bottom level CPU from us and make us think we are running code concurrently.

These "things" can be processes, which are units of code that get run concurrently in the sense that each process thinks it's running in its own world with its own, non-shared memory.

Another example is threads, which are units of code inside processes that allow concurrency as well.

The reason your 4 worker processes will be able to handle more than 4 requests is that they will fire off threads to handle more and more requests.
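Whether extra threads are actually spawned depends on the worker class (as the comments below point out, Gunicorn's default sync worker does not), but the underlying idea that threads let I/O waits overlap can be sketched with the standard library:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i):
    # Stands in for a request that mostly waits on I/O; time.sleep
    # releases the GIL, so the other threads run during the wait.
    time.sleep(0.2)
    return f"response {i}"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, range(4)))
elapsed = time.perf_counter() - start
print(f"4 requests handled in {elapsed:.2f}s")  # ~0.2s instead of ~0.8s serially
```

Note the caveat from the last comment below: because of the GIL, this only helps when requests spend their time waiting on I/O, not doing pure Python computation.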

The actual request limit depends on HTTP server chosen, I/O, OS, hardware, network connection etc.

Good luck!

*Instructions are the very basic commands the CPU can run. Examples: add two numbers, jump from one instruction to another.

Kelm answered 8/6, 2012 at 2:11
Is it Gunicorn spawning the threads, or Flask? I found no evidence supporting either possibility. – Violone
Sure, I understand that about the processes, but the answer says more threads are spawned as needed. That's what I'd like to have confirmation of. – Violone
This answer doesn't contain any information about Flask or Gunicorn. – Goingson
@Violone From my understanding, the spawning of threads is based on the worker type (worker_class) you have chosen for Gunicorn to run with. See docs.gunicorn.org/en/stable/settings.html#worker-processes for more. – Outright
"Deep deep down, assuming you are running a single core machine, the CPU really only runs one instruction at a time": this is not correct on modern machines. Most modern CPUs are pipelined and superscalar, where even a single core has multiple execution units and an instruction decoder that converts the "machine code" seen from the software side into the actual hardware micro-ops that are dispatched to the individual execution units. – Pooley
To clarify, way back in the day, CPUs actually did directly execute the numeric instructions in an executable, the machine code. Every CPU reference had an instruction timing chart showing how many clock cycles each instruction took, including any memory references. So you could just add up the timings to know how long any piece of code would take. Modern CPUs are not like that at all. One interesting exception is the BeagleBone, which has a modern superscalar ARM processor and two old-fashioned "PRU" processors with fixed instruction timing. – Pooley
And to clarify that, when I said "modern" I was using it as loose shorthand for processors like ARM/Intel/AMD chips: pipelined, superscalar, etc. Of course there are also modern processors that work the old way with fixed timing per instruction, like the BeagleBone PRUs I mentioned and various new microcontrollers. (And now back to Gunicorn!) – Pooley
Even on a single CPU, or an imaginary fixed-timing CPU, multiple threads/processes have a lot of benefits: while one request waits for the disk, database, network, or other stuff, another request can do some independent work. – Ermentrude
Just also be aware that Python has a "Global Interpreter Lock": except for some packages which carefully release that lock, no matter how many CPUs you have, you'll have only one Python instruction running at a time. So again, just be aware of how your multithreader is actually working; you may not be getting what you think. wiki.python.org/moin/GlobalInterpreterLock – Hurley
