Node.js server with multiple concurrent requests, how does it work?

I know Node.js is single-threaded and asynchronous with non-blocking I/O. I've read a lot about that, e.g. that PHP uses one thread per request while Node uses only one thread for all requests.

Suppose three requests a, b, c arrive at the same time at a Node.js server. All three require a large blocking operation, e.g. they all want to read the same big file.

Then how are the requests queued, in what sequence is the blocking operation carried out, and in what sequence are the responses dispatched? And how many threads are used?

Please tell me the sequence from request to response for the three requests.

Pacifa answered 11/4, 2016 at 7:29 Comment(1)
Node's runtime IS multithreaded. It's only the Javascript model that runs a single thread. – Tucson

Here's a description of a sequence of events for your three requests:

  1. Three requests are sent to the node.js web server.
  2. Whichever request arrives fractionally before the other two will trigger the web server request handler and it will start executing.
  3. The other two requests go into the node.js event queue, waiting their turn. It's technically up to the internals of the node.js implementation whether a waiting request is queued at the incoming TCP level or whether it's queued inside of node.js (I don't actually know), but for the purposes of this discussion, all that matters is that the incoming event is queued and won't trigger until the first request stops running.
  4. That first request handler will execute until it hits an asynchronous operation (such as reading a file) and then has nothing else to do until the async operation completes.
  5. At that point, the async file I/O operation is initiated and that original request handler returns (it is done with what it can do at that moment).
  6. Since the first request (which is waiting for file I/O) has returned for now, the node.js engine can now pull the next event out of the event queue and start it. This will be the second request to arrive on the server. It will go through the same process as the first request and will run until it has nothing else to do (and is also waiting for file I/O).
  7. When the second request returns back to the system (because it's waiting for file I/O), the third request can start running. It will follow the same path as the previous two.
  8. When the third request is now also waiting for I/O and returns back to the system, node.js is then free to pull the next event out of the event queue.
  9. At this point, all three request handlers are "in flight" at the same time. Only one ever actually runs at once, but all are in process at once.
  10. This next event in the event queue could be some other event or some other request or it could be the completion of one of the three previous file I/O operations. Whichever event is next in the queue will start executing. Suppose it's the first request's file I/O operation. At that point, it calls the completion callback associated with that first request's file I/O operation and that first request starts processing the file I/O results. This code will then continue to run until it either finishes the entire request and returns or until it starts some other async operation (like more file I/O) and returns.
  11. Eventually, the second request's file I/O will be ready and that event will be pulled from the event queue.
  12. Then the same happens for the third request, and eventually all three finish (see the sketch just below this list).
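
To make that concrete, here is a minimal sketch of such a server using Node's built-in http and fs modules; the file path and port are placeholders, not from the original question:

    const http = require('http');
    const fs = require('fs');

    const server = http.createServer((req, res) => {
      console.log(`${req.url} handler started`);   // runs to completion first
      fs.readFile('/path/to/big-file.dat', (err, data) => {
        // This callback runs later, when the read completes and its
        // completion event reaches the front of the event queue.
        if (err) {
          res.statusCode = 500;
          res.end('read failed');
          return;
        }
        res.end(`read ${data.length} bytes`);
      });
      // The handler returns here, so the next queued request can run
      // while the file is read in the background.
    });

    server.listen(3000);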

So, even though only one request is ever actually executing at a given moment, multiple requests can be "in process" or "in flight" at the same time. This is sometimes called cooperative multi-tasking: rather than "pre-emptive" multitasking with multiple native threads, where the system can freely switch between threads at any moment, a given thread of Javascript runs until it returns back to the system, and then, and only then, can another piece of Javascript start running. Because a piece of Javascript can initiate non-blocking asynchronous operations, the thread of Javascript can return back to the system (enabling other pieces of Javascript to run) while its asynchronous operations are still pending. When those operations complete, they will post an event to the event queue, and when the other Javascript is done and that event gets to the top of the queue, it will run.
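
A small runnable illustration of that cooperative behavior, where a 2-second busy-wait stands in for any long synchronous chunk of Javascript:

    // The timer is due immediately, but its callback cannot pre-empt the
    // running Javascript; it fires only after the busy-wait returns.
    setTimeout(() => console.log('timer fired'), 0);

    const start = Date.now();
    while (Date.now() - start < 2000) { /* busy-wait; JS never yields here */ }
    console.log('busy-wait done');   // prints first, then 'timer fired'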

Single Threaded

The key point here is that a given thread of Javascript will run until it returns back to the system. If, in the process of executing, it starts some asynchronous operations (such as file I/O or networking), then when those operations finish, they will put an event in the event queue, and when the JS engine is done running any events before it, that event will be serviced, causing a callback to be called, and that callback will get its turn to execute.
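
As a tiny illustration of that run-to-completion rule (reading this script's own file just to have some I/O to wait on):

    const fs = require('fs');

    fs.readFile(__filename, () => {
      // Queued: runs only after the current synchronous code has returned
      // to the system and this completion event is pulled off the queue.
      console.log('read complete');
    });
    console.log('read started');   // always prints first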

This single-threaded nature vastly simplifies how concurrency is handled vs. a multi-threaded model. In a fully multi-threaded environment where every single request starts its own thread, ANY shared data, even a simple variable, is subject to a race condition and must be protected with a mutex before anyone can even read it.

In Javascript, because there is no concurrent execution of multiple requests, no mutex is needed for simple shared-variable access. At the point one piece of Javascript is reading a variable, by definition no other Javascript is running at that moment (single threaded).
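
For instance, a request counter shared by all handlers needs no lock. This is a minimal sketch with a placeholder port:

    const http = require('http');

    let requestCount = 0;   // plain shared variable, no mutex

    http.createServer((req, res) => {
      requestCount += 1;    // safe: no other Javascript can run mid-handler
      res.end(`you are request number ${requestCount}\n`);
    }).listen(3000);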

Node.js Does Use Threads

One technical distinction of note is that only the execution of your Javascript is single threaded. The node.js internals do use threads themselves for some things. For example, asynchronous file I/O actually uses native threads, while network I/O does not (it uses native event-driven networking).

But, this use of threads in the internals of node.js does not affect the Javascript execution directly. There is still only ever one single thread of Javascript executing at a time.
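
One common way to observe that internal pool (this example is an illustration, not from the original answer): crypto.pbkdf2 is dispatched to the same libuv thread pool (default size 4) as async file I/O, so launching more jobs than pool threads makes the pool visible in the timings.

    const crypto = require('crypto');

    // With the default pool of 4 threads, jobs 0-3 finish in roughly one
    // batch and jobs 4-7 roughly one batch later, even though the
    // Javascript issuing them is single threaded.
    for (let i = 0; i < 8; i++) {
      const start = Date.now();
      crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', () => {
        console.log(`job ${i} finished after ${Date.now() - start} ms`);
      });
    }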

Race Conditions

There still can be race conditions for state that is in the middle of being modified when an async operation is initiated, but this is way, way less common than in a multi-threaded environment and it is much easier to identify and protect these cases.

As an example of a race condition that can exist, I have a simple server that takes readings from several temperature probes every 10 seconds using an interval timer. It collects the data from all those temperature readings and every hour it writes out that data to disk. It uses async I/O to write the data to disk. But, since a number of different async file I/O operations are used to write the data to disk, it is possible for the interval timer to fire in between some of those async file I/O operations, causing the data that the server is in the middle of writing to disk to be modified. This is bad and can cause inconsistent data to be written.

In a simple world, this could be avoided by making a copy of all the data before it starts writing it to disk, so if a new temperature reading comes in while the data is being written to disk, the copy will not be affected and the code will still write a consistent set of data to disk. But, in the case of this server, the data can be large and the memory on the server is small (it's a Raspberry Pi server), so it is not practical to make an in-memory copy of all the data.

So, the problem is solved by setting a flag when the data is in the process of being written to disk and then clearing the flag when data is done being written to disk. If an interval timer fires while this flag is set, the new data is put into a separate queue and the core data that is in the process of being written to disk is NOT modified. When the data is done being written to disk, it checks the queue and any temperature data it finds there is then added to the in-memory temperature data. The integrity of what is in the process of being written to disk is preserved. My server logs an event any time this "race condition" is hit and data is queued because of it. And, lo and behold, it does happen every once in a while and the code to preserve the integrity of the data works.
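
Here is a sketch of that flag-and-queue pattern. The names, the file path, and the single writeFile call are invented for illustration (the real server used several async writes):

    const fs = require('fs');

    let readings = [];          // core in-memory temperature data
    let pendingReadings = [];   // readings that arrive mid-write
    let writingToDisk = false;  // the flag

    function addReading(sample) {
      if (writingToDisk) {
        pendingReadings.push(sample);   // don't touch data being written
      } else {
        readings.push(sample);
      }
    }

    function flushToDisk(done) {
      writingToDisk = true;
      fs.writeFile('/var/data/temps.json', JSON.stringify(readings), (err) => {
        writingToDisk = false;
        readings = readings.concat(pendingReadings);  // merge queued samples
        pendingReadings = [];
        done(err);
      });
    }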

Eponymy answered 11/4, 2016 at 7:48 Comment(8)
Suppose a large number of requests (say 500) hit the server before the I/O operations of the three earlier requests complete, which means 500 new requests are in the event queue. So the responses for the first 3 requests will be pending because of those 500 new requests. Is that good? – Pacifa
@Pacifa - One would have to test how node.js prioritizes I/O completion events vs. incoming new connection events. Conceptually, this should not be a problem whichever way it works, but I don't know which events get priority in the way node.js handles things. Logically, one would think it would prioritize serving connections that have already started, but you'd either have to write up a test case and run it on several node.js platforms or study the source code to know for sure. – Eponymy
@Eponymy Hi, I don't have formal computer training, but I see the letters I/O a lot and would appreciate it if you could explain a little about it. For example, when you use it in the sentence "the async file I/O operation is initiated", do you mean that this is the point in time when the interpreter (or whatever it's called) reads the file the OP is talking about? Why call it "async"? BTW, I think I/O means input/output, i.e. operations applied to something, and when people mention it for Node it's mostly requests and responses (servers)? Any answer will help. – Unconditioned
@jackblank - I/O means input/output. It's a shortcut for reading or writing data from external sources (disk, network or other peripherals). Initiated means that node.js tells some sub-system to start reading the file in the background while other processing continues (that's how async reading works). The sub-system will then call a callback when data has been read. – Eponymy
So if we don't have async I/O operations, but we have long-running (say 3-4 seconds each) synchronous requests, and 500 of them come within a period of 1 or 2 seconds, what happens? The 500th of them has to wait about 499*3 = 1497 seconds to be served, because the 499 before it are served sequentially? Even if 31 or 63 or 127 other threads are sleeping because nodejs doesn't bother using them? – Parochialism
@ThanasisIoannidis - I don't know what you mean by 127 other threads sleeping. Nodejs only runs one thread for your Javascript and a few other threads for some system operations. If you have 500 simultaneous requests of all synchronous work, then you should run a clustered node.js that uses as many clusters as you have actual CPU cores on your server. That's how you set up node.js to take maximal advantage of the cores on your CPU. But, it's very rare to have tons of requests that aren't bound by I/O. There's nearly always file I/O or database I/O involved. – Eponymy
@ThanasisIoannidis - The cluster module is built into nodejs and pretty easy to use. You can also use WorkerThreads and fire up your own threads for synchronous operations if you want. There are lots of choices if you have 100% synchronous work to do. – Eponymy
@ThanasisIoannidis - Just came across this interesting article about node.js and blocking the event loop: Don't Block the Event Loop (or the Worker Pool). It might be helpful for you. – Eponymy
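
(Not part of the original thread: a minimal sketch of the clustered setup suggested in the comments above, one worker per CPU core, with a placeholder port.)

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) {            // `cluster.isMaster` on older Node
      for (let i = 0; i < os.cpus().length; i++) {
        cluster.fork();                 // one worker per core
      }
    } else {
      // Each worker has its own event loop; they share the listening port.
      http.createServer((req, res) => {
        res.end(`handled by worker ${process.pid}\n`);
      }).listen(3000);
    }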
