How the single-threaded non-blocking IO model works in Node.js
C

9

348

I'm not a Node programmer, but I'm interested in how the single-threaded non-blocking IO model works. After I read the article understanding-the-node-js-event-loop, I'm really confused about it. It gave an example for the model:

c.query(
  'SELECT SLEEP(20);',
  function (err, results, fields) {
    if (err) {
      throw err;
    }
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.end('<html><head><title>Hello</title></head><body><h1>Return from async DB query</h1></body></html>');
    c.end();
  }
);

Question: Suppose two requests arrive, A first and then B. Since there is only a single thread, the server-side program handles request A first: it runs the SQL query, which is a sleep statement standing in for an I/O wait. The program is then stuck waiting on I/O and cannot execute the code that renders the web page. Will the program switch to request B during the wait? In my opinion, because of the single-threaded model, there is no way to switch from one request to another. But the title of the example code says that everything runs in parallel except your code.

(P.S. I'm not sure whether I misunderstand the code, since I have never used Node.) How does Node switch from A to B during the wait? And can you explain Node's single-threaded non-blocking IO model in a simple way? I would appreciate it if you could help me. :)

Curule answered 10/2, 2013 at 5:54 Comment(0)
A
399

Node.js is built upon libuv, a cross-platform library that abstracts the APIs/syscalls for asynchronous (non-blocking) input/output provided by the supported OSes (Unix, OS X and Windows at least).

Asynchronous IO

In this programming model, open/read/write operations on devices and resources (sockets, filesystem, etc.) managed by the file-system don't block the calling thread (as they would in the typical synchronous C-like model). Instead, they just mark the process (in a kernel/OS-level data structure) to be notified when new data or events are available. In a web-server-like app, the process is then responsible for figuring out which request/context the notified event belongs to and for continuing to process the request from there. Note that this necessarily means you'll be on a different stack frame from the one that originated the request to the OS, as the latter had to yield to the process's dispatcher so that a single-threaded process could handle new events.

The problem with the model I described is that it's unfamiliar and hard for the programmer to reason about, as it's non-sequential in nature. "You need to make a request in function A and handle the result in a different function where your locals from A are usually not available."

Node's model (Continuation Passing Style and Event Loop)

Node tackles the problem by leveraging JavaScript's language features to make this model look a little more synchronous, inducing the programmer to employ a certain programming style. Every function that requests IO has a signature like function (... parameters ..., callback) and needs to be given a callback that will be invoked when the requested operation is completed (keep in mind that most of the time is spent waiting for the OS to signal the completion - time that can be spent doing other work). JavaScript's support for closures allows you to use variables you've defined in the outer (calling) function inside the body of the callback - this makes it possible to keep state between different functions that will be invoked by the node runtime independently. See also Continuation Passing Style.
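To make the callback-plus-closure style concrete, here is a minimal sketch. `fakeQuery` is a hypothetical stand-in for an I/O function (it only simulates latency with a timer) and is not part of any real database library; `handleRequest` and `closureDemo` are likewise illustrative names.

```javascript
// Hypothetical stand-in for an I/O function with the usual
// (...parameters, callback) signature. The "I/O" is simulated with a timer.
function fakeQuery(sql, callback) {
  setTimeout(() => callback(null, [`result of ${sql}`]), 10);
}

// The callback is a closure: it can still read `requestId` and `startedAt`
// after handleRequest() has already returned to the event loop.
function handleRequest(requestId, done) {
  const startedAt = Date.now();
  fakeQuery('SELECT 1', (err, rows) => {
    if (err) return done(err);
    done(null, { requestId, rows, waitedMs: Date.now() - startedAt });
  });
  // Nothing blocks here: the function returns right away.
}

// Wrapped in a promise only so the outcome is easy to inspect.
const closureDemo = new Promise((resolve, reject) => {
  handleRequest('A', (err, summary) => (err ? reject(err) : resolve(summary)));
});
closureDemo.then((summary) => console.log(summary));
```

The state of request A lives in the closure rather than on a blocked thread's stack, which is what lets a single thread juggle many requests.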

Moreover, after invoking a function spawning an IO operation the calling function will usually return control to node's event loop. This loop will invoke the next callback or function that was scheduled for execution (most likely because the corresponding event was notified by the OS) - this allows the concurrent processing of multiple requests.

You can think of node's event loop as somewhat similar to the kernel's dispatcher: the kernel schedules a blocked thread for execution once its pending IO is completed, while node schedules a callback when the corresponding event has occurred.
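As a rough illustration of that dispatching idea, here is a toy, fully synchronous model of an event loop. It is only a sketch under simplifying assumptions: `ToyEventLoop`, `notify` and `run` are made-up names, and libuv's real loop is considerably more involved.

```javascript
// A toy model of an event loop; NOT how libuv is implemented.
// "Completed I/O" is simulated by pushing callbacks onto a ready queue.
class ToyEventLoop {
  constructor() {
    this.readyQueue = []; // callbacks whose events have "arrived"
  }

  // Simulates the OS signalling that an event occurred.
  notify(callback, payload) {
    this.readyQueue.push(() => callback(payload));
  }

  // Runs callbacks one at a time until nothing is left to do.
  run() {
    while (this.readyQueue.length > 0) {
      const next = this.readyQueue.shift();
      next(); // a callback may call notify() and schedule more work
    }
  }
}

const toyLog = [];
const loop = new ToyEventLoop();

loop.notify((req) => {
  toyLog.push(`${req}: query sent`);
  // The "result" for A arrives later, as a separate event.
  loop.notify((r) => toyLog.push(`${r}: response written`), req);
}, 'A');
loop.notify((req) => toyLog.push(`${req}: query sent`), 'B');

loop.run();
console.log(toyLog);
```

Request B gets its turn between the two steps of request A, even though only one callback ever runs at a time.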

Highly concurrent, no parallelism

As a final remark, the phrase "everything runs in parallel except your code" does a decent job of capturing the point that node allows your code to handle requests from hundreds of thousands of open sockets concurrently with a single thread, by multiplexing and sequencing all your js logic in a single stream of execution (even though saying "everything runs in parallel" is probably not correct here - see Concurrency vs Parallelism - What is the difference?). This works pretty well for webapp servers, as most of the time is actually spent waiting for network or disk (database / sockets) and the logic is not really CPU intensive - that is to say: this works well for IO-bound workloads.
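The flip side of "no parallelism" is that CPU-bound code holds up everything else. The sketch below (variable names are illustrative) schedules a timer for "as soon as possible" and then keeps the thread busy for about 50 ms; the timer cannot fire until the busy loop is done.

```javascript
// Sketch: a CPU-bound loop delays every other callback, because all
// JavaScript shares the single event-loop thread.
const blockStart = Date.now();

const timerDelay = new Promise((resolve) => {
  // Asked to fire as soon as possible...
  setTimeout(() => resolve(Date.now() - blockStart), 0);
});

// ...but this busy loop holds the thread for about 50 ms first.
while (Date.now() - blockStart < 50) {
  // simulate CPU-intensive work
}

timerDelay.then((ms) => console.log(`timer fired after ${ms} ms`));
```

This is why the model suits IO-bound workloads: waiting costs nothing, but computing blocks every other request.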

Andersen answered 10/2, 2013 at 11:41 Comment(9)
A follow-up question: how does the I/O actually happen then? Node is making a request to the system and asking to be notified when it is finished. So is the system running a thread that is doing the I/O, or is the system also performing the I/O asynchronously at the hardware level using interrupts? Something somewhere has to wait for the I/O to finish, and that is going to block till it's done and consume some amount of resources.Nisen
Just noticed that this followup comment is answered by @user568109 below, I wish there were a way to merge these two answers.Dimond
Node is supported in a lot of places, for the record. When I was designing firmware for MIPS32 routers, Node.JS could be run on those via OpenWRT.Tribalism
How does it score over apache? Apache is also capable of handling concurrent connections with a separate thread.Sanmiguel
@SuhailGupta it scales with fewer resources than Apache does. If you used one thread per 1000 requests (vs Apache which uses one thread per request), you can spin up more node processes running in parallel to handle more traffic. This assumes, however, that you've written a server without state in memory and that you have a load balancing server in front of all your processes.Thrust
@philip, to answer your question, I believe the part that waits for the I/O to finish - is the code included in the callback function. Considering this, if the majority of the code depends on the I/O response and so included in the callback, it defeats the point of using a single asynchronous event loop - as that part of code is actually being executed synchronously.Fatima
@Nisen There is a way no polling will ever be required. Read about hardware interrupts. A device like disk (with file handlers as representatives of actual physical interfaces like Ethernet adapter) can signal the OS by hardware interrupts that it is ready with some data. Wikipedia at en.wikipedia.org/wiki/Asynchronous_I/O says "..Direct memory access (DMA) can greatly increase the efficiency of a polling-based system, and hardware interrupts can eliminate the need for polling entirely..".Wynellwynn
@utaal, when you say "node's event loop", is it different from the "JS event loop"? JS also uses an event loop for things like setTimeout, as described in this video about the JS event loop: youtu.be/8aGhZQkoFbQ Carpetbagger
@Nisen Regarding how I/O is actually done: system calls can be non-blocking, as when calling open(2) with the O_NONBLOCK flag, or via more platform-specific calls like aio_read(3) on Linux or whatever the non-blocking calls are on Windows. This means read(2), or whatever call normally blocks, returns immediately, and the main thread then calls epoll(7), select(2) or similar to determine when it can be called again to read / write actual data.Puiia
B
220

Well, to give some perspective, let me compare Node.js with Apache.

Apache is a multi-threaded HTTP server: for each and every request that the server receives, it creates a separate thread which handles that request.

Node.js, on the other hand, is event-driven, handling all requests asynchronously from a single thread.

When A and B are received on Apache, two threads are created to handle the requests. Each handles its query separately, and each waits for the query results before serving the page. The page is served only after the query has finished. The query fetch is blocking because the server cannot execute the rest of the thread until it receives the result.

In Node, c.query is handled asynchronously: while c.query fetches the results for A, Node jumps to handle c.query for B, and when the results for A arrive it passes them to the callback, which sends the response. Node.js knows to execute the callback when the fetch finishes.

In my opinion, because it's a single thread model, there is no way to switch from one request to another.

Actually, the Node server does exactly that for you all the time. To make these switches (the asynchronous behavior) possible, most functions that you would use take callbacks.
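Here is a small sketch of the A/B scenario from the question, using Node's built-in http module. `fakeQuery` is a hypothetical stand-in for c.query (a timer instead of a real database), and the 200 ms / 20 ms delays are arbitrary assumptions chosen to make the effect visible.

```javascript
const http = require('node:http');

// Hypothetical stand-in for c.query(): "slow" takes 200 ms, anything else 20 ms.
function fakeQuery(kind, callback) {
  setTimeout(() => callback(null, `done ${kind}`), kind === 'slow' ? 200 : 20);
}

const server = http.createServer((req, res) => {
  const kind = req.url === '/A' ? 'slow' : 'fast';
  fakeQuery(kind, (err, text) => {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(text);
  });
  // The handler returns here, so the loop is free to accept the next request.
});

// Sends a GET request and resolves with the path once the response has ended.
function get(port, path) {
  return new Promise((resolve, reject) => {
    http.get({ host: '127.0.0.1', port, path, agent: false }, (res) => {
      res.resume(); // discard the body
      res.on('end', () => resolve(path));
    }).on('error', reject);
  });
}

const finishOrder = [];
const twoRequests = new Promise((resolve, reject) => {
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    const a = get(port, '/A').then((p) => finishOrder.push(p)); // sent first
    const b = get(port, '/B').then((p) => finishOrder.push(p)); // sent second
    Promise.all([a, b])
      .then(() => server.close(() => resolve(finishOrder)))
      .catch(reject);
  });
});
twoRequests.then((order) => console.log('finished in order:', order));
```

With these delays, B should finish before A even though A was sent first: the single thread was never stuck waiting on A's "query".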

Edit

The SQL query is taken from the mysql library. It implements the callback style as well as an event emitter to queue SQL requests. It does not execute them asynchronously itself; that is done by the internal libuv threads that provide the abstraction of non-blocking I/O. The following steps happen when making a query:

  1. Open a connection to db, connection itself can be made asynchronously.
  2. Once db is connected, query is passed on to the server. Queries can be queued.
  3. The main event loop gets notified of the completion with callback or event.
  4. Main loop executes your callback/eventhandler.
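The steps above can be sketched roughly as follows. `FakeConnection` is a hypothetical class that only mimics the shape of those steps with timers; it is an assumption for illustration and not the mysql library's actual implementation.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical connection object; timers stand in for real network I/O.
class FakeConnection extends EventEmitter {
  constructor() {
    super();
    this.connected = false;
    this.pending = []; // queries queued until the connection is ready
  }

  connect() {
    // Step 1: connecting is itself asynchronous.
    setTimeout(() => {
      this.connected = true;
      this.emit('connect');
      this.flush();
    }, 10);
  }

  query(sql, callback) {
    // Step 2: queue the query; it is sent once the connection is up.
    this.pending.push({ sql, callback });
    if (this.connected) this.flush();
  }

  flush() {
    while (this.pending.length > 0) {
      const { sql, callback } = this.pending.shift();
      // Steps 3 and 4: completion is reported later, through the event loop.
      setTimeout(() => callback(null, `rows for ${sql}`), 5);
    }
  }
}

const conn = new FakeConnection();
const queueDemo = new Promise((resolve) => {
  const results = [];
  conn.on('connect', () => results.push('connected'));
  conn.query('SELECT 1', (err, rows) => results.push(rows));
  conn.query('SELECT 2', (err, rows) => {
    results.push(rows);
    resolve(results);
  });
  conn.connect(); // queries were issued before the connection existed
});
queueDemo.then((results) => console.log(results));
```

Both queries are accepted immediately and their callbacks run later, in order, once the simulated connection is ready.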

The incoming requests to the http server are handled in a similar fashion. The internal thread architecture is something like this:

node.js event loop

The C++ threads are the libuv ones, which do the asynchronous I/O (disk or network). The main event loop continues to execute after dispatching the request to the thread pool. It can accept more requests as it does not wait or sleep. SQL queries/HTTP requests/file system reads all happen this way.

Beat answered 10/2, 2013 at 11:9 Comment(11)
@RobertSiemer I have updated my answer. Does this answer the question now ?Beat
File system operations are blocking internally and are run in a thread pool. Threads notify the main event loop, which runs the callback.Beat
Wait, so in your diagram you have the "internal C++ threadpool", which means that all IO blocking operations will spawn a thread, right? So if my Node app does some IO work for every request, is there virtually no difference between the Node model and the Apache model? I'm not getting this part sorry.Erythritol
@Erythritol They don't spawn a thread, the requests are queued. The threads in the threadpool process them. The threads are not dynamic and per request as in Apache. They are usually fixed and differ from system to system.Beat
@Beat But Apache is using a threadpool too (httpd.apache.org/docs/2.4/mod/worker.html). So in the end, a setup with node.js differs from one with Apache in front only in where the threadpool is located, doesn't it?Derian
+1 for illustration. So, every new connection to the server also comes to the main event loop first, which then eventually sends it to the stack for execution when its turn comes, right?Fatima
@Beat What happens if there are more requests than the c++ thread pool has threads? Why doesn't node's single threaded event loop block?Tusk
@Derian Yes, Apache is using a thread-pool too. However, ALL of the processing (i.e., the SQL query itself AND what happens next after the result is returned from the database) is done on that same thread. Node.js will execute only the query part on a separate thread (a thread from libuv's thread-pool) and will deliver the result on the event-loop thread instead. Therefore, your code inside the callback will execute on the event-loop thread.Kovar
@Derian Apache has a thread pool but the threads are blocking in nature for all kinds of IO, while libuv's thread pool (which also is blocking) is only for file IO (see docs.libuv.org/en/v1.x/design.html). For all network IO Node uses non blocking sockets.Wynellwynn
Let me ask if I got this right: So the code execution is single-thread, but the program is actually run multithreaded?Spellbind
Finally, someone made sense. I have been struggling with this when people say "event loop", which I used back in the days of Windows development: we had an event loop and implemented blocking operations with worker threads. So the difference between async and non-blocking IO is who manages the thread-pool and for what work.Piquet
K
63

Node.js uses libuv behind the scenes. libuv has a thread pool (of size 4 by default). Therefore Node.js does use threads to achieve concurrency.

However, your code runs on a single thread (i.e., all of the callbacks of Node.js functions will be called on the same thread, the so called loop-thread or event-loop). When people say "Node.js runs on a single thread" they are really saying "the callbacks of Node.js run on a single thread".
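A small sketch of that split, using crypto.pbkdf2, which the Node.js documentation describes as using libuv's threadpool. The password, salt and iteration count are arbitrary values picked for illustration.

```javascript
const crypto = require('node:crypto');

const poolEvents = [];
const poolDemo = new Promise((resolve, reject) => {
  poolEvents.push('hash requested');

  // The hashing itself runs on a libuv pool thread; only the callback
  // below runs on the event-loop thread, after the current code finishes.
  crypto.pbkdf2('secret', 'salt', 100000, 32, 'sha256', (err, key) => {
    if (err) return reject(err);
    poolEvents.push('callback ran');
    resolve({ events: poolEvents, keyLength: key.length });
  });

  // pbkdf2() returned immediately, so this line runs before the callback.
  poolEvents.push('main thread kept running');
});
poolDemo.then((info) => console.log(info));
```

The expensive work happens off the main thread, yet your JavaScript never observes more than one thread: the callback is simply queued until the event loop gets to it.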

Kovar answered 28/11, 2016 at 19:56 Comment(1)
good answer I would add that I/O happens outside this main event-loop, loop-thread, request-threadRaddy
P
9

Node.js is based on the event loop programming model. The event loop runs in a single thread and repeatedly waits for events and then runs any event handlers subscribed to those events. Events can be, for example:

  • timer wait is complete
  • next chunk of data is ready to be written to this file
  • there's a fresh new HTTP request coming our way

All of this runs in a single thread and no JavaScript code is ever executed in parallel. As long as these event handlers are small and wait for yet more events themselves, everything works out nicely. This allows multiple requests to be handled concurrently by a single Node.js process.

(There's a little bit of magic under the hood regarding where the events originate. Some of it involves low-level worker threads running in parallel.)

In this SQL case, there's a lot of things (events) happening between making the database query and getting its results in the callback. During that time the event loop keeps pumping life into the application and advancing other requests one tiny event at a time. Therefore multiple requests are being served concurrently.

event loop high level view

According to: "Event loop from 10,000ft - core concept behind Node.js".

Polychrome answered 14/9, 2015 at 7:51 Comment(0)
T
6

The function c.query() takes two arguments:

c.query("Fetch Data", "Post-Processing of Data")

The operation "Fetch Data" in this case is a DB query. Node.js may handle it by spawning a worker thread and giving it the task of performing the DB query. (Remember, Node.js can create threads internally.) This enables the function to return instantaneously, without any delay.

The second argument, "Post-Processing of Data", is a callback function; the node framework registers this callback, and the event loop calls it.

Thus the statement c.query(parameter1, parameter2) will return instantaneously, enabling node to cater for another request.

P.S.: I have just started to understand node. I actually wanted to write this as a comment to @Philip, but since I didn't have enough reputation points I wrote it as an answer.

Turfy answered 9/6, 2014 at 9:43 Comment(0)
R
3

If you read a bit further: "Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code, so you can’t worry about them other than by knowing that I/O interactions e.g. with the database, or with other processes will be asynchronous from the perspective of each request since the results from those threads are returned via the event loop to your code."

About "everything runs in parallel except your code": your code is executed synchronously. Whenever you invoke an asynchronous operation, such as waiting for IO, the event loop handles everything and invokes the callback. It's just not something you have to think about.

In your example, there are two requests, A (which comes first) and B. You execute request A, then your code continues to run synchronously and executes request B. The event loop handles request A and, when it finishes, invokes the callback of request A with the result; the same goes for request B.

Rozanna answered 10/2, 2013 at 10:25 Comment(2)
"Of course, on the backend, there are threads and processes for DB access and process execution. However, these are not explicitly exposed to your code" - If I take from this phrase, then I don't see any difference between what Node do or any multithreaded framework - let's say Java's Spring Framework - does. There are threads, but you don't control their creation.Bayless
@RafaelEyng I think for handling the series of multiple requests, node will always have a single thread for that. I am not sure if each callbacks are put on new instance of threads aside from other processes like db access but atleast we surely know that node does not instantiate threads every time it receives a request which will have to wait in line before processing (executions before the callback).Efficient
C
1

Okay, most things should be clear so far... the tricky part is the SQL: if it is not in reality running in another thread or process in its entirety, the SQL execution has to be broken down into individual steps (by an SQL processor made for asynchronous execution!), where the non-blocking ones are executed, and the blocking ones (e.g. the sleep) can actually be transferred to the kernel (as an alarm interrupt/event) and put on the event list for the main loop.

That means that e.g. the interpretation of the SQL is done immediately, but during the wait (stored by the kernel as a future event in some kqueue, epoll, ... structure, together with the other IO operations) the main loop can do other things and eventually check whether any of those IOs and waits has completed.

So, to rephrase it again: the program is never (allowed to get) stuck, sleeping calls are never executed. Their duty is done by the kernel (write something, wait for something to come over the network, waiting for time to elapse) or another thread or process. – The Node process checks if at least one of those duties is finished by the kernel in the only blocking call to the OS once in each event-loop-cycle. That point is reached, when everything non-blocking is done.

Clear? :-)

I don’t know Node. But where does the c.query come from?

Checkbook answered 9/11, 2013 at 10:50 Comment(3)
kqueue epoll is for scalable asynchronous I/O notification in linux kernel. Node has libuv for that. Node is entirely on userland. It does not depend on what kernel implements.Beat
@user568109, libuv is Node’s middle man. Any asynchronous framework depends (directly or not) on some asynchronous I/O support in the kernel. So?Checkbook
Sorry for the confusion. Socket operations require non-blocking I/O from kernel. It takes care of the asynchronous handling. But asynchronous file I/O is handled by libuv itself. Your answer does not tell that. It treats both as same, being handled by the kernel.Beat
H
0

The event loop is what allows Node.js to perform non-blocking I/O operations, despite the fact that JavaScript is single-threaded, by offloading operations to the system kernel whenever possible. Think of the event loop as the manager.

  • New requests are sent into a queue and watched by the synchronous event demultiplexer. Each operation's handler is also registered.


  • Then those requests are sent synchronously to the thread pool (worker pool) to be executed. JavaScript itself cannot perform asynchronous I/O operations. In the browser environment, the browser handles the async operations. In the node environment, async operations are handled by libuv, which is written in C. The thread pool's default size is 4, but it can be changed at startup by setting the UV_THREADPOOL_SIZE environment variable (the maximum is 128; libuv 1.30.0 and later allow up to 1024). A thread pool size of 4 means 4 requests can be executed at a time: if the event demultiplexer has 5 requests, 4 are passed to the thread pool and the 5th waits. Once each request has been executed, the result is returned to the event demultiplexer.


  • When a set of I/O operations completes, the Event Demultiplexer pushes a set of corresponding events into the Event Queue.


The handler is the callback. The event loop keeps an eye on the event queue; if something is ready, it is pushed onto the stack so the callback can execute. Remember that callbacks are eventually executed on the stack. Note that some callbacks have priority over others, and the event loop picks callbacks based on their priorities.
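The "4 run, the 5th waits" behaviour can be imitated with a toy pool in plain JavaScript. This is only a sketch of the queuing idea under stated assumptions: `ToyPool` is a made-up class, timers stand in for blocking work, and libuv's real pool uses OS threads rather than anything like this.

```javascript
// Toy model of a fixed-size worker pool; it imitates the queuing only.
class ToyPool {
  constructor(size) {
    this.size = size;
    this.active = 0;
    this.maxActive = 0; // highest number of tasks seen running at once
    this.waiting = [];
  }

  submit(task, callback) {
    this.waiting.push({ task, callback });
    this.drain();
  }

  drain() {
    while (this.active < this.size && this.waiting.length > 0) {
      const { task, callback } = this.waiting.shift();
      this.active += 1;
      this.maxActive = Math.max(this.maxActive, this.active);
      task((result) => {
        this.active -= 1;
        callback(result);
        this.drain(); // a slot is free: start the next waiting task
      });
    }
  }
}

const pool = new ToyPool(4); // mirrors the default pool size of 4
const poolRun = new Promise((resolve) => {
  const finished = [];
  for (let id = 1; id <= 5; id += 1) {
    pool.submit(
      (done) => setTimeout(() => done(id), 20), // each "request" takes 20 ms
      (result) => {
        finished.push(result);
        if (finished.length === 5) {
          resolve({ finished, maxActive: pool.maxActive });
        }
      }
    );
  }
});
poolRun.then((summary) => console.log(summary));
```

Requests 1 to 4 start immediately, while request 5 only starts once one of them has finished, so at most four are ever in flight.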

Hagride answered 27/1, 2022 at 3:55 Comment(0)
B
0

For those who seek a short answer and don't want to go into the deepest levels of Node.js internals.

Node.js is not single threaded, it runs on 5 threads by default.

Yes, there is only a single thread for actual JavaScript processing, but it constantly switches from function to function.

It sends the SQL query to a database and lets it wait in another thread, while single-threaded Node.js continues to run other code that is ready to run.

If you want more explanation, there are good articles about the Event Loop, the Worker Pool, and the whole libuv documentation.

Braxton answered 28/7, 2022 at 20:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.