Node.js single-thread mechanism

I learnt that Node.js is single-threaded and non-blocking. I found a nice explanation in "How, in general, does Node.js handle 10,000 concurrent requests?", but the first answer says:

The seemingly mysterious thing is how both the approaches above manage to run workload in "parallel"? The answer is that the database is threaded. So our single-threaded app is actually leveraging the multi-threaded behaviour of another process: the database.

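(1) This confuses me. Take a simple Express application as an example:

var express = require('express');
var monk = require('monk');

var router = express.Router();
var db = monk('localhost:27017/databaseName');

// Return every document in the "videos" collection.
router.get('/QueryVideo', function (req, res, next) {
  var collection = db.get('videos');
  collection.find({}, function (err, videos) {
    // Throwing inside an async callback would crash the process; hand the error to Express instead.
    if (err) return next(err);
    res.render('index', { videos: videos });
  });
});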

When my router responds to multiple requests by running a simple MongoDB query like this, are those queries handled by different threads? I know there is only one thread in Node to route client requests, though.

(2) My second question is: how does such a single-threaded Node application ensure security? I don't know much about security, but it looks like multiple requests should be isolated (at least in different memory spaces?). Even multi-threaded applications can't ensure security, because their threads still share many things. I know this may not be a reasonable question, but in today's cloud services isolation seems to be an important topic. I am lost in topics such as serverless, wasm, and wasm-based serverless using the Node.js environment.

Thank you for your help!!

Gallfly answered 12/1, 2021 at 3:58 Comment(1)
1. A db can use many threads if needed. 2. JS has reference-based security, without as many of the low-level gotchas like timing attacks or overflows, due to its dynamic and unpredictable nature. - Precautionary

Since you asked about my answer I guess I can help clarify.

1

For the specific case of multiple parallel client requests that trigger multiple parallel MongoDB queries, you asked:

Are those queries handled by different threads?

On the Node.js side: since the MongoDB client connects over the network stack (TCP/IP), all parallel requests are handled in a single thread. The magic is a system API that allows your program to wait in parallel. Node.js uses libuv, which selects the API to use for the target OS at compile time (for example epoll on Linux, kqueue on macOS and the BSDs, and I/O completion ports on Windows). But which API it is does not matter. It is enough to know that all modern OSes have APIs that allow you to wait on multiple sockets in parallel (instead of the usual approach of waiting on a single socket in each of multiple threads/processes). These APIs are collectively called asynchronous I/O APIs.
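To make "waiting in parallel" concrete, here is a minimal sketch, not anything taken from MongoDB or monk. It assumes a hypothetical stand-in "database" (`startSlowServer`) that answers every connection after 200 ms, and a hypothetical `query` helper that opens one socket per request. Everything, including the fake server, runs on the one JavaScript thread of a single Node.js process.

```javascript
const net = require('node:net');

// Hypothetical stand-in for a database: replies on each connection after 200 ms.
function startSlowServer() {
  return new Promise((resolve) => {
    const server = net.createServer((socket) => {
      setTimeout(() => socket.end('done'), 200);
    });
    // Port 0 asks the OS for any free port.
    server.listen(0, '127.0.0.1', () => resolve(server));
  });
}

// Open one socket and resolve with the full reply once the server closes it.
function query(port) {
  return new Promise((resolve, reject) => {
    let reply = '';
    const socket = net.connect(port, '127.0.0.1');
    socket.setEncoding('utf8');
    socket.on('data', (chunk) => { reply += chunk; });
    socket.on('end', () => resolve(reply));
    socket.on('error', reject);
  });
}

// Fire `count` queries at once and measure how long the whole batch takes.
async function runDemo(count = 5) {
  const server = await startSlowServer();
  const port = server.address().port;
  const started = Date.now();
  const replies = await Promise.all(
    Array.from({ length: count }, () => query(port))
  );
  const elapsedMs = Date.now() - started;
  await new Promise((resolve) => server.close(resolve));
  return { replies, elapsedMs };
}
```

Calling `runDemo(5).then(console.log)` should report an elapsed time close to one 200 ms wait rather than five of them in a row, because the single thread is only notified when a socket becomes ready and does no work in between.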

On the MongoDB side: I don't know much about MongoDB. Mongo may be implemented with multiple threads or it may be single-threaded like Node.js. Disk I/O is itself handled in parallel by the OS, not with threads but with I/O channels (e.g. PCI lanes) and DMA channels. Basically, both threads/processes and asynchronous I/O are generally implemented by the OS (at least on Linux and Mac) using the same underlying system: OS events. And OS events are just functions that handle interrupts. Anyway, this is straying quite far from the discussion about databases.

I know that MySQL and PostgreSQL both handle queries in parallel: MySQL uses a thread per connection and PostgreSQL a process per connection. Query processing in SQL is basically a loop over rows that filters the result, which requires both I/O and CPU; that is why they spread the work across threads or processes.

If you are still curious how computers can do things (like wait for I/O) without the CPU executing a single instruction you can check out my answers to the following related questions:

Is there any other way to implement a "listening" function without an infinite while loop?

What is the mechanism that allows the scheduler to switch which threads are executing?

2

Security comes from the language being run by an engine that controls every memory access, and from making sure the engine itself does not have any buffer overflow or underflow bugs. For the most part this is true of all modern JavaScript engines. The main mechanism for injecting and executing foreign code via program input is a buffer overflow or underflow. Being able to execute foreign code then allows you to access memory. If you cannot execute foreign code, the question of accessing another request's memory is kind of moot.
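As a small illustration of the overflow point, the sketch below shows what happens when JavaScript code tries to read or write past the end of a buffer. The names `buf`, `bytes`, and `readPastEnd` are made up for this example; `Buffer` and `Uint8Array` are standard.

```javascript
// A 4-byte Node.js Buffer and a 4-byte typed array.
const buf = Buffer.alloc(4);
const bytes = new Uint8Array(4);

// Reading past the end of a Buffer with a read method throws instead of
// returning whatever happens to sit next to it in memory.
function readPastEnd() {
  try {
    return buf.readUInt8(10);
  } catch (err) {
    return err.name;
  }
}

// Writing past the end of a typed array is silently ignored; the array
// does not grow and neighbouring memory is not touched.
bytes[10] = 255;

console.log(readPastEnd());  // RangeError
console.log(bytes[10]);      // undefined
console.log(bytes.length);   // 4
```

In C the equivalent out-of-range write could overwrite adjacent memory; here the engine checks the index on every access, which is the property the paragraph above relies on.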

There is a second mechanism to inject foreign code which is prevalent in some programming language cultures: code eval (I'm looking at you, PHP!). Using string substitution to construct database queries in any language opens you up to an SQL code eval attack (more commonly called SQL injection) regardless of your program's memory model. JavaScript itself has an eval() function. To protect against this, JavaScript programmers simply consider eval evil. Basically, protection against eval comes down to good programming practices, and Node.js being open source allows anyone to look at the code and report any cases where a code evaluation attack is possible. Historically Node.js has been quite good in this regard, so your main guarantee about security from code eval is Node's reputation.
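The string-substitution problem can be shown without any database at all. In this sketch `unsafeQuery` and `parameterizedQuery` are hypothetical helpers, the `videos` table is borrowed from the question, and the `$1` placeholder with a `{ text, values }` pair is only one common convention; the exact placeholder syntax depends on the driver you use.

```javascript
// UNSAFE: user input becomes part of the SQL text itself.
function unsafeQuery(title) {
  return "SELECT * FROM videos WHERE title = '" + title + "'";
}

// Safer: the SQL text is fixed and the input travels separately as data,
// so the database never parses it as code.
function parameterizedQuery(title) {
  return { text: 'SELECT * FROM videos WHERE title = $1', values: [title] };
}

// Input crafted to change the meaning of the concatenated query.
const malicious = "x' OR '1'='1";

console.log(unsafeQuery(malicious));
// SELECT * FROM videos WHERE title = 'x' OR '1'='1'
console.log(parameterizedQuery(malicious).text);
// SELECT * FROM videos WHERE title = $1
```

The first query now matches every row because the attacker's quote characters ended the string literal early. The second keeps the input out of the query text entirely, which is why it works regardless of the program's memory model.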

Weems answered 12/1, 2021 at 5:15 Comment(9)
Thank you! My understanding is now: (1) On the Node side, queries are sent over multiple sockets but within the same Node main thread; how a database server handles the queries depends on the database, and it could use multiple threads. (2) Is the security of code checked at the interpreting (JIT) stage? I understand the eval attack, which is a string attack from the client request, but is it possible that the Node server code itself is not safe? I mean the code that handles the users' requests. Anyway, the server code could be something other than a db query. - Gallfly
I saw a paper that implemented a WebAssembly-based function-as-a-service in a Node environment. We know Node can execute wasm functions. It looks like wasm is memory safe, but they still use an npm library called vm2 (link) to achieve context-based isolation. Is a context the same as a thread? If yes, they implemented a multi-threaded Node app, and I think this may bring timing benefits besides security, because a non-blocking single-threaded application is not suitable for CPU-intensive tasks. (paper link) - Gallfly
Wasm is not a subset of JavaScript (that describes its predecessor, asm.js); it is a separate binary instruction format. In Node.js, however, it is compiled and run by the same engine (V8) and is loaded and called from JavaScript, so there is nothing about wasm that takes it outside the engine's memory-safety rules. The vm2 module is different. It allows you to instantiate another JavaScript execution environment from within JavaScript. From the docs it does not appear that vm2 runs in a separate thread. Instead, the C++ object that holds the vm2 environment is a different object from the C++ object that holds the main Node.js environment. This is what they mean by "context". - Weems
Since the two environments are basically different objects, JavaScript in one cannot reach data in the other unless the host passes it in (kind of like Firefox cannot access data in Google Chrome because they are different engines, except that in the case of vm2 both Node.js and vm2 run in the same process/app/program). - Weems
Thank you! I still don't understand what a context is; it says "Only JavaScript built-in objects + Buffer are available". I guess it means code in different contexts still runs in the same thread, but to ensure safety only isolated built-in objects and Buffer are available? I agree that wasm is nothing different, but I think that, compared with using dynamic C libraries in Node, the only benefit of using wasm is its security. Otherwise, why not choose DLL + context isolation? Anyway, they are trying to push wasm to the server side. - Gallfly
The JavaScript interpreter is implemented in C++, so the interpreter is generally an object with methods to load and execute JavaScript functions. Node.js creates one JavaScript engine instance per thread from a V8 class (V8 is the JavaScript engine that Node.js and Google Chrome use to execute JavaScript code). The vm2 library also creates an object from a JavaScript engine class, though I'm not sure whether it reuses the V8 library in Node.js with a different configuration or uses a different JavaScript engine. - Weems
Unlike Node.js, which creates only one JavaScript interpreter per thread (you can create your own thread, which automatically gets its own interpreter, using the worker_threads module), vm2 allows you to create additional JavaScript interpreters on the same thread. - Weems
Thanks again! (1) My previous question could be misleading, because I am now not sure whether, with vm2's help, different interpreters run on the same thread or the same process. The vm2 GitHub page mentions processes but gives no information on threads. (2) I wonder if this understanding is correct: with vm2 we can create multiple interpreter objects (more than 2?) within the same thread (or process), and each interpreter object is called a context. Each tenant's code runs in its own context, so we are actually doing context switches to execute multiple tenants' code concurrently and safely? - Gallfly
I found something that might be related (link). That person says: "I can only new NodeVM in a worker thread per request, inside which I cannot reuse a vm instance and the cost hurts." So it looks like vm2 is run in a thread again (ummm, confusing...). I am not sure if a worker thread is an OS thread or a process. - Gallfly
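The difference between a context and a worker thread, which the comments above circle around, can be sketched with Node's built-in modules alone. This is not vm2 itself: it uses `node:vm` for the context and `node:worker_threads` for the thread, and the helper names (`runInFreshContext`, `runInWorker`, `workerSource`, `mainOnly`) are made up for the example. Note that the Node.js documentation states that `node:vm` on its own is not a security mechanism, so treat this as a demonstration of the concepts only.

```javascript
const vm = require('node:vm');
const { Worker, threadId } = require('node:worker_threads');

// A global that exists only in the main thread's default context.
globalThis.mainOnly = 'visible only in the main context';

// A context is a separate global object on the SAME thread.
// The code inside sees only JS built-ins plus what the sandbox object holds.
function runInFreshContext(source) {
  const sandbox = { counter: 0 };
  vm.createContext(sandbox);
  const result = vm.runInContext(source, sandbox);
  return { result, sandbox };
}

// A worker is a separate thread inside the same process, with its own
// engine instance, its own globals, and its own event loop.
function runInWorker(source) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Code for the worker: report its thread id and whether it can see mainOnly.
const workerSource = `
  const { parentPort, threadId } = require('node:worker_threads');
  parentPort.postMessage({
    threadId,
    sawMainGlobal: typeof mainOnly !== 'undefined',
  });
`;
```

For example, `runInFreshContext('counter += 1; typeof require + "/" + typeof mainOnly')` runs synchronously on the calling thread, returns the string `undefined/undefined`, and leaves `sandbox.counter` at 1. `runInWorker(workerSource)` resolves with a `threadId` different from the caller's and `sawMainGlobal: false`. So several contexts can share one thread and take turns on it, while a worker is a real additional thread, not a separate process.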

(1) The big picture goes like this: Node.js has two kinds of threads, the event-loop thread (a single one) and worker threads (a pool). As long as you don't block the event loop, Node.js hands a blocking I/O call (such as file system access or a DNS lookup) to a worker thread and goes on to service the next request; network sockets are instead watched by the OS and need no worker. The worker places the completed I/O back on the event loop for the next course of action.

In short, the main thread follows this rule: "Do something else when you need to wait, come back and continue when the wait is over, and handle these one at a time."

This reactive mechanism has nothing to do with threads running in another process (i.e. the database). The database may use some other thread management scheme.
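The hand-off to the worker pool described in (1) can be observed with a built-in call that is known to run on that pool. This is a minimal sketch, assuming `crypto.pbkdf2` as the pooled job; the helper names `hashOnThreadPool` and `runPoolDemo`, the password, the salt, and the iteration count are arbitrary choices for the demonstration.

```javascript
const crypto = require('node:crypto');

// Start one key-derivation job. The hashing runs on a pool thread; the
// callback runs later, back on the event-loop thread.
function hashOnThreadPool(label, order) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2('password', 'salt', 50000, 32, 'sha256', (err, key) => {
      if (err) return reject(err);
      order.push(label + ' finished');
      resolve(key.toString('hex'));
    });
  });
}

// Queue two jobs, then record that the main thread carried on immediately.
async function runPoolDemo() {
  const order = [];
  const jobs = [
    hashOnThreadPool('job A', order),
    hashOnThreadPool('job B', order),
  ];
  order.push('event loop is free');
  const keys = await Promise.all(jobs);
  return { order, keys };
}
```

Running `runPoolDemo().then(console.log)` should always list `event loop is free` first: the main thread queued both jobs and moved on, and the two completions were delivered to it afterwards, one at a time.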

(2) The "memory space" in your question is the process's address space. A thread belongs to one process (e.g. Express app A) and never runs in the space of another process (e.g. Fastify app B).

Tetracycline answered 12/1, 2021 at 4:31 Comment(3)
Thank you! In this answer (link) there is a hybrid approach: "Each thread runs an event loop simultaneously processing requests single-threaded but requests are load-balanced among multiple threads." Is this what you mean? Load-balancing those blocked I/O calls to a different thread? - Gallfly
That's a way to extend the reactive design: basically it tries to move more calls from the main thread to another layer of (event) threads. Attractive as it is, you need to check whether such a design is supported by Node.js directly. Nevertheless, it is fairly straightforward to run your Node.js app in cluster mode; here's one common, but certainly not the only, way: pm2.io/blog/2018/04/20/Node-js-clustering-made-easy-with-PM2 - Guadalcanal
Hi! I found a JS library called TAGG2 which does a job like what you described, but when we write a Node.js application we do not explicitly include and use this module. Will Node automatically do this for us? - Gallfly
