When is it better to use clustering or worker_threads?

I have been reading about multi-processing in NodeJS to understand it as well as I can and to get good performance from my code in heavy-load environments.

Although I understand the basic purpose and concept behind the different ways to take advantage of the available resources to handle the load, some questions arise as I go deeper, and I can't find specific answers in the documentation.

NodeJS in a single thread:

NodeJS runs a single thread that we call the event loop, although in the background the OS and libuv manage the default worker pool for asynchronous I/O tasks.

We are supposed to use a single core for the event loop, although the workers might be using different cores. I guess they are ultimately scheduled by the OS scheduler.

NodeJS as multi-threaded:

When using the "worker_threads" library, a different instance of V8/libuv runs for each thread within the same single process. Thus, they share the same context and communicate among threads with "message port" and the rest of the API.

Each worker thread runs its own event loop. Threads are supposed to be wisely balanced among CPU cores, improving performance. I guess they are ultimately scheduled by the OS scheduler.

Question 1: When a worker uses the default I/O worker pool, are those the very same threads that the other workers' pools use, shared somehow? Or does each worker have its own default worker pool?

NodeJS in multi-processing:

When using the "cluster" library, we split the work among different processes. Each process is placed on a different core to balance the load... well, it is really the main event loop that ends up on a different core, so it doesn't share a core with another heavy event loop. It sounds smart to do it that way.

Here I would communicate using some IPC mechanism.

Question 2: What about the default worker pool of each NodeJS process? Where do its threads run? Are they balanced among the remaining cores, as expected in the first case? Then I guess they might be on the same cores as the worker pools of the other cluster processes. Wouldn't it be more accurate to say that we are balancing main threads (event loops) rather than "the process"?

All that being said, the main question:

Question 3: Is it better to use clustering or worker_threads? If both are used in the same code, how can the two libraries cooperate to get the best performance? Or can they simply conflict? Or is it ultimately the OS that takes control?

Cloris answered 20/4, 2020 at 17:58 Comment(0)

Each worker thread has its own main loop (libuv etc). So does each cloned Node.js process when you use clustering.

Clustering is a way to load-balance incoming requests to your Node.js server over several copies of that server.
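As a rough illustration of that load-balancing idea, here is a minimal sketch using the built-in cluster and http modules. The function name startClusteredServer, the port, and the choice of one process per reported CPU are assumptions made for this example, not recommendations from the answer.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// cluster.isPrimary exists on newer Node.js releases; isMaster is the older name.
const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// Hypothetical helper: the primary forks copies of this script, and each copy
// runs its own HTTP server on the same port.
function startClusteredServer(port, workerCount = os.cpus().length) {
  if (isPrimary) {
    for (let i = 0; i < workerCount; i += 1) {
      cluster.fork();
    }
    cluster.on('exit', (worker, code) => {
      console.log(`cluster worker ${worker.process.pid} exited with code ${code}`);
    });
  } else {
    http
      .createServer((req, res) => {
        res.end(`handled by pid ${process.pid}\n`);
      })
      .listen(port);
  }
}

// Uncomment to run. Every forked copy re-executes this file and takes the else branch.
// startClusteredServer(3000);
```

Each forked copy is a full Node.js process with its own event loop, so incoming connections are spread over several independent servers.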

Worker threads are a way for a single Node.js process to offload long-running functions to a separate thread, to avoid blocking its own main loop.
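A minimal sketch of that offloading pattern, assuming a deliberately slow recursive Fibonacci as the stand-in for a long-running function. The names fibInWorker and workerSource are made up for this example, and the worker code is passed as a string with eval: true only to keep everything in one file.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical CPU-bound task, kept as source text so the example fits in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  parentPort.postMessage(fib(workerData));
`;

// Runs fib(n) on a separate thread and resolves with the result.
function fibInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// The main event loop stays free for other work while the worker computes.
fibInWorker(30).then((result) => console.log('fib(30) =', result));
```

In a real server you would more likely keep the worker code in its own file and reuse a small pool of workers instead of creating one per call.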

Which is better? It depends on the problem you're solving. Worker threads are for long-running functions. Clustering makes a server able to handle more requests, by handling them in parallel. You can use both if you need to: have each Node.js cluster process use a worker thread for long-running functions.

As a first approximation for your decision-making: only use worker threads when you know you have long-running functions.

The Node.js instances (whether cluster processes or worker threads) don't get tied to specific cores (or Intel processor threads) on the host machine; the host's OS scheduler assigns cores as needed. The scheduler tries to minimize context-switch overhead when assigning cores to runnable threads. If you have too many active JavaScript instances (cluster instances + worker threads), the host OS will give them timeslices according to its scheduling algorithms. Other than avoiding too many JavaScript instances, there's very little point in trying to second-guess the OS scheduler.

Edit: Each worker thread has its own event loop, but libuv's thread pool is global to the process. So a main Node.js thread shares a single libuv thread pool with all of its worker threads. If your Node.js program uses many worker threads, you may or may not need to set the UV_THREADPOOL_SIZE environment variable to a value greater than the default of 4.

Node.js's cluster functionality uses the underlying OS's fork/exec scheme to create a new OS process for each cluster instance. So, each cluster instance has its own libuv pool.
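One way to see the process/thread distinction for yourself is the small probe below. It is only a sketch: describeThreads and probeSource are names invented for this example, and it shows only that a worker thread lives in the same OS process as the main thread (same pid, different threadId). It does not inspect the libuv pool itself.

```javascript
const { Worker, threadId } = require('worker_threads');

// Hypothetical probe run inside the worker: report which process and thread it is in.
const probeSource = `
  const { parentPort, threadId } = require('worker_threads');
  parentPort.postMessage({ pid: process.pid, threadId });
`;

// Resolves with the pid/threadId seen by the main thread and by one worker thread.
function describeThreads() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(probeSource, { eval: true });
    worker.once('message', (fromWorker) => {
      resolve({ main: { pid: process.pid, threadId }, worker: fromWorker });
    });
    worker.once('error', reject);
  });
}

describeThreads().then((info) => console.log(info));
```

A cluster worker running the same check would report a different pid, because it is a separate OS process.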

If you're running stuff at scale, let's say with more than ten host machines running your Node.js server, then you can spend time optimizing the number of JavaScript instances.

Don't forget nginx if you use it as a reverse proxy to handle your https work. It needs some processor time too, but its event-driven worker processes are lightweight, so you won't have to worry about it unless you have huge traffic.

Justen answered 20/4, 2020 at 20:11 Comment(6)
Thanks, @O. Jones. I still wonder how the default worker pool is assigned: does each main thread get its own pool, or is it somehow shared? - Cloris
So, please tell me if I understood your edit correctly: a user-created worker thread has its own pool, and the main thread of a new process in the cluster has its own pool as well. So we can state that any main thread (event loop), whatever its origin, has its own pool of background threads handled by its own libuv instance. Am I right? - Cloris
On the worker threads, they share a pool with the main thread that created them. - Justen
I don't understand, then. With worker threads, each thread has its own instance of V8 and libuv. Libuv is what handles the worker pool. How can this pool be shared among different libuv instances? Or where am I wrong? - Cloris
"A main nodejs process shares a single libuv thread pool with all its worker threads." is NOT true. It also contradicts what you said earlier, i.e. "Each worker thread has its own main loop (libuv etc)." - Haerr
Reading your answer confused me. At the beginning you say worker threads have their own thread pool, and at the end you say they share a single thread pool. - Despain
