NodeJS batch multi processing - child processes in a pool (or multithreading)
NodeJS batch multi threading processing - child processes in a pool.

I know a child process is a process, not a thread. I used the wrong term deliberately, because most people understand the intent when you speak of "multithreading". So I'll keep it in the title.


Imagine a scenario where you continuously have multiple similar and complex things to do using a single custom function or module. It makes a lot of sense to use all your available cores/threads (e.g. 8/16), which is what child_process.fork() is for.

Ideally, you want a fixed number of simultaneous workers that exchange messages with a single controller.

node-cpool, fork-pool, child-pool are some modules that do exactly this, but they seem old/unmaintained/unpopular.

There are a ton of similar-ish modules, but these seem the most relevant. What they all have in common is that they have only a couple of commits, few stars, few forks, and appear abandoned.

When I can't find something for a task that makes sense in every way, it usually means there is an even better way that I am missing. Hence my question.

How do I have a managed, queued, multithreaded pool of parallel fork()s for my custom module that does some CPU intensive work?

Multithreaded modules like TAGG and webworker-threads are not the same because they don't support full modules (with binary compiled components).


PS

I am now using fork-pool which seems to do exactly what I want, with some quirks, but I can't believe that such an unknown and unpopular module would be the only viable option here.

Journalistic answered 18/9, 2014 at 11:43 Comment(2)
child_process.fork() creates a process not a thread. Forking a process for each CPU intensive task just doesn't seem to be right. node.js (without the help of third-party native modules) provides a single threaded environment for IO intensive applications. You could probably write a polyglot application and use a message-queue in order to offload CPU intensive tasks to a multi-threaded environment.Emu
Sorry, I didn't take care of the semantics in my question properly. I know fork() creates a process. The fact is, processes and threads both execute code independently in one of those "threads per core" hardware channels. Just assume for this question that I know what I am doing and running 8 processes is fine. I never heard anyone complain about a piece of software running 2 times as fast. Node provides a multi threaded toolset in the form of child_process precisely for people who need it.Journalistic

I'd like to provide an option that doesn't exactly answer your question but might be useful in a situation similar to yours when there is flexibility in choosing technologies.

If offloading work to the .NET environment (C#, F#, IronPython, PowerShell, etc.) is acceptable, you might be interested in the Edge.js project.

This way you can use Node for IO-intensive work and delegate the computationally intensive work to the .NET runtime hosted inside the same process. Edge.js provides efficient interoperability with .NET code, letting you take advantage of the .NET Task Parallel Library and other capabilities without the overhead of spawning additional processes.


There are maintenance and technical costs involved whenever you make hybrid apps. Carefully evaluate what you gain and make sure it is worth the costs according to your project's priorities.

Node.js is not well suited to executing blocking, CPU bound workloads. The distinguishing design trait of node.js is its single-threaded, event-loop based architecture.

Node.js applications typically process CPU-bound workloads by delegating the processing to an external process or service. This involves crossing the process boundary and incurs additional latency. (more)

It is silly to introduce complexity without a good reason. If Node.js can handle the task itself, adding a dependency on .NET is probably overkill. However, there are many tasks for which it could be valuable. Being a good engineer requires some thinking.

Nitro answered 18/9, 2014 at 14:42 Comment(6)
That's seriously overkill. Suggesting .NET as an alternative to using some sort of queue is just silly, especially since you'd still need a queue to pass messages to/from .NET. Additionally, they may be running on OSX or Linux, making .NET an even worse suggestion.Fougere
@Jessie: I strongly disagree in general, while I agree that it might be overkill for a particular situation. It depends on many factors, including how computationally intensive the task is. There is a large class of problems that would benefit from this approach. Also, why would Linux/OSX be a problem?Nitro
I do agree that it is dependent on other factors. As for Linux/OSX support, C#, F#, PowerShell, et al. are not very well supported on platforms other than Windows. Java, C, C++, etc. would likely be better options for cross-platform development.Fougere
I'm with @JessieA.Morris here. If you're a .Net shop and want to dabble with NodeJS, that's not a bad idea, but if you're trying to do NodeJS you do not want to introduce an enormous dependency like this. It adds way too much complexity.Gae
"Silly" is putting it mildly. Just as introducing Java (JVM) or Haskell would be ridiculous unless they offered something NodeJS itself couldn't do. Even then I'd advocate writing the solution in Java or Haskell if they offered better tooling, not fusing together two platforms.Gae
This is a valuable option, so I've upvoted the answer. However, in my particular case, I'm interested in pooling processes because they can be anything. Any binary component (C++ or even assembly) can be wrapped into a module. If they are stand-alone binaries they can be wrapped in spawn() instead of fork(), although in my specific case I am looking to multi-process some node modules with a compiled binary component. (Also, I'm on Linux)Journalistic
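To illustrate the spawn() case from the comment above, here is a minimal sketch, assuming a stand-alone binary that reads its input on stdin and writes its result to stdout. The helper name runBinary is hypothetical and not part of any module mentioned in this thread.

```javascript
'use strict';
const { spawn } = require('child_process');

// Hypothetical helper: run a stand-alone executable, feed it `input` on
// stdin, and resolve with everything it wrote to stdout.
function runBinary(command, args, input) {
  return new Promise((resolve, reject) => {
    // stderr is inherited so diagnostics from the binary stay visible.
    const child = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] });
    let output = '';
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { output += chunk; });
    // 'error' fires if the process could not be started at all.
    child.on('error', reject);
    child.stdin.on('error', reject);
    // 'close' fires once the process has exited and its stdio is flushed.
    child.on('close', (code) => {
      if (code === 0) resolve(output);
      else reject(new Error(`process exited with code ${code}`));
    });
    child.stdin.end(input);
  });
}
```

Unlike fork(), spawn() sets up no message channel by default, so the controller has to talk to the binary over stdio (or files, sockets, and so on).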

I would suggest using something like Redis as your queue. Here's a tutorial on creating a message bus in Node with Redis and Kue. This will scale pretty well and allow you to have multiple processes, threads, or even machines producing and consuming items to/from the queue.

Fougere answered 18/9, 2014 at 14:59 Comment(1)
Sounds interesting, upvoted. But this also looks a bit like 'too much work and dependencies'. I just want to pool some processes. That shouldn't need too much code and too many dependencies. I am now using fork-pool which seems to do exactly what I want, but I can't believe that such an unknown and unpopular module would be the only viable option here.Journalistic

The Web Workers standard defines a way for JavaScript to use multiple threads and do a lot more work in parallel than a single thread can manage.

There are several implementations of this for NodeJS including the webworker-threads NPM module.

Using fork is going down a multi-process path that is usually far more difficult to coordinate. The NodeJS Cluster system attempts to alleviate a lot of the friction here but is far from ideal.

Gae answered 18/9, 2014 at 16:21 Comment(2)
Oh, I forgot about Web Workers. That's a great solution.Fougere
Thanks, but I mentioned these in the last part of my question. They cover an implementation that only works with pure JavaScript that can be passed to eval(). A module with a binary component (as most efficient modules are, for example parsers like xml-expat) can't ever be used in this way. JavaScript threading is too limited. fork() is what is needed in certain situations, such as this one. It just adds a 30ms delay in execution time and 10MB of memory use per process, something the developer would probably have a good reason to opt for.Journalistic

I recently had this issue with pooling of forks created off of a single node.js process and came up with my own solution to resolve the issue. I've finally managed to export the solution to its own npm module which you can check out here:

https://www.npmjs.com/package/forkpool

You can create a single pool to manage all your forks or create multiple pools to manage isolated batches of work. For example, one of my applications has two pools: one for managing forks related to image processing and another for video processing. Since video processing is more intensive than image processing, the video processing pool has a size of 2 while the image processing pool has a size of 4 on an 8-core machine.

I'm hoping to continuously improve this module over time so feel free to raise issues or enhancement requests on the GitHub repository:

https://github.com/manthanhd/forkpool
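For readers who want to see the general idea of a queued pool of fork()s without any dependency, here is a minimal sketch. It is not the forkpool API: the ForkPool class, its run/drain/close methods, the message shape ({ id, payload } in, { id, result } out) and the toy worker are all assumptions made for illustration. Error handling and worker restarts are left out.

```javascript
'use strict';
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker. In a real project this would be its own file that
// require()s the CPU-heavy module; it is written to a temp file here only
// to keep the sketch self-contained.
const workerSource = `
process.on('message', ({ id, payload }) => {
  let sum = 0;
  for (let i = 0; i < payload; i++) sum += i; // stand-in for CPU-bound work
  process.send({ id, result: sum });
});
`;
const workerPath = path.join(os.tmpdir(), `pool-worker-${process.pid}.js`);
fs.writeFileSync(workerPath, workerSource);

class ForkPool {
  constructor(modulePath, size) {
    this.queue = [];          // jobs waiting for a free child
    this.pending = new Map(); // job id -> resolve callback
    this.idle = [];           // children with nothing to do
    this.children = [];
    this.nextId = 0;
    for (let i = 0; i < size; i++) {
      const child = fork(modulePath);
      child.on('message', ({ id, result }) => {
        const resolve = this.pending.get(id);
        this.pending.delete(id);
        this.idle.push(child); // the child is free again
        resolve(result);
        this.drain();          // hand it the next queued job, if any
      });
      this.children.push(child);
      this.idle.push(child);
    }
  }

  // Queue a job; the promise resolves with the worker's result.
  run(payload) {
    return new Promise((resolve) => {
      this.queue.push({ id: this.nextId++, payload, resolve });
      this.drain();
    });
  }

  // Pair idle children with queued jobs until one of the two runs out.
  drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const child = this.idle.pop();
      const { id, payload, resolve } = this.queue.shift();
      this.pending.set(id, resolve);
      child.send({ id, payload });
    }
  }

  // Terminate all children.
  close() {
    for (const child of this.children) child.kill();
  }
}
```

Usage would look like `const pool = new ForkPool(workerPath, 4); pool.run(1000).then(console.log);` followed by `pool.close()` once all jobs are done. Two pools with different sizes, as described above for image and video work, are simply two instances.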

Worship answered 30/3, 2016 at 16:56 Comment(0)
