Highest-performance way to fork/spawn many Node.js processes from a parent process

I am using Node.js to spawn upwards of 100 child processes, maybe even 1000. What concerns me is that the parent process could become some sort of bottleneck if all the stdout/stderr of the child processes has to go through the parent process in order to get logged somewhere.

So my assumption is that in order to achieve highest performance/throughput, we should ignore stdout/stderr in the parent process, like so:

const cp = require('child_process');

items.forEach(function (exec) {
  const n = cp.spawn('node', [exec], {
    stdio: ['ignore', 'ignore', 'ignore', 'ipc']
  });
});

My question is, how much of a performance penalty is it to use pipe in this manner:

// (100+ items to iterate over)

items.forEach(function (exec) {
  const n = cp.spawn('node', [exec], {
    stdio: ['ignore', 'pipe', 'pipe', 'ipc']
  });
});

such that stdout and stderr are piped to the parent process? I assume the performance penalty could be drastic, especially if we handle stdout/stderr in the parent process like so:

// (100+ items to iterate over)

items.forEach(function (exec) {
  const n = cp.spawn('node', [exec], {
    stdio: ['ignore', 'pipe', 'pipe', 'ipc']
  });

  n.stdout.setEncoding('utf8');
  n.stderr.setEncoding('utf8');

  n.stdout.on('data', function (d) {
    // do something with the data
  });

  n.stderr.on('data', function (d) {
    // do something with the data
  });
});

My assumptions:

  1. Using 'ignore' for stdout and stderr in the parent process is more performant than piping stdout/stderr to the parent process.
  2. Streaming stdout/stderr to a file, like so:

    stdio: ['ignore', fs.openSync('/some/file.log', 'a'), fs.openSync('/some/file.log', 'a'), 'ipc']

is almost as performant as using 'ignore' for stdout/stderr (which should send stdout/stderr to /dev/null).
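
For reference, a minimal sketch of assumption 2 (the path is a placeholder, and the file is opened once in append mode rather than twice):

    const cp = require('child_process');
    const fs = require('fs');

    // one descriptor shared by stdout and stderr; the children write to it
    // directly via the kernel, so nothing flows through the parent process
    const fd = fs.openSync('/some/file.log', 'a');

    items.forEach(function (exec) {
      const n = cp.spawn('node', [exec], {
        stdio: ['ignore', fd, fd, 'ipc']
      });
    });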

Are these assumptions correct or not? With regard to stdout/stderr, how can I achieve the highest performance if I want to log stdout/stderr somewhere (not to /dev/null)?

Note: this is for a library, so the amount of stdout/stderr could vary quite a bit. Also, it will most likely rarely fork more processes than there are cores, running at most about 15 processes simultaneously.

Insipience answered 26/11, 2016 at 9:16 Comment(13)

If the source comes into question, part of the answers are here: github.com/nodejs/node/blob/master/lib/child_process.js – Insipience
and here: github.com/nodejs/node/blob/master/lib/internal/… – Insipience
What does your library do that requires forking that many child processes? – Complicate
Testing library, similar to Node.js' AVA. All the child processes have to run on one machine, for the moment. I don't expect to utilize multiple machines any time soon. – Insipience
Each test runs in a separate process for speed/isolation. – Insipience
Have you considered limiting the number of child processes to (at most) the number of cores? Starting hundreds of processes running tests will most likely pose a CPU bottleneck, not an I/O one. – Complicate
Yep, that's what I do: I give the user the choice via --concurrency, but by default it's basically the number of cores. So maybe the parent won't be such a bottleneck in this case. – Insipience
I wonder if you're not focusing on an issue (optimizing I/O in case the user opts for high concurrency) that becomes moot when the number of cores is too low to keep up with all the started processes to begin with. I assume that each test process is pretty active, requiring a fair amount of CPU time. – Complicate
Well, how would you suggest I optimize I/O in the case where the user chooses to run a lot of processes in parallel (>> the number of cores)? Are you talking about stdout/stderr? That's ultimately what this question is about. – Insipience
What I'm trying to say is that optimizing I/O in the case of hundreds of processes probably won't do any good, as the bottleneck in that case is that there just aren't enough CPU cores to run all those processes concurrently. Considering that Node itself can handle hundreds or thousands of network connections just fine, I doubt that handling that many file descriptors will be an actual issue. It's the sheer number of processes that will be. – Complicate
I see what you're saying. Btw, I personally have been wondering for a long time how CPU-intensive piping stdout/stderr is. There must be some basic benchmark: one "unit" of stdout or stderr being piped from one process to another, or to a file, costs how many CPU cycles? – Insipience
Without having checked it, my guess would be that they won't be very CPU-intensive at all. Passing data from file descriptor to file descriptor in modern OSes is heavily optimized, and stdout/stderr aren't very special in that regard. – Complicate
Ok, so basically my takeaway from this thread is that since I most likely won't be running much more than 15 processes at a time, piping stdout/stderr to the parent process will really not be that big of a deal, and I definitely don't expect there to be that much stdout/stderr, ever. – Insipience

You have the following options:

  • you can have the child process completely ignore stdout/stderr, and do logging on its own by any other means (log to a file, syslog...)

  • if you're fine with the child's output going wherever your parent process's output goes, you can set stdout/stderr to process.stdout and process.stderr respectively. The child's output then goes to the same destination as the main process's, and nothing flows through the main process

  • you can set file descriptors directly. This means the output of the child process will go to the given files, without going through the parent process

  • however, if you don't have any control over the child processes AND you need to somehow do something to the logs (filter them, prefix them with the associated child process, etc.), then you probably need to go through the parent process.

As we have no idea of the volume of logs you're talking about, we have no idea whether this is critical or just premature optimisation. Node.js being asynchronous, I don't expect your parent process to become a bottleneck unless it's really busy and you have lots of logs.
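
A minimal sketch of the second and third options ('child.js' and the log path are placeholders; note that passing process.stdout/process.stderr behaves like 'inherit' for those slots):

    const cp = require('child_process');
    const fs = require('fs');

    // option 2: the child writes straight to the parent's own stdout/stderr
    const a = cp.spawn('node', ['child.js'], {
      stdio: ['ignore', process.stdout, process.stderr, 'ipc']
    });

    // option 3: the child writes straight to a descriptor you opened;
    // the kernel handles the writes, so the parent never sees the data
    const fd = fs.openSync('/some/other.log', 'a');
    const b = cp.spawn('node', ['child.js'], {
      stdio: ['ignore', fd, fd, 'ipc']
    });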

Smock answered 26/11, 2016 at 9:41 Comment(8)

I do have control over the child processes to a certain extent; however, the volume of logging could vary tremendously depending on the user, and in some cases I assume a lot of stdout/stderr could be logged. – Insipience
How much is "a lot"? – Smock
I don't know, but if you had 100 processes all sending mega amounts of stdout/stderr to a single process, and that process had to handle it, I assume it would be slower than if those 100 processes each independently sent their stdout/stderr to separate files. – Insipience
I am looking to gain the 10%-30% performance that might come with optimizing this somehow. – Insipience
Regarding "if you're logging the output of your parent process, you can set stdout/stderr to process.stdout and process.stderr respectively. This means the output of the child will be the same as the main process. Nothing will flow through the main process": I feel like in this case it would slow down the parent process... right? Unless Node.js intelligently does some lower-level connecting of pipes behind the scenes. – Insipience
No, if you set stdout to process.stdout (and the same for stderr), nothing flows through the parent. The parent takes the file descriptor it's using (file, pipe, terminal...) and gives it to the child, which uses it directly. If the main process's stdout points to a file, this is exactly the same as opening the file and passing the file descriptor: everything is handled directly by the kernel. – Smock
You mean like this, right: stdio: ['ignore', process.stdout, process.stderr, 'ipc'] ? – Insipience
If so, that should be the same as stdio: ['ignore', 'inherit', 'inherit', 'ipc'], TMK. – Insipience

Are these assumptions correct or not?

how can I achieve highest performance?

Test it. That's how you can achieve the highest performance. Test on the same type of system you will use in production, with the same number of CPUs and similar disks (SSD or HDD).

I assume your concern is that the children might become blocked if the parent does not read quickly enough. That is a potential problem, depending on the buffer size of the pipe and how much data flows through it. However, if the alternative is to have each child process write to disk independently, this could be better, the same, or worse. We don't know for a whole bunch of reasons, starting with the fact that we have no idea how many cores you have, how quickly your processes produce data, and what I/O subsystem you're writing to.

If you have a single SSD you might be able to write 500 MB per second. That's great, but if that SSD is 512 GB in size, you'll last only about 17 minutes before it is full! You'll need to narrow down the problem space a lot more before anyone can know what the most efficient approach is.

If your goal is simply to get logged data off the machine with as little system utilization as possible, your best bet is to directly write your log messages to the network.
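
A minimal sketch of that idea, run inside each child; the host and port of the log collector are placeholders:

    const net = require('net');

    // open a TCP connection to a remote log collector and write log lines
    // to it directly, bypassing the parent process and the local disk
    const logSocket = net.connect({ host: 'logs.example.com', port: 5140 });

    function log(line) {
      logSocket.write(line + '\n');
    }

    log('child ' + process.pid + ' started');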

Grandfatherly answered 26/11, 2016 at 9:27 Comment(8)

Well, this will run on all sorts of systems because this is for a library. So perhaps we can assume "all things being equal" except the variable in question, which is really whether it is a big penalty to pipe the stdout/stderr of the children to the parent versus just piping it to /dev/null or a file. I just don't know enough about computers to be sure one way or the other. – Insipience
I could test it on my system, but it won't be the same as for the next person who runs it on theirs. I just want to know the average case / all-other-things-being-equal case. – Insipience
@AlexanderMills You should have said you're writing a library! That's critically important and you didn't mention it at all! Tell us more about it... is it for internal use on one project only, internal use on many projects in a company where you work, or for use by people you may never interact with? If the latter, it might be reasonable to make this configurable if you can't test which way is better. – Grandfatherly
Thanks, yes, I was definitely planning on making it configurable. Basically, the user would have a few options, including (1) the parent process just inherits the stdio from the children so that it's all logged in the original terminal, (2) send the stdio to a file, or (3) send all stdio to /dev/null. – Insipience
The OP really concerns whether a 4th option, (4) send stdio to a file and pipe it to the parent, is performant or not; I am afraid that it won't be. – Insipience
But actually, there is a hangup with that 4th option: I actually don't know how to send the stdio to a log file and pipe it to the parent. I have been researching how to do that. – Insipience
What does it mean, "send stdio to file and pipe it to the parent"? – Grandfatherly
Tee stdio: send the stdout/stderr of the child process to both the parent process and to a file... right? Sorry, I think it would have been clearer to say "both send stdio to a file and pipe it to the parent". – Insipience
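
For what it's worth, a minimal sketch of that "tee" idea ('child.js' and the file path are placeholders): pipe the child's stdout/stderr to a file stream while the parent also listens in.

    const cp = require('child_process');
    const fs = require('fs');

    const out = fs.createWriteStream('/some/file.log', { flags: 'a' });

    const n = cp.spawn('node', ['child.js'], {
      stdio: ['ignore', 'pipe', 'pipe', 'ipc']
    });

    // tee branch 1: stream the output straight to the file
    n.stdout.pipe(out);
    n.stderr.pipe(out);

    // tee branch 2: also handle the same data in the parent
    n.stdout.on('data', function (d) {
      // do something with the data
    });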
