Your doubts make perfect sense. It's been a few years since you asked this question, but I think it's worth adding a few things to the existing answers.
Run an array of functions in parallel, without waiting until the previous function has completed. If any of the functions pass an error to its callback...
This sentence is not entirely correct. In fact it does wait for each function to have completed because it's impossible not to do so in JavaScript. Both function calls and function returns are synchronous and blocking. So when it calls any function it has to wait for it to return. What it doesn't have to wait for is the calling of the callback that was passed to that function.
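To make that distinction concrete, here is a minimal sketch. The function name `task` and the `events` array are made up for illustration; the only real API used is `setTimeout`. It records the order in which things happen: the function returns first, and the callback is called only later.

```javascript
// Records the order of events so it can be inspected afterwards.
const events = [];

// Hypothetical task: it only schedules its work and then returns.
function task(cb) {
  events.push('task called');
  setTimeout(() => {
    // Runs later, long after task() has returned.
    events.push('callback called');
    cb(null, 'done');
  }, 10);
  events.push('task returned');
}

task(() => {});
events.push('caller continues');

// At this point events is:
//   ['task called', 'task returned', 'caller continues']
// 'callback called' is appended only when the timer fires.
```

The caller had to wait for `task` to return, but it did not have to wait for the callback.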
Allegory
Some time ago I wrote a short story to demonstrate that very concept:
To quote a part of it:
“So I said: ‘Wait a minute, you tell me that one cake takes three and a half hours and four cakes take only half an hour more than one? It doesn’t make any sense!’ I thought that she must be kidding so I started laughing.”
“But she wasn’t kidding?”
“No, she looked at me and said: ‘It makes perfect sense. This time is mostly waiting. And I can wait for many things at once just fine.’ I stopped laughing and started thinking. It finally started to get to me. Doing four pillows at the same time didn’t buy you any time, maybe it was arguably easier to organize but then again, maybe not. But this time it was something different. But I didn’t really know how to use that knowledge yet.”
Theory
I think it's important to emphasize that in single-threaded event loops you can never do more than one thing at once. But you can wait for many things at once just fine. And this is what happens here.
The parallel function from the Async module calls the functions one by one, and each function has to return before the next one can be called; there is no way around it. The magic here is that the function doesn't really do its job before it returns. It just schedules some task, registers an event listener, passes some callback somewhere else, adds a resolution handler to some promise, etc.
Then, when the scheduled task finishes, some handler that was previously registered by that function is executed. This in turn executes the callback that was originally passed by the Async module, and the Async module then knows that this one function has finished, this time not only in the sense that it returned but also in the sense that the callback passed to it was finally called.
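The mechanism described above can be sketched in a few lines. This is not the Async module's actual source; `miniParallel` is a hypothetical, simplified stand-in that only assumes the same callback convention (`cb(err, value)`).

```javascript
// Hypothetical, simplified stand-in for async.parallel (not the library's code).
function miniParallel(tasks, done) {
  const results = new Array(tasks.length);
  let pending = tasks.length; // tasks whose callback has not been called yet
  let failed = false;

  if (pending === 0) return done(null, results);

  tasks.forEach((task, index) => {
    // Each task is called and returns before the next one is called.
    task((err, value) => {
      if (failed) return; // ignore callbacks that arrive after an error
      if (err) {
        failed = true;
        return done(err);
      }
      results[index] = value; // stored by position, not by finish order
      pending -= 1;
      if (pending === 0) done(null, results);
    });
  });
}

// Usage with two timer-based tasks; the slower one is listed first.
const slow = (cb) => setTimeout(() => cb(null, 'slow'), 30);
const fast = (cb) => setTimeout(() => cb(null, 'fast'), 10);

miniParallel([slow, fast], (err, results) => {
  // results is ['slow', 'fast'] even though 'fast' finished first.
  console.log(err, results);
});
```

Note that the `forEach` loop finishes almost instantly. All the actual waiting happens after it, while the callbacks trickle in.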
Examples
So, for example, let's say that you have 3 functions that download 3 different URLs: getA(), getB() and getC().
We will write a mock of the Request module to simulate the requests and some delays:
// Mock of the Request module: calls cb(err, res, body) after a per-URL delay.
function mockRequest(url, cb) {
  const delays = { A: 4000, B: 2000, C: 1000 };
  setTimeout(() => {
    cb(null, {}, 'Response ' + url);
  }, delays[url]);
}
Now the 3 functions that are mostly the same, with verbose logging:
function getA(cb) {
  console.log('getA called');
  const url = 'A';
  console.log('getA runs request');
  mockRequest(url, (err, res, body) => {
    // Runs later, after getA has already returned.
    console.log('getA calling callback');
    cb(err, body);
  });
  console.log('getA request returned');
  console.log('getA returns');
}

function getB(cb) {
  console.log('getB called');
  const url = 'B';
  console.log('getB runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getB calling callback');
    cb(err, body);
  });
  console.log('getB request returned');
  console.log('getB returns');
}

function getC(cb) {
  console.log('getC called');
  const url = 'C';
  console.log('getC runs request');
  mockRequest(url, (err, res, body) => {
    console.log('getC calling callback');
    cb(err, body);
  });
  console.log('getC request returned');
  console.log('getC returns');
}
And finally we're calling them all with the async.parallel function:
async.parallel([getA, getB, getC], (err, results) => {
  console.log('async.parallel callback called');
  if (err) {
    console.log('async.parallel error:', err);
  } else {
    console.log('async.parallel results:', JSON.stringify(results));
  }
});
What gets displayed immediately is this:
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getC called
getC runs request
getC request returned
getC returns
As you can see this is all sequential - functions get called one by one and the next one is not called before the previous one returns. Then we see this with some delays:
getC calling callback
getB calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]
So getC finished first, then getB and getA, and as soon as the last one finishes, async.parallel calls our callback with all of the responses combined and in the correct order: the order in which we listed the functions, not the order in which the requests finished.
Also we can see that the program finishes after 4.071 seconds which is roughly the time that the longest request took, so we see that the requests were all in progress at the same time.
Now, let's run it with async.parallelLimit with a limit of at most 2 parallel tasks:
async.parallelLimit([getA, getB, getC], 2, (err, results) => {
  console.log('async.parallel callback called');
  if (err) {
    console.log('async.parallel error:', err);
  } else {
    console.log('async.parallel results:', JSON.stringify(results));
  }
});
Now it's a little bit different. What we see immediately is:
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
So getA and getB were called and returned, but getC was not called at all yet. Then after some delay we see:
getB calling callback
getC called
getC runs request
getC request returned
getC returns
which shows that as soon as getB called the callback, the Async module no longer had 2 tasks in progress but just 1, so it could start another one, which is getC, and it did so immediately.
Then, after further delays, we see:
getC calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]
which finishes the whole process just like in the async.parallel example. This time the whole process also took roughly 4 seconds because the delayed calling of getC didn't make any difference: it still managed to finish before getA, which was called first, finished.
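The slot-based behaviour seen in this output can be sketched as follows. Again, this is not the Async module's real implementation; `miniParallelLimit` and `makeTask` are hypothetical names, the delays are shortened to keep the example quick, and the sketch assumes `limit` is at least 1.

```javascript
// Hypothetical, simplified stand-in for async.parallelLimit (not the library's code).
function miniParallelLimit(tasks, limit, done) {
  const results = new Array(tasks.length);
  let next = 0;     // index of the next task to start
  let running = 0;  // tasks started whose callback has not been called yet
  let finished = 0;
  let failed = false;

  if (tasks.length === 0) return done(null, results);

  function startMore() {
    while (!failed && running < limit && next < tasks.length) {
      const index = next++;
      running += 1;
      tasks[index]((err, value) => {
        if (failed) return;
        running -= 1;
        if (err) {
          failed = true;
          return done(err);
        }
        results[index] = value;
        finished += 1;
        if (finished === tasks.length) return done(null, results);
        startMore(); // a slot is free, so start the next waiting task
      });
    }
  }

  startMore();
}

// Usage: log start and finish events for three timer-based tasks.
const log = [];
const makeTask = (name, ms) => (cb) => {
  log.push(name + ' started');
  setTimeout(() => {
    log.push(name + ' finished');
    cb(null, name);
  }, ms);
};

miniParallelLimit(
  [makeTask('A', 80), makeTask('B', 20), makeTask('C', 20)],
  2,
  (err, results) => {
    // results: ['A', 'B', 'C']
    // log: A started, B started, B finished, C started, C finished, A finished
    console.log(err, results, log);
  }
);
```

As in the output above, the third task starts only when one of the first two has called its callback.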
But if we change the delays to these:
const delays = { A: 4000, B: 2000, C: 3000 };
then the situation is different. Now async.parallel takes 4 seconds, but async.parallelLimit with a limit of 2 takes 5 seconds, and the order is slightly different.
With no limit:
$ time node example.js
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getC called
getC runs request
getC request returned
getC returns
getB calling callback
getC calling callback
getA calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]
real 0m4.075s
user 0m0.070s
sys 0m0.009s
With a limit of 2:
$ time node example.js
getA called
getA runs request
getA request returned
getA returns
getB called
getB runs request
getB request returned
getB returns
getB calling callback
getC called
getC runs request
getC request returned
getC returns
getA calling callback
getC calling callback
async.parallel callback called
async.parallel results: ["Response A","Response B","Response C"]
real 0m5.075s
user 0m0.057s
sys 0m0.018s
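The 4-second and 5-second totals follow from simple arithmetic: with a limit of 2, getC can only start when getB finishes at 2000 ms, so it ends at 2000 + 3000 = 5000 ms. The sketch below does that calculation for any list of durations. `estimateTotalTime` is a hypothetical helper written for this answer; it ignores all overhead and assumes tasks are started in list order.

```javascript
// Hypothetical helper: estimates the total time (ms) when tasks with the given
// durations are started in order, with at most `limit` in progress at once.
function estimateTotalTime(durations, limit) {
  const runningEnds = []; // finish times of the tasks currently in progress
  let now = 0;            // time at which the next task can start
  let total = 0;

  for (const duration of durations) {
    if (runningEnds.length >= limit) {
      // All slots are taken: wait for the earliest running task to finish.
      runningEnds.sort((a, b) => a - b);
      now = runningEnds.shift();
    }
    const end = now + duration;
    runningEnds.push(end);
    total = Math.max(total, end);
  }
  return total;
}

estimateTotalTime([4000, 2000, 3000], Infinity); // 4000 (no limit)
estimateTotalTime([4000, 2000, 3000], 2);        // 5000
estimateTotalTime([4000, 2000, 1000], 2);        // 4000 (the original delays)
```

These values match the measured 4.075 s and 5.075 s, with the extra milliseconds being process startup and other overhead.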
Summary
I think the most important thing to remember, no matter whether you use callbacks like in this case, or promises, or async/await, is that in single-threaded event loops you can do only one thing at once, but you can wait for many things at the same time.
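For completeness, here is a rough sketch of the same idea with promises and async/await. It uses the standard `Promise.all`; the `wait` helper, the `main` function and the shortened delays are made up for illustration.

```javascript
// Hypothetical helper: a promise that resolves with `value` after `ms` milliseconds.
const wait = (ms, value) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function main() {
  const started = Date.now();
  // All three timers are started before we begin waiting for any of them.
  const results = await Promise.all([
    wait(200, 'Response A'),
    wait(100, 'Response B'),
    wait(50, 'Response C'),
  ]);
  const elapsed = Date.now() - started;
  return { results, elapsed };
}

const outcome = main();
outcome.then(({ results, elapsed }) => {
  // results keeps the listed order: Response A, Response B, Response C.
  // elapsed is close to the longest delay (200 ms), not the sum (350 ms).
  console.log(results, elapsed);
});
```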