Node.js sync vs. async

5

13

I'm currently learning Node.js, and I've come across two versions of the same program: one synchronous and one asynchronous.

I do understand the concept of a callback, but I'm trying to understand the benefit of the second (async) example, since the two seem to do exactly the same thing despite the difference in style.

Can you please explain why the second example would be better? I'd be happy to get an even wider explanation that would help me understand the concept.

Thank you!!

1st example:

var fs = require('fs');

function calculateByteSize() {
    var totalBytes = 0,
        i,
        filenames,
        stats;
    filenames = fs.readdirSync(".");
    for (i = 0; i < filenames.length; i ++) {
        stats = fs.statSync("./" + filenames[i]);
        totalBytes += stats.size;
    }
    console.log(totalBytes);
}

calculateByteSize();

2nd example:

var fs = require('fs');

var count = 0,
    totalBytes = 0;

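function calculateByteSize() {
    fs.readdir(".", function (err, filenames) {
        var i;
        if (err) { throw err; }
        count = filenames.length;
        // An empty directory would otherwise never reach the console.log below.
        if (count === 0) {
            console.log(totalBytes);
        }

        for (i = 0; i < filenames.length; i++) {
            // Each fs.stat call returns immediately; the callbacks may run in any order.
            fs.stat("./" + filenames[i], function (err, stats) {
                if (err) { throw err; }
                totalBytes += stats.size;
                count--;
                // The last callback to finish prints the total.
                if (count === 0) {
                    console.log(totalBytes);
                }
            });
        }
    });
}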

calculateByteSize();
Disunity answered 26/12, 2014 at 15:51 Comment(0)
11

Your first example uses blocking I/O throughout. It has to wait until the readdirSync operation completes before it can loop over the files, and then it blocks (waits) on each individual statSync call before moving on to the next file. No code after the calculateByteSize() call can run until every operation has completed.

The async (second) example, on the other hand, is entirely non-blocking and uses the callback pattern. Execution returns to the point just after the calculateByteSize() call as soon as fs.readdir has been called (but before its callback has run). Once the readdir task completes, Node invokes your anonymous callback, which loops over the files and makes further non-blocking calls to fs.stat.
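
To make that ordering visible, here is a minimal sketch. The `order` array and the `listDirectory` name are illustrative and not part of the question's code. It records when the request is issued, when the caller carries on, and when the callback finally runs.

```javascript
const fs = require('fs');

// Records the order in which things happen so it can be inspected afterwards.
const order = [];

// Illustrative helper: asks for a directory listing and reports via done(err, order).
function listDirectory(done) {
    fs.readdir('.', function (err, filenames) {
        if (err) { return done(err); }
        order.push('readdir callback');
        done(null, order);
    });
    // fs.readdir has already returned; its callback has not run yet.
    order.push('readdir requested');
}

listDirectory(function (err, result) {
    if (err) { throw err; }
    console.log(result.join(' -> '));
});
order.push('after listDirectory()');
// prints: readdir requested -> after listDirectory() -> readdir callback
```

The callback always comes last because Node only runs it after the currently executing code has finished.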

The second approach is more advantageous. Suppose, for the sake of argument, that each call to readdir or stat takes between 250ms and 750ms to complete (this is probably not the case). In the synchronous version the calls run serially: you wait for each stat operation to finish before starting the next one, so the waits add up. In the asynchronous version you do not wait between calls: every fs.stat request is issued right away, and the waits overlap.
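
The timing argument can be sketched with simulated delays. This is only a model of the waiting pattern: `fakeStat` is a hypothetical stand-in for fs.stat, the 50 ms delay and the size of 100 are made up, and the serial variant uses `await`, so it models one-at-a-time waiting without actually blocking the thread the way statSync does.

```javascript
// Hypothetical stand-in for fs.stat: resolves with a fake size after delayMs.
function fakeStat(name, delayMs) {
    return new Promise(function (resolve) {
        setTimeout(function () { resolve({ name: name, size: 100 }); }, delayMs);
    });
}

// One call at a time: each wait starts only after the previous one has finished.
async function totalSerial(names, delayMs) {
    let total = 0;
    for (const name of names) {
        const stats = await fakeStat(name, delayMs);
        total += stats.size;
    }
    return total;
}

// All calls are started up front, so the waits overlap.
async function totalConcurrent(names, delayMs) {
    const all = await Promise.all(names.map(function (name) {
        return fakeStat(name, delayMs);
    }));
    return all.reduce(function (sum, stats) { return sum + stats.size; }, 0);
}

// Times both variants over the same four made-up file names.
async function compare() {
    const names = ['a', 'b', 'c', 'd'];
    const delayMs = 50;

    let start = Date.now();
    const serialTotal = await totalSerial(names, delayMs);
    const serialMs = Date.now() - start;

    start = Date.now();
    const concurrentTotal = await totalConcurrent(names, delayMs);
    const concurrentMs = Date.now() - start;

    return { serialTotal, serialMs, concurrentTotal, concurrentMs };
}

compare().then(function (r) {
    console.log('serial: ' + r.serialMs + ' ms, concurrent: ' + r.concurrentMs + ' ms');
});
```

With four names the serial variant should take roughly four delays and the concurrent one roughly a single delay; both compute the same total.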

Definiens answered 26/12, 2014 at 16:4 Comment(3)
Thanks! Awesome answer. So in other words, in the second example we can get the stat for the 3rd file before the 1st file's stat is received, right? - Disunity
Where can one read more about what is happening behind the scenes, as explained in your answer? - Alveolate
I/O-bound work does not require the CPU, so blocking a thread while waiting for it to complete is inefficient. JavaScript uses events/callbacks that fire when the data is ready, leaving the running thread free to use CPU resources in the meantime. nodejs.org/en/docs/guides/event-loop-timers-and-nexttick - Definiens
6

In your first example, the Node.js process, which is single-threaded, blocks for the entire duration of your readdirSync call and can't do anything except wait for the result to be returned. In the second example, the process can handle other tasks, and the event loop will run your callback when the result is available. So you can handle a much higher total throughput by using asynchronous code: the time spent waiting for the readdir in the first example is probably thousands of times as long as the time actually spent executing your code, so you're wasting 99.9% or more of your CPU time.
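
A small sketch of what "can't do anything else" means in practice. `blockFor` is a hypothetical stand-in for a slow synchronous call. It busy-waits, which burns CPU where a real blocked I/O call would sit idle, but the effect on the event loop is the same: a timer that is due cannot fire until the blocking call returns.

```javascript
// Hypothetical stand-in for a slow synchronous call: holds the thread for `ms`.
function blockFor(ms) {
    const end = Date.now() + ms;
    while (Date.now() < end) {
        // busy-wait; nothing else in the process can run meanwhile
    }
}

// Schedules a timer for 10 ms, then blocks the thread for blockMs.
// Reports how long the timer actually took to fire.
function measureTimerDelay(blockMs, done) {
    const start = Date.now();
    setTimeout(function () {
        done(Date.now() - start);
    }, 10);
    blockFor(blockMs);
}

measureTimerDelay(200, function (elapsedMs) {
    console.log('timer asked for 10 ms, fired after ' + elapsedMs + ' ms');
});
```

The timer stands in for any other pending work, such as another client's request: it is ready after 10 ms but is only served once the thread is released.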

Slug answered 26/12, 2014 at 16:2 Comment(0)
3

In your example the benefit of async programming is indeed not very visible. But suppose that your program needs to do other things as well. Remember that your JavaScript code runs in a single thread, so if you choose the synchronous implementation, the program can't do anything but wait for the I/O operation to finish. When you use async programming, your program can do other important tasks while the I/O operation runs in the background (outside the JavaScript thread).
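
As an illustration, here is one way the async version could be written with promises so that other work visibly happens while the I/O is in flight. This is a sketch rather than the asker's code: it assumes a Node.js version that provides `fs.promises`, and `otherWork` is a placeholder for whatever else the program needs to do.

```javascript
const fsp = require('fs').promises;
const path = require('path');

// Promise-based variant of the question's async example.
async function calculateByteSize(dir) {
    const filenames = await fsp.readdir(dir);
    // Start every stat request at once and wait for all of them together.
    const allStats = await Promise.all(filenames.map(function (name) {
        return fsp.stat(path.join(dir, name));
    }));
    return allStats.reduce(function (sum, stats) { return sum + stats.size; }, 0);
}

async function main() {
    const pendingTotal = calculateByteSize('.'); // I/O starts here, nothing blocks
    const otherWork = 21 * 2;                    // placeholder for other tasks
    const totalBytes = await pendingTotal;       // pick up the result when ready
    return { otherWork: otherWork, totalBytes: totalBytes };
}

main().then(function (result) {
    console.log(result);
}).catch(function (err) {
    console.error(err);
});
```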

Celik answered 26/12, 2014 at 16:2 Comment(0)
2

Can you please explain why the second example would be better? I'd be happy to get an even wider explanation that would help me understand the concept.

It's all about concurrency for network servers (thus the name "node"). If this were in a build script, the first, synchronous example would be "better" in that it is more straightforward. And given a single disk, there might not be much actual benefit to making it asynchronous.

However, in a network service, the synchronous version would block the entire process and defeat node's main design principle. Performance would degrade as the number of concurrent clients increased. The second, asynchronous example would perform relatively well: while waiting for the relatively slow filesystem to come back with results, it could handle all the relatively fast CPU operations concurrently. The async version should basically be able to saturate your filesystem, and however much your filesystem can deliver, node will be able to get it out to clients at that rate.
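
A minimal sketch of the network-service case, under stated assumptions: `directorySize` is a hypothetical helper modelled on the question's second example, and the server simply reports the byte total of the current directory for every request. Because each request only issues non-blocking fs calls, the process stays free to accept other clients while the filesystem works.

```javascript
const fs = require('fs');
const http = require('http');
const path = require('path');

// Hypothetical helper: sums file sizes in `dir` without blocking.
// Calls done(err, totalBytes) exactly once.
function directorySize(dir, done) {
    fs.readdir(dir, function (err, filenames) {
        if (err) { return done(err); }
        let pending = filenames.length;
        let totalBytes = 0;
        let failed = false;
        if (pending === 0) { return done(null, 0); }

        filenames.forEach(function (name) {
            fs.stat(path.join(dir, name), function (err, stats) {
                if (failed) { return; }              // an earlier stat already reported an error
                if (err) { failed = true; return done(err); }
                totalBytes += stats.size;
                pending -= 1;
                if (pending === 0) { done(null, totalBytes); }
            });
        });
    });
}

// Each request starts its own directory scan; none of them blocks the others.
const server = http.createServer(function (req, res) {
    directorySize('.', function (err, totalBytes) {
        if (err) {
            res.statusCode = 500;
            return res.end('error\n');
        }
        res.end(String(totalBytes) + '\n');
    });
});

// To try it, uncomment the next line (port 3000 is an arbitrary choice):
// server.listen(3000);
```

Swapping `directorySize` for the readdirSync/statSync version would still work for a single client, but every other connection would stall for the duration of each scan.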

Toxicant answered 26/12, 2014 at 16:5 Comment(0)
0

Lots of good answers here, but be sure to also read the docs:

The synchronous versions will block the entire process until they complete--halting all connections.

There is a good overview of sync vs async in the documentation: http://nodejs.org/api/fs.html#fs_file_system

Adhern answered 26/12, 2014 at 16:3 Comment(0)
