fs.readFileSync seems faster than fs.readFile - is it OK to use for a web app in production?

I know that when developing in node, you should always try to avoid blocking (sync) functions and go with async functions, however, I did a little test to see how they compare.

I need to open a JSON file that contains i18n data (like date and time formats, etc) and pass that data to a class that uses this data to format numbers, etc in my view.

It would be kind of awkward to start wrapping all the class's methods inside callbacks, so if possible, I would use the synchronous version instead.

console.time('one');
console.time('two');
fs.readFile( this.dir + "/" + locale + ".json", function (err, data) {
  if (err) cb( err );
  console.timeEnd('one');
});
var data = fs.readFileSync( this.dir + "/" + locale + ".json" );
console.timeEnd('two');

This results in the following lines in my console:

two: 1ms
one: 159ms

It seems that fs.readFileSync is about 150 times faster than fs.readFile - it takes about 1 ms to load a 50KB JSON file (minified). All my JSON files are around 50-100KB.

I was also thinking of somehow memoizing this JSON data or saving it to the session, so that the file is read only once per session (or when the user changes their locale). I'm not entirely sure how to do that; it's just an idea.

Is it okay to use fs.readFileSync in my case or will I get in trouble later?

Twombly answered 11/12, 2012 at 14:42 Comment(2)
+1 for letting me know that console.time and console.timeEnd exists! :)Soleure
Funny how this comment has more upvotes than the post itself ;)Trammel
I
84

No, it is not OK to use a blocking API call in a node server as you describe. Your site's responsiveness to many concurrent connections will take a huge hit. It's also just blatantly violating the #1 principle of node.

The key to node working is that while it is waiting on IO, it is doing CPU/memory processing at the same time. This requires asynchronous calls exclusively. So if you have 100 clients reading 100 JSON files, node can ask the OS to read those 100 files but while waiting for the OS to return the file data when it is available, node can be processing other aspects of those 100 network requests. If you have a single synchronous call, ALL of your client processing stops entirely while that operation completes. So client number 100's connection waits with no processing whatsoever while you read files for clients 1, 2, 3, 4, and so on sequentially. This is Failville.
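As a rough illustration of the point above (not part of the original answer), the sketch below uses a 10 ms timer to stand in for a pending client callback and a busy-wait loop to stand in for a slow synchronous call. The function name measureTimerDelay is hypothetical.

```javascript
// Hypothetical helper: schedule a 10 ms timer, then block the thread.
function measureTimerDelay(blockMs, done) {
  const scheduled = Date.now();
  // This callback stands in for any pending request handler.
  setTimeout(() => done(Date.now() - scheduled), 10);
  // Busy-wait stands in for a slow synchronous call such as a large readFileSync.
  const end = Date.now() + blockMs;
  while (Date.now() < end) { /* nothing else can run on this thread */ }
}

measureTimerDelay(200, (delay) => {
  // The timer was due after 10 ms but cannot fire until the loop above ends.
  console.log(`10 ms timer fired after ${delay} ms`);
});
```

The reported delay is at least as long as the blocking period, because the timer callback has to wait for the synchronous code to return control to the event loop.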

Here's another analogy. If you went to a restaurant and were the only customer, you would probably get faster service if a single person sat you, took your order, cooked it, served it to you, and handled the bill without the coordination overhead of dealing with the host/hostess, server, head chef, line cooks, cashiers, etc. However, with 100 customers in the restaurant, the extra coordination means things happen in parallel and the overall responsiveness of the restaurant is increased way beyond what it would be if a single person were trying to handle 100 customers on their own.
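For the locale files in the question, one possible approach is to read each file asynchronously the first time it is needed and keep the result in memory afterwards. The sketch below is an assumption about how that could look, not code from the question or from twitter_cldr; the names localeCache and loadLocale are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical in-memory cache: file path -> Promise of the parsed JSON.
const localeCache = new Map();

function loadLocale(dir, locale) {
  const file = path.join(dir, locale + '.json');
  if (!localeCache.has(file)) {
    // Non-blocking read; the promise is cached so concurrent requests share it.
    const pending = fs.promises.readFile(file, 'utf8').then((text) => JSON.parse(text));
    // Forget failed reads so a later request can retry.
    pending.catch(() => localeCache.delete(file));
    localeCache.set(file, pending);
  }
  return localeCache.get(file);
}
```

A request handler could call loadLocale(dir, 'en_US').then(...) on every request; only the first call for a given locale touches the disk, and none of the calls block the event loop.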

Illjudged answered 11/12, 2012 at 15:42 Comment(9)
Thanks, I actually understand the general idea of node and I/O and blocking vs non-blocking, etc... I just wasn't sure if it would become an issue in this case, as the files are always relatively small. However, I noticed that twitter_cldr simply requires its locale files - I guess I could also do something like that?Twombly
It doesn't matter the size of the files. Doing synchronous I/O BEFORE you start serving network requests (as your app is initializing, as would be the case with twitter_cldr) is fine. But once you start serving network requests, it is verboten.Illjudged
This is probably a bit too much discussion in the comments, but twitter_cldr does not necessarily require files while initializing the app - for example, if one request requires the en_US locale, and another one requires en_GB, then even if you have a default loaded at initialization, you would need to load (require) the new locale while serving the request. And if I have thousands of users with different locale settings... I pretty much need to load the locale file on each request - correct?Twombly
That sounds like it will cause a hiccup in the app's responsiveness the first time a given locale is loaded. require only does sync I/O the first time though, so re-requires of the same module will return immediately. But IMHO it sounds like a design flaw in twitter_cldr, though I'm not familiar with its code or purpose. Load modules, which are code, with require at app startup. Load data with async APIs as needed. That's how to work in harmony with node. Using require to load data synchronously while serving requests seems like a mistake to me.Illjudged
OP never mentioned that the script was intended for use on a server. It's fine to block your IO if you don't mind your IO being blocked (e.g. a local script or a start-up sequence).Confiscate
It's mentioned in the question title. "for a web app in production".Illjudged
I down-voted since it does not explain that the test is actually flawed, this isn't a trade-off for async; the test simply shows that the sync function kills the performance of the async function. Other than that the information is still valid.Holna
"but while waiting for the OS to return the file data when it is available". But isn't this blocking the thread anyway? Is the OS spawning a new thread for reading the file?Hylozoism
No, waiting for IO does not block the node.js event loop from processing. And no, neither node nor the OS uses a thread-per-file allocation. It's a pool of threads that does not grow in proportion to the IO activity.Illjudged
C
12

Your synchronous read is blocking the callback of the asynchronous read; remember that node is single-threaded. I understand that the time difference still looks amazing, but you should try with a file that takes much, much longer to read, and imagine many, many clients doing the same; only then will the overhead pay off. That should answer your question: yes, you will run into trouble if you are serving thousands of requests with blocking IO.
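One way to check this is to time each read on its own, so the synchronous read cannot delay the asynchronous callback. The following is only a sketch under assumptions: the temp file and the helper names timeAsync and timeSync are made up for illustration, and the numbers will vary by machine.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Assumption: a throwaway ~50KB JSON file in the OS temp directory.
const sampleFile = path.join(os.tmpdir(), 'read-timing-sample.json');
fs.writeFileSync(sampleFile, JSON.stringify({ filler: 'x'.repeat(50000) }));

// Time the async read alone; nothing else runs between start and callback.
function timeAsync(file, next) {
  const t0 = process.hrtime.bigint();
  fs.readFile(file, (err) => {
    if (err) return next(err);
    next(null, Number(process.hrtime.bigint() - t0) / 1e6);
  });
}

// Time the sync read alone.
function timeSync(file) {
  const t0 = process.hrtime.bigint();
  fs.readFileSync(file);
  return Number(process.hrtime.bigint() - t0) / 1e6;
}

// Run the sync measurement only after the async one has finished.
timeAsync(sampleFile, (err, asyncMs) => {
  if (err) throw err;
  console.log(`async: ${asyncMs.toFixed(3)} ms`);
  console.log(`sync: ${timeSync(sampleFile).toFixed(3)} ms`);
});
```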

Countershaft answered 11/12, 2012 at 15:18 Comment(3)
Thanks, this also answers the question, but I've chosen Peter's answer because it is more thorough.Twombly
You did well. I am not good at anticipating the needs behind a question and often read them too formally, but I'm trying. Thanks for your feedback.Countershaft
This is the correct explanation to the problem. Your test is flawed. If you ran the 2 tests independent of each other, I'm pretty sure you'll see the opposite (or near opposite) results.Involuntary
D
1

After a lot of time, learning, and practice, I tried once more, found the answer, and can show an example:

const fs = require('fs');

const syncTest = () => {
    let startTime = +new Date();
    const results = [];
    const files = [];

    for (let i=0, len=4; i<len; i++) {
        files.push(fs.readFileSync(`file-${i}.txt`));
    };

    for (let i=0, len=360; i<len; i++) results.push(Math.sin(i), Math.cos(i));
    console.log(`Sync version: ${+new Date() - startTime}`);
};

const asyncTest = () => {
    let startTime = +new Date();
    const results = [];
    const files = [];

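    for (let i=0, len=4; i<len; i++) {
        // The readFile callback receives (err, data); the first argument is the error.
        fs.readFile(`file-${i}.txt`, (err, file) => {
            if (err) throw err;
            files.push(file);
        });
    }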

    for (let i=0, len=360; i<len; i++) results.push(Math.sin(i), Math.cos(i));

    console.log(`Async version: ${+new Date() - startTime}`);
};

syncTest();
asyncTest();
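Note that asyncTest above logs its time before any readFile callback has run, so it measures only how long it takes to start the reads and do the math. A sketch that stops the clock after every file has actually been read might look like the following; it assumes fs.promises is available, and the helper name readAllTimed and the temp files are hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: start all reads, do CPU work, then wait for the reads.
async function readAllTimed(fileNames) {
  const start = Date.now();
  // readFile is called for every file here, so the reads are in flight together.
  const pending = fileNames.map((name) => fs.promises.readFile(name));
  // CPU work that runs while the reads are outstanding.
  const results = [];
  for (let i = 0; i < 360; i++) results.push(Math.sin(i), Math.cos(i));
  // Stop the clock only after every file has been read.
  const files = await Promise.all(pending);
  return { files, elapsedMs: Date.now() - start };
}

// Demo with four throwaway files in the OS temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'read-demo-'));
const demoFiles = [0, 1, 2, 3].map((i) => {
  const name = path.join(demoDir, `file-${i}.txt`);
  fs.writeFileSync(name, 'x'.repeat(1000));
  return name;
});

readAllTimed(demoFiles).then(({ files, elapsedMs }) => {
  console.log(`Async version, all ${files.length} files read: ${elapsedMs}ms`);
});
```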
Darbie answered 6/10, 2018 at 16:54 Comment(0)
R
1

Yes, using the asynchronous API is the right approach in a server-side environment. But if the use case is different, such as generating a build for a client-side JS project while reading and writing JSON files for different flavors, it doesn't matter that much.

We needed a quick way to create a minified build for deployment, and that is where the synchronous API comes into the picture. for more info and library

Rollicking answered 1/8, 2019 at 8:3 Comment(0)
D
-1

I've tried to check the real, measurable difference in speed between fs.readFileSync() and fs.readFile() when reading 3 different files from an SD card, with some math calculation added between the reads. I don't see the speed difference that is always shown in diagrams about node, where node is faster even in a simple operation such as reading the same file 3 times, and where the total time is close to the time needed to read the file once.

I understand that it is undoubtedly useful that the server can do other work while a file is being read, but many diagrams on YouTube or in books are not precise: in a situation like the one below, async node is slower than sync at reading small files (here: 85kB, 170kB, 255kB).

var fs = require('fs');

var startMeasureTime = () => {
  var start = new Date().getTime();
  return start;
};

// synch version
console.log('Start');
var start = startMeasureTime();

for (var i = 1; i<=3; i++) {
  var fileName = `Lorem-${i}.txt`;
  var fileContents = fs.readFileSync(fileName);
  console.log(`File ${1} was downloaded(${fileContents.length/1000}KB) after ${new Date().getTime() - start}ms from start.`);

  if (i === 1) {
    var hardMath = 3*54/25*35/46*255/34/9*54/25*35/46*255/34/9*54/25*35/46*255/34/9*54/25*35/46*255/34/9*54/25*35/46*255/34/9;  
  };
};

// asynch version
setImmediate(() => {
  console.log('Start');
  var start = startMeasureTime();

  for (var i = 1; i<=3; i++) {
    var fileName = `Lorem-${i}.txt`;
    fs.readFile(fileName, {encoding: 'utf8'}, (err, fileContents) => {
      console.log(`File ${1} was downloaded(${fileContents.length/1000}KB) after ${new Date().getTime() - start}ms from start.`);
    });

    if (i === 1) {
      var hardMath = 3*54/25*35/46*255/34/9*54/25*35/46*255/34/9*54/25*35/46*255/34/9*54/25*35/46*255/34/9*54/25*35/46*255/34/9;  
    };
  };
});

This is from console:
Start
File 1 was downloaded(255.024KB) after 2ms from start.
File 1 was downloaded(170.016KB) after 5ms from start.
File 1 was downloaded(85.008KB) after 6ms from start.
Start
File 1 was downloaded(255.024KB) after 10ms from start.
File 1 was downloaded(85.008KB) after 11ms from start.
File 1 was downloaded(170.016KB) after 12ms from start.
Darbie answered 26/10, 2017 at 14:23 Comment(2)
what is the hardMath about ?Utley
This is a math task that the CPU could work on while waiting for data from the files. I've updated the above script to use loops and to start measuring time inside the setImmediate() callback, so the times are now a little shorter.Darbie
