Efficient way to read JSON file?

I have seen different methods to read JSON files in Node.js, like these:

  1. Using FS library

    Sync

     var fs = require('fs');
     var obj = JSON.parse(fs.readFileSync('file', 'utf8'));
    

    Async:

     var fs = require('fs');
     var obj;
     fs.readFile('file', 'utf8', function (err, data) {
       if (err) throw err;
       obj = JSON.parse(data);
     });
    

    Source : https://mcmap.net/q/53620/-using-node-js-how-do-i-read-a-json-file-into-server-memory

  2. Using require()

     let data = require('/path/file.json');
    
  3. Using Ajax request

    How to retrieve data from JSON file using Jquery and ajax?

There might be other ways, but I heard that reading a JSON file using Method 1 is more efficient than the other methods.

I'm developing a module where I have to read a JSON file on each client-side request, and I'm currently using Method 1. This is a banking application and performance matters, so please help me figure out which approach is best for this scenario.

Bat answered 23/1, 2019 at 5:25 Comment(3)
If you don't need fs, and you want it synchronous, using require is most efficient because it saves you memory. If you don't need fs and you want it async, using xmlHttpRequest is most efficient. If you need fs for other things, fs is the most efficient method in either case because once loaded it's faster than require and it has lower time cost than xmlHttpRequest. – Korwin
@Asthmatic Great, thanks! This answer will help a lot. :) – Bat
If you're dealing with a large JSON, by far the greatest bottleneck will be JSON.parse itself. It requires that you load the whole file in a String (plus, JavaScript uses UTF16 so double the memory usage) and blind JSON parsing is quite slow. If your input is an array or dictionary you can 1) stream the JSON parsing so you can start working before you've loaded the whole file, 2) filter while parsing so you only generate the objects you want. – Percy
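A streaming parse along the lines of the last comment could look roughly like this, using a third-party parser such as stream-json and assuming the file contains one large top-level array (a sketch, not a drop-in solution; 'big.json' is a placeholder path):

 const fs = require('fs');
 const { parser } = require('stream-json');
 const { streamArray } = require('stream-json/streamers/StreamArray');

 fs.createReadStream('big.json')
   .pipe(parser())
   .pipe(streamArray())
   .on('data', ({ value }) => {
     // each top-level array element arrives here as soon as it is parsed,
     // so the whole file is never held in memory as one string
   })
   .on('end', () => console.log('done'));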

Method 3 is out of consideration as it combines one of the other methods with a network request, so you still have to choose one of the other methods.

I assume that Method 2 effectively leaks memory: Node.js caches every module it loads, so it returns exactly the same object by reference if you require the same path twice:

 require("thing") === require("thing")

Therefore, once you require something, it stays in memory for the lifetime of the process. That is fast if you look it up multiple times, but if you load a lot of files this way, it will fill up memory.
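That cache is exposed as require.cache, keyed by the resolved absolute path, so the behaviour is easy to verify (a small sketch; './config.json' is a placeholder file):

 const a = require('./config.json');
 const b = require('./config.json');
 console.log(a === b); // true: the second call is served from require.cache

 // evicting the entry forces the file to be read and parsed again
 delete require.cache[require.resolve('./config.json')];
 const c = require('./config.json');
 console.log(a === c); // false: a fresh object was created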

Now only Method 1 is left, and there I would go with the async version, as it does not block the event loop while the file is being read, so it will outperform the sync version when your server is under load.
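For example, with the promise-based API other requests keep being served while the read is in flight (a minimal sketch using Node's built-in http module; file path and port are placeholders):

 const fs = require('fs');
 const http = require('http');

 http.createServer(async (req, res) => {
   try {
     // the await yields to the event loop, so concurrent requests are not blocked
     const obj = JSON.parse(await fs.promises.readFile('file.json', 'utf8'));
     res.end(JSON.stringify(obj));
   } catch (err) {
     res.statusCode = 500;
     res.end('could not read file');
   }
 }).listen(3000);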


I personally would go with option 4:

Store it in a database. Databases load the data into memory for faster access, and they were built to handle a lot of records. As you are dealing with JSON, MongoDB would be a good choice:

 // assumes the mongodb driver is installed; connection string and db name are placeholders
 const { MongoClient } = require("mongodb");
 const client = new MongoClient("mongodb://localhost:27017");
 const db = client.db("mydb").collection("json");

 async function getFile() {
    await client.connect(); // idempotent, safe to call per request
    return db.findOne({ "name": "test" });
 }
Limburg answered 23/1, 2019 at 6:16 Comment(0)

I use the following; it's very fast but I want to make it even faster.

It can read multiple files at a time up to the pool's limit.

  1. Add a promise pool class:

    // Simple concurrency limiter: at most maxConcurrent tasks run at once,
    // the rest wait in a FIFO queue.
    export class PromisePool<T> {
        private maxConcurrent: number;
        private currentConcurrent: number;
        // resolvers for callers that are waiting for a free slot
        private pending: (() => Promise<any>)[] = [];
    
        constructor(maxConcurrent: number) {
            this.maxConcurrent = maxConcurrent;
            this.currentConcurrent = 0;
        }
        async add(fn: () => PromiseLike<T>): Promise<T> {
            // if the pool is full, park this caller until a running task finishes
            if (this.currentConcurrent >= this.maxConcurrent) {
                await new Promise(resolve => this.pending.push(async () => resolve(void 0)));
            }
            this.currentConcurrent++;
            try {
                return await fn();
            } finally {
                // release the slot and wake the next waiting caller, if any
                this.currentConcurrent--;
                if (this.pending.length > 0) {
                    this.pending.shift()!();
                }
            }
        }
    }
    
    
  2. Define some pools for file loading. You could have multiple different read queues and write queues, but without a clear view of exactly how many concurrent operations you are allowing, you may end up going over the OS file-descriptor (fd) limit:

    const fileLoadingPool = new PromisePool(400);
    const fileWritingPool = new PromisePool(200);
    
  3. Now you can load your data. How you do this is up to you, but make sure you add each read to the pool:

    const loadFiles = async () => {
        await fs.promises.mkdir(someDir, { recursive: true });
        const dirContents = await fs.promises.readdir(someDir);
    
        await Promise.all(dirContents.map(async dirObject => {
            const basePath = `${someDir}${dirObject}`;
            const files = await fs.promises.readdir(basePath);
            return Promise.all(files.map(async file => fileLoadingPool.add(async () => {
                const baseFile = `${basePath}/${file}`;
                if (file.endsWith(".json.disabled") && config.clearDisabledFilesOnStartup) {
                    await fs.promises.rm(baseFile);
                    return;
                }
                if (!file.endsWith(".disabled") && !file.endsWith(".json")) {
                    return;
                }
                const fileData = await fs.promises.readFile(baseFile, "utf-8");
                const fileDataTyped = JSON.parse(fileData, reviver) as SomeType;
                if (file.endsWith(".disabled")) {
                    Cache.disabledFiles.set(fileDataTyped.someStringProp, fileDataTyped);
                    return;
                }
                else if (!file.endsWith(".disabled") && fileDataTyped?.isDisabled) {
                    await fs.promises.rename(baseFile, `${baseFile}.disabled`);
                    logger.log("Disabled file", `${baseFile}`);
                    Cache.disabledFiles.set(fileDataTyped.someStringProp, fileDataTyped);
                    return;
                }
                Cache.enabledFiles.set(fileDataTyped.someStringProp, fileDataTyped);
            })));
        }));
        return Promise.resolve();
    }
    

You can even run several of these loader functions concurrently like so:

await Promise.all([
    loadFiles(),
    loadSomething(),
]);

And you can have an interval that saves using a similar API:

const writeFiles = async () => {
    const _knownFiles = Array.from(Cache.enabledFiles.entries()).filter(value => value[1].modified === true);
    if (_knownFiles.length > 0) {
        await Promise.all(_knownFiles.map(async known => fileWritingPool.add(async () => {
            const [key, fileDataTyped] = known;
            await fs.promises.mkdir(`${someDir}${fileDataTyped.someStringProp}`, { recursive: true });
            fileDataTyped.modified = false;
            await fs.promises.writeFile(`${someDir}${fileDataTyped.someStringProp}/${key}.json.new`, JSON.stringify(fileDataTyped, replacer, 2));
            await fs.promises.rename(`${someDir}${fileDataTyped.someStringProp}/${key}.json.new`, `${someDir}${fileDataTyped.someStringProp}/${key}.json`);
            if (fileDataTyped?.isDisabled) {
                await fs.promises.rename(`${someDir}${fileDataTyped.someStringProp}/${key}.json`, `${someDir}${fileDataTyped.someStringProp}/${key}.json.disabled`);
                logger.log("Disabled file", `${someDir}${fileDataTyped.someStringProp}/${key}.json`);
                Cache.enabledFiles.delete(key);
            }
        })));
    }
}

setInterval(async () => {
    await Promise.all([
        writeFiles(),
        writeSomething(),
    ])
}, 60000);

With this method I can save files without blocking, and I can read thousands of tiny JSON files (about 3 GB of JSON data) into memory as maps in about a minute.

Scut answered 23/4 at 12:54 Comment(0)

So I created a big JSON file and measured the time taken, to see which method is faster; the code that creates the file is at the end, commented out.

const fs = require('fs')

// method 1 - sync
console.time('method_1_sync ')
var obj = JSON.parse(fs.readFileSync('file.json', 'utf8'))
console.log(obj[1000] === 2000)
console.timeEnd('method_1_sync ')

// method 2
console.time('method_2      ')
let data = require('./file.json')
console.log(data[1000] === 2000)
console.timeEnd('method_2      ')

// method 1 - async
console.time('method_1_async')
fs.readFile('file.json', 'utf8', function (err, data) {
  if (err) throw err
  data = JSON.parse(data)
  console.log(data[1000] === 2000)
  console.timeEnd('method_1_async')
})

/*
var obj = {}

for (let i = 0; i < 1000000; i++) {
  obj[i] = i+i
}

var json = JSON.stringify(obj)
fs.writeFile('file.json', json, function() {})
*/

Here's the result on my machine:

method_1_sync : 131.861ms
method_2      : 131.510ms
method_1_async: 130.521ms

method_1_async seems to be the fastest. Method 3 is not worth testing because of network latency.
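For a more robust comparison you would repeat each read many times and average the timings, roughly like this (a sketch; require is left out because it only reads the file once and then serves the cached object):

const fs = require('fs')

const runs = 100

async function bench() {
  let syncTotal = 0
  let asyncTotal = 0

  for (let i = 0; i < runs; i++) {
    let start = process.hrtime.bigint()
    JSON.parse(fs.readFileSync('file.json', 'utf8'))
    syncTotal += Number(process.hrtime.bigint() - start)

    start = process.hrtime.bigint()
    JSON.parse(await fs.promises.readFile('file.json', 'utf8'))
    asyncTotal += Number(process.hrtime.bigint() - start)
  }

  // hrtime.bigint() returns nanoseconds; divide by 1e6 for milliseconds
  console.log('sync avg :', (syncTotal / runs / 1e6).toFixed(3), 'ms')
  console.log('async avg:', (asyncTotal / runs / 1e6).toFixed(3), 'ms')
}

bench()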

Marquettamarquette answered 23/1, 2019 at 5:46 Comment(2)
This is a very bad test case without any meaning. You should run the whole thing a few thousand times and measure the average time. – Limburg
You should add another test case where you put the JSON file into a DB stored in memory and see what the query times are. – Trounce
