What is the most efficient way to read only the first line of a file in Node JS?

6

28

Imagine you have many long text files, and you need to only extract data from the first line of each one (without reading any further content). What is the best way in Node JS to do it?

Thanks!

Tonicity answered 26/2, 2015 at 16:32 Comment(7)
can you call the head command on the files from within node? use the features of the file/operating system for what they're good for. :)Reorganize
The code I need it for is part of a library that could be used on any operating system supporting Node JS, so it'd be better to do it using Node itself. If I'm not mistaken head is not available on Windows, for example.Tonicity
you have to know how long that first line is, but you can overshoot and load the first, say 5kb, and then split that 5kb by lines, keeping only the first. use fs.read() instead of fs.readFile() : nodejs.org/api/…Mintun
@dandavis, that's sort of what I was thinking about, not sure if there are better methods though. @Reorganize mentioned the head command, by any chance do you know how it is internally implemented?Tonicity
head is an http term used to grab just the file header. it has nothing to do with the content of the first line of a file. you can read the file byte by byte, stopping at the first line break, but 5kb chunks are typically way faster in my experience.Mintun
@dandavis, actually we're talking about Unix head, have a look here.Tonicity
the other head. i always forget about that, but i do like me some tail ;) i suppose you could pipe in the child process if you polyfilled windows. still, i don't think the task of "reading the first line" really needs shell scripts or NPM packages. it's not that hard to bounce around and find it...Mintun
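
For reference, a minimal sketch of the fixed-size-read approach suggested in the comments above (the 8 KB buffer size and the synchronous fs calls are my own choices for brevity; a first line longer than the buffer would be truncated, and a multi-byte character split at the buffer boundary is not handled):

const fs = require('fs');

// Read one fixed-size chunk from the start of the file and keep only
// the text before the first newline.
function readFirstLineSync(path, bufferSize = 8192) {
  const fd = fs.openSync(path, 'r');
  try {
    const buffer = Buffer.alloc(bufferSize);
    const bytesRead = fs.readSync(fd, buffer, 0, bufferSize, 0);
    const text = buffer.toString('utf8', 0, bytesRead);
    const newlineIndex = text.indexOf('\n');
    return newlineIndex === -1 ? text : text.slice(0, newlineIndex);
  } finally {
    fs.closeSync(fd);
  }
}
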
14

I ended up adopting this solution, which seems the most performant I've seen so far:

var fs = require('fs');
var Q = require('q');

function readFirstLine (path) {
  return Q.promise(function (resolve, reject) {
    var rs = fs.createReadStream(path, {encoding: 'utf8'});
    var acc = '';
    var pos = 0;
    var index = -1;
    rs
      .on('data', function (chunk) {
        // Ignore chunks that may arrive after the newline was found but before 'close'.
        if (index !== -1) return;
        index = chunk.indexOf('\n');
        acc += chunk;
        if (index !== -1) {
          rs.close();
        } else {
          pos += chunk.length;
        }
      })
      .on('close', function () {
        // If the file has no newline at all, return everything that was read.
        resolve(index !== -1 ? acc.slice(0, pos + index) : acc);
      })
      .on('error', function (err) {
        reject(err);
      });
  });
}
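
For reference, a minimal usage sketch of the function above (the file path is just an example):

readFirstLine('./some-file.txt')
  .then(function (line) {
    console.log('First line:', line);
  })
  .catch(function (err) {
    console.error(err);
  });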

I created an npm module for convenience, named "firstline".

Thanks to @dandavis for the suggestion to use String.prototype.slice()!

Tonicity answered 26/2, 2015 at 18:5 Comment(6)
interesting, how big is each chunk?Mintun
thanks for getting back. sounds like i need to retweak my own buffer size and change my std advice... i would offer that for very top speed (if that's important), it would be faster to use acc.slice(0,acc.indexOf("\n")) on the close event instead of splitting the whole 64kb+, or even to somehow pass the index from the data event (plus the length of acc) for bare-metal efficiency.Mintun
Thanks @dandavis! What you suggest makes sense, I'll edit the answer to include your proposal ;)Tonicity
A heads up: I ran into an issue where the next data event would occur before rs.close(). This changed the value of index before the close event, so it caused a bug where you may not get the full first line. I had to wrap the on data function with an index check to make sure it was still undefined.Strength
@ClarenceLiu could you please file an issue with all the details on GitHub? Thanks! github.com/pensierinmusica/firstlineTonicity
In ts and node 18 I get Error: Cannot find module 'q'.Nope
20

There's a built-in module almost exactly for this case - readline. It avoids messing with chunks and so forth. The code would look like the following:

const fs = require('fs');
const readline = require('readline');

async function getFirstLine(pathToFile) {
  const readable = fs.createReadStream(pathToFile);
  const reader = readline.createInterface({ input: readable });
  const line = await new Promise((resolve) => {
    reader.on('line', (line) => {
      reader.close();
      resolve(line);
    });
  });
  readable.close();
  return line;
}
Torture answered 12/2, 2020 at 16:57 Comment(5)
This won't work with a zero-length file. The promise will wait for the resolve call forever.Cryo
In addition to what Sake said, you close the input file before even a single line could be read... Either that or you are extremely lucky and benefit from a bogus race condition.Exeunt
@RomainVincent I tested Victor's function above and it seems to give me the first line every time (tested ~ 10 times). Could you elaborate on the race condition, and what the code should be to avoid the race condition?Ratiocinate
@Ratiocinate It's my own bad, there is no race condition with the current example. I must have missed the await keyword, which is unexpected. It would be simpler to just return the promise and call readable.close() immediately after reader.close(). This would avoid bringing the await logic which is quite heavy when transpiled. But this is nitpicking at this point. The main concern remains that if there is no line to read, this will hang forever.Exeunt
Seems to work fine with a zero length file, but may be due to an update. I'm running Node v16.2.0.Rudder
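
As the comments above note, the promise in this answer never settles for a zero-length file, because no 'line' event is ever emitted. A possible variant (my own sketch, not part of the original answer) also resolves on 'close' and surfaces stream errors:

const fs = require('fs');
const readline = require('readline');

async function getFirstLine(pathToFile) {
  const readable = fs.createReadStream(pathToFile);
  const reader = readline.createInterface({ input: readable });
  try {
    return await new Promise((resolve, reject) => {
      reader.on('line', resolve);             // First line read.
      reader.on('close', () => resolve(''));  // Empty file: no 'line' event fires.
      readable.on('error', reject);           // e.g. ENOENT if the file is missing.
    });
  } finally {
    reader.close();
    readable.destroy();
  }
}
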
7

I know this doesn't exactly answer the question, but for those who are looking for a READABLE and simple way to do so:

const fs = require('fs').promises;

async function getFirstLine(filePath) {
    const fileContent = await fs.readFile(filePath, 'utf-8');
    return (fileContent.match(/(^.*)/) || [])[1] || '';
} 

NOTE:

  • naturally, this will only work with text files, which I assume you're using, given your description
  • this will work with empty files and will return an empty string
  • the regexp is cheap since it is simple (no OR conditions or complex matches) and only matches up to the first line; note that fs.readFile still loads the whole file into memory
Toughminded answered 20/6, 2021 at 11:3 Comment(6)
He asked "the most efficient way" and "without reading any further content"Kattegat
@Kattegat I thought it could be useful to those coming and searching for a simple solution. I came here looking for something like this at first but couldn't find it. I also will write a more explicit disclaimer :)Toughminded
I get TypeError [ERR_INVALID_ARG_TYPE]: The "cb" argument must be of type function. Received type string ('utf-8')Nope
@Nope it's probably the third argument for fs.readFile, it's a callback, but you won't need it if you are awaiting itToughminded
I forgot to add promises to the fs object. Now I get TypeError: fileContent.match is not a function, I use a ts file.Nope
@Nope Put some console.log to see fileContent variable XD I bet you forgot the await and you are trying to .match() a promise. This could be a side conversation, first try to learn how to debug your code :)Toughminded
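
A regex-free variant of the readFile approach above, in case it reads more clearly (my own sketch; it has the same trade-off of loading the whole file into memory):

const fs = require('fs').promises;

async function getFirstLine(filePath) {
    const fileContent = await fs.readFile(filePath, 'utf-8');
    // Everything before the first '\n'; an empty file yields ''.
    // Note: a trailing '\r' from CRLF line endings is not stripped here.
    return fileContent.split('\n', 1)[0];
}
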
4

Node.js >= 16

In all current versions of Node.js, readline.createInterface can be used as an async iterable, to read a file line by line - or just for the first line. This is also safe to use with empty files.

Unfortunately, the error handling logic is broken in versions of Node.js before 16, where certain file system errors may go uncaught even if the code is wrapped in a try-catch block because of the way asynchronous errors are propagated in streams. So I recommend using this method only in Node.js >= 16.

import { createReadStream } from "fs";
import { createInterface } from "readline";

async function readFirstLine(path) {
    const inputStream = createReadStream(path);
    try {
        for await (const line of createInterface(inputStream)) return line;
        return ''; // If the file is empty.
    }
    finally {
        inputStream.destroy(); // Destroy file stream.
    }
}

const firstLine = await readFirstLine("path/to/file");
Be answered 16/5, 2022 at 21:5 Comment(4)
How is your answer different from the one further up? Can you elaborate please?Nope
@Nope The only problem I see in the other solution is that it doesn't work with empty files, because it fails to handle the end of stream event. I'll try to come up with an update when I have the time.Be
this will result in Promise{pending}. Can you show how to then catch the Promise to show the file content?Nope
@Nope I've added a line to the end of the snippet to show usage.Be
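
Since the await on the last line of the snippet only works at the top level of an ES module, a plain promise-style call (just a usage sketch) would be:

readFirstLine("path/to/file")
  .then((firstLine) => console.log(firstLine))
  .catch((err) => console.error(err));
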
3

Please try this:

https://github.com/yinrong/node-line-stream-util#get-head-lines

It unpipes the upstream once it has got the head lines.

Gigantic answered 4/1, 2016 at 2:30 Comment(0)
2

//Here you go;

var lineReader = require('line-reader');
var async = require('async');

exports.readManyFiles = function(files) {
    async.map(files, 
        function(file, callback) {
            lineReader.open(file, function(reader) {
              if (reader.hasNextLine()) {
                reader.nextLine(function(line) {
                  callback(null, line);
                });
              } else {
                callback(null, ''); // Empty file: return an empty string so async.map can finish.
              }
            });
        },
        function(err, allLines) {
            //do whatever you want to with the lines
        });
};
Nolde answered 26/2, 2015 at 16:58 Comment(0)
