How do I do random access reads from (large) files using node.js?

Am I missing something or does node.js's standard file I/O module lack analogs of the usual file random access methods?

  • seek() / fseek()
  • tell() / ftell()

How does one read random fixed-size records from large files in node without these?

Harrie answered 18/12, 2012 at 12:25 Comment(0)

tell is not exposed, but it is rare not to know your current position in a file already, or to be unable to keep track of it yourself.

seek is exposed indirectly via the position argument of fs.read and fs.write. When position is a number, the operation reads or writes at that byte offset from the start of the file; when it is null, it uses the file's current position instead.
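Here is a minimal sketch of that idea, assuming synchronous reads are acceptable. The hypothetical RandomAccessFile class keeps its own byte offset, which stands in for tell. It passes that offset as the position argument of fs.readSync, the synchronous counterpart of fs.read, so each read behaves like an fseek followed by an fread. The class and method names are illustrative only and are not part of Node's API.

```javascript
const fs = require('fs');

// Hypothetical helper: emulates fseek()/ftell() by tracking the offset in
// JavaScript and passing it as fs.readSync's explicit `position` argument.
class RandomAccessFile {
  constructor(path) {
    this.fd = fs.openSync(path, 'r');
    this.pos = 0; // our own "tell" value, in bytes from the start of the file
  }

  // Like fseek(fp, offset, SEEK_SET): just remember the new offset.
  seek(offset) {
    this.pos = offset;
  }

  // Like ftell(): report the offset we have been tracking.
  tell() {
    return this.pos;
  }

  // Reads up to `length` bytes at the tracked offset, then advances it by
  // the number of bytes actually read (fewer near the end of the file).
  read(length) {
    const buffer = Buffer.alloc(length);
    const bytesRead = fs.readSync(this.fd, buffer, 0, length, this.pos);
    this.pos += bytesRead;
    return buffer.subarray(0, bytesRead);
  }

  // Reads fixed-size record number `index` (0-based).
  readRecord(index, recordSize) {
    this.seek(index * recordSize);
    return this.read(recordSize);
  }

  close() {
    fs.closeSync(this.fd);
  }
}
```

Because every read carries an explicit position, several RandomAccessFile instances (or other code) can share a file without disturbing each other's offsets.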

Archway answered 24/12, 2012 at 4:54 Comment(5)
Most of the times I use tell are when I'm reading text line by line in platform-agnostic code, where lines may end in \n, \r (no longer common), or \r\n. It's certainly still possible to track the position without tell, though. – Harrie
For the benefit of anyone else reading this thread: I missed the position parameter because, reading the docs, I managed to conflate it with the offset parameter, which is the offset into the buffer you want to read data into. – Harrie
This works great for reading records from binary files, but I also need to read lines of text starting at given file offsets. It turns out to be very difficult to read a single line of text using fs.read, because the raw bytes in a Buffer have to be decoded into text such as UTF-8. The obvious approaches produce broken characters, since UTF-8 has multibyte characters that can be split across reads. A Readable stream handles multibyte characters, but does not let you seek freely. I'm not sure there's any library that lets you combine random access and line reading. – Harrie
@Harrie The two are kind of at odds with each other, unfortunately. This isn't really a common issue. That said, please create a new question if you want answers to this; reading lines from random positions is independent of your original question here. – Archway
Thanks @Archway, I will ask a specific question. I just wanted to get my thoughts linked in here too before my internet credit ran out last night. Though my use case (indexing offsets into very large text files, such as Wikipedia XML dumps) is not "common", I have implemented it before in C, Perl, and even JavaScript in a Firefox plugin, so I was surprised it was so tricky in node, considering how much better node is than those environments in various ways. – Harrie
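For the line-reading problem raised in the comments above, one possible approach is sketched below. It is not a library API. It relies on the fact that the byte 0x0A ('\n') never appears inside a multibyte UTF-8 sequence. The code searches for the newline in the raw bytes and then decodes the complete line in one go, so no character is ever split. readLineAt is a hypothetical helper. It handles \n and \r\n endings but not lone \r, and it returns the byte offset of the next line so that offsets can be indexed.

```javascript
const fs = require('fs');

// Hypothetical helper: read one line of UTF-8 text starting at byte `offset`.
// '\n' (0x0A) never occurs inside a multibyte UTF-8 sequence, so we look for
// it at the byte level and decode the whole line only once, which avoids
// splitting a multibyte character across chunk boundaries.
// Returns { line, next } where `next` is the byte offset of the following
// line; at end of file, line is '' and next === offset.
function readLineAt(fd, offset, chunkSize = 4096) {
  const chunks = [];
  let pos = offset;
  for (;;) {
    const chunk = Buffer.alloc(chunkSize);
    const bytesRead = fs.readSync(fd, chunk, 0, chunkSize, pos);
    if (bytesRead === 0) break; // end of file
    const nl = chunk.subarray(0, bytesRead).indexOf(0x0a);
    if (nl !== -1) {
      chunks.push(chunk.subarray(0, nl + 1)); // keep bytes up to and incl. '\n'
      pos += nl + 1;
      break;
    }
    chunks.push(chunk.subarray(0, bytesRead));
    pos += bytesRead;
  }
  // Strip the line terminator (\n or \r\n) before decoding.
  let bytes = Buffer.concat(chunks);
  if (bytes[bytes.length - 1] === 0x0a) bytes = bytes.subarray(0, bytes.length - 1);
  if (bytes[bytes.length - 1] === 0x0d) bytes = bytes.subarray(0, bytes.length - 1);
  return { line: bytes.toString('utf8'), next: pos };
}
```

The chunk size only affects performance. Even a chunk size of a few bytes yields correct text, because decoding happens after the line's bytes have been joined.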

Node doesn't have these built in. The closest you can get is fs.createReadStream with a start option to begin reading at an offset (pass in an existing fd to avoid reopening the file).

http://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
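Here is a sketch of that approach, assuming the file is already open. The hypothetical readRange helper collects the bytes from start to end from a stream on the existing fd. Both bounds are inclusive, as createReadStream's end option is. Setting autoClose: false keeps the descriptor open for later reads. The path is passed as well because the docs listed it as required, although the comments below report that it is ignored when an fd is supplied.

```javascript
const fs = require('fs');

// Hypothetical helper: resolve with bytes [start, end] (inclusive) read from
// an already-open descriptor via a ReadStream. autoClose: false leaves the
// fd open so the caller can reuse it and close it later.
function readRange(path, fd, start, end) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    fs.createReadStream(path, { fd, start, end, autoClose: false })
      .on('data', (chunk) => chunks.push(chunk))
      .on('end', () => resolve(Buffer.concat(chunks)))
      .on('error', reject);
  });
}
```

Each call creates a new stream object, but not a new file descriptor. Whether that overhead matters depends on how many small reads you do.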

Vaccinia answered 18/12, 2012 at 12:35 Comment(5)
Unless this turns out to be slow, it seems like a perfect solution. It could depend on the cost of creating a ReadStream from an fd. – Harrie
Do I also need to pass in the same path parameter each time I call createReadStream() again? The path is mandatory while the fd is just optional, and the docs are not clear. – Harrie
Hm... have you tried passing null for path, even though it says it's required? Otherwise I'd just pass in the same path. – Vaccinia
Yes, I've since tried it, and the "required" path is ignored, whatever it is, if you also supply the optional fd. This doesn't seem to be guaranteed explicitly in the documentation or stated in the source, and I can find nothing authoritative by searching the forums. I am loath to rely on behaviour that's not documented, even if it's backed up by common sense. I might find a way to ask officially. Thank you, yiding. – Harrie
I ran into problems when trying to use this repeatedly, once for each position I want to read from in the same file: createReadStream multiple times on the same fd – Harrie

I suspect createReadStream creates a new file descriptor every time. I prefer a synchronous version:

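const fs = require('fs');

// Opens the file once and reuses the same descriptor for every read.
function FileBuffer(path) {
  const fd = fs.openSync(path, 'r');

  // Returns the bytes in [start, end) as a Buffer (shorter if the file ends first).
  function slice(start, end) {
    const chunkSize = end - start;
    const buffer = Buffer.alloc(chunkSize); // new Buffer() is deprecated

    // The last argument is the file position to read from (the "seek").
    const bytesRead = fs.readSync(fd, buffer, 0, chunkSize, start);

    return buffer.subarray(0, bytesRead);
  }

  function close() {
    fs.closeSync(fd);
  }

  return {
    slice,
    close
  };
}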

Satellite answered 18/10, 2016 at 12:41 Comment(0)

Use this:

fs.open(path, flags[, mode], callback)

Then this:

fs.read(fd, buffer, offset, length, position, callback)

Read this for details:

https://nodejs.org/api/fs.html#fs_fs_read_fd_buffer_offset_length_position_callback
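Putting those two calls together gives the minimal callback-style sketch below. readAt is a hypothetical name. It opens the file, reads length bytes starting at position, and then closes the descriptor again. For many reads from the same file you would normally keep the fd open instead of reopening it each time.

```javascript
const fs = require('fs');

// Hypothetical helper: open `path`, read up to `length` bytes starting at
// byte `position`, close the file, and call back with the bytes actually read.
function readAt(path, position, length, callback) {
  fs.open(path, 'r', (err, fd) => {
    if (err) return callback(err);
    const buffer = Buffer.alloc(length);
    fs.read(fd, buffer, 0, length, position, (readErr, bytesRead) => {
      // Always close the descriptor, even if the read failed.
      fs.close(fd, (closeErr) => {
        const error = readErr || closeErr;
        if (error) return callback(error);
        callback(null, buffer.subarray(0, bytesRead));
      });
    });
  });
}
```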

Karlee answered 24/8, 2017 at 8:22 Comment(0)
