What are the pros and cons of fs.createReadStream vs fs.readFile in node.js?

I'm mucking about with node.js and have discovered two ways of reading a file and sending it down the wire, once I've established that it exists and have sent the proper MIME type with writeHead:

// read the entire file into memory and then spit it out

fs.readFile(filename, function(err, data){
  if (err) throw err;
  response.write(data, 'utf8');
  response.end();
});

// read and pass the file as a stream of chunks

fs.createReadStream(filename, {
  'flags': 'r',
  'encoding': 'binary',
  'mode': 0666,
  'bufferSize': 4 * 1024
}).addListener( "data", function(chunk) {
  response.write(chunk, 'binary');
}).addListener( "close",function() {
  response.end();
});

Am I correct in assuming that fs.createReadStream might provide a better user experience if the file in question was something large, like a video? It feels like it might be less block-ish; is this true? Are there other pros, cons, caveats, or gotchas I need to know?

Cunnilingus answered 4/1, 2011 at 0:46 Comment(2)
I know this is an old Q, but one of the use cases for readFile is sorting the data, which you can't do with stream processing, e.g. a file with a list of numbers that your program needs to sort... there are ways in which you can sort it on a stream, but the receiver also has to be able to interpret those.Aldin
When I asked this question this was exactly what I was doing. :)Cunnilingus

A better approach, if you are just going to hook up "data" to "write()" and "close" to "end()":

// 0.3.x style
fs.createReadStream(filename, {
  'bufferSize': 4 * 1024
}).pipe(response)

// 0.2.x style
sys.pump(fs.createReadStream(filename, {
  'bufferSize': 4 * 1024
}), response)

The read.pipe(write) or sys.pump(read, write) approach has the benefit of also adding flow control: if the write stream cannot accept data as quickly as the read stream produces it, it'll tell the read stream to back off, minimizing the amount of data buffered in memory.
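
A modern equivalent (assuming Node 10+, and reusing filename and response from the question) spells the same pattern with stream.pipeline, which keeps the flow control and also forwards errors; bufferSize has since been replaced by highWaterMark:

// sketch: pipe with automatic flow control and error handling
const fs = require('fs');
const { pipeline } = require('stream');

pipeline(
  fs.createReadStream(filename, { highWaterMark: 4 * 1024 }),
  response,
  function (err) {
    // if either stream fails, both are destroyed and the error lands here
    if (err) console.error('stream failed:', err);
  }
);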

The flags:"r" and mode:0666 are implied by the fact that it is a FileReadStream. The binary encoding is deprecated -- if an encoding is not specified, it'll just work with the raw data buffers.

Also, you could add some other goodies that will make your file serving a whole lot slicker:

  1. Sniff for req.headers.range and see if it matches a string like /bytes=([0-9]*)-([0-9]*)/. If so, you want to just stream from that start to end location. (A missing number means 0 or "the end".)
  2. Hash the inode and creation time from the stat() call into an ETag header. If you get a request header with "if-none-match" matching that header, send back a 304 Not Modified.
  3. Check the if-modified-since header against the mtime date on the stat object. 304 if it wasn't modified since the date provided.

Also, in general, if you can, send a Content-Length header. (You're stat-ing the file, so you should have this.)
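
A rough sketch of those three steps plus Content-Length (my illustration, not isaacs's code; it assumes request and response come from an http.Server handler and that filename has already been validated, and it ignores multi-range requests, suffix ranges like bytes=-500, and the sub-second precision mismatch in If-Modified-Since):

const fs = require('fs');

fs.stat(filename, function (err, stat) {
  if (err) { response.writeHead(404); return response.end(); }

  // 2. ETag from inode + mtime; answer 304 to a matching If-None-Match.
  const etag = '"' + stat.ino.toString(16) + '-' +
               stat.mtime.getTime().toString(16) + '"';
  if (request.headers['if-none-match'] === etag) {
    response.writeHead(304); return response.end();
  }

  // 3. 304 if the file hasn't changed since If-Modified-Since.
  const since = Date.parse(request.headers['if-modified-since']);
  if (!isNaN(since) && stat.mtime.getTime() <= since) {
    response.writeHead(304); return response.end();
  }

  // 1. Parse a Range header; a missing end number means "the end of the file".
  let start = 0, end = stat.size - 1, status = 200;
  const m = /bytes=([0-9]+)-([0-9]*)/.exec(request.headers.range || '');
  if (m) {
    start = parseInt(m[1], 10);
    if (m[2]) end = Math.min(parseInt(m[2], 10), stat.size - 1);
    status = 206; // Partial Content
    response.setHeader('Content-Range',
                       'bytes ' + start + '-' + end + '/' + stat.size);
  }

  // Content-Length comes free from the stat() call; `end` is inclusive.
  response.setHeader('ETag', etag);
  response.setHeader('Last-Modified', stat.mtime.toUTCString());
  response.setHeader('Content-Length', end - start + 1);
  response.writeHead(status);
  fs.createReadStream(filename, { start: start, end: end }).pipe(response);
});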

Abarca answered 4/1, 2011 at 7:7 Comment(3)
@isaacs, could you please provide an example of how those 3 steps could be implemented? Thank you!Leek
The bufferSize option has been deprecated in favor of highWaterMark.Adalia
How does this even answer the original question asked?Consciousness

fs.readFile will load the entire file into memory, as you pointed out, whereas fs.createReadStream will read the file in chunks of the size you specify.

The client will also start receiving data sooner with fs.createReadStream, as it is sent out in chunks while the file is being read, whereas fs.readFile reads the entire file into memory and only then starts sending it to the client. This might be negligible, but can make a difference if the file is very big and the disks are slow.

Think about it this way: if you run these two functions on a 100 MB file, the first will use 100 MB of memory to load the file, whereas the latter would use at most 4 KB (with a 4 KB buffer size).
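
A quick way to see that yourself (a rough sketch; ./big_file stands in for any large file, and each half should be run separately, since Buffers show up in the process's resident memory rather than the JS heap):

const fs = require('fs');

// Whole-file read: resident memory grows by roughly the file size.
fs.readFile('./big_file', function (err, data) {
  if (err) throw err;
  console.log('readFile rss:', process.memoryUsage().rss, 'bytes');
});

// Streaming read: memory stays near the chunk size regardless of file size.
fs.createReadStream('./big_file', { highWaterMark: 4 * 1024 })
  .on('data', function () {})  // discard each 4 KB chunk as it arrives
  .on('end', function () {
    console.log('stream rss:', process.memoryUsage().rss, 'bytes');
  });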

Edit: I really don't see any reason why you'd use fs.readFile here, especially since you said you will be opening large files.

Adnate answered 4/1, 2011 at 4:28 Comment(1)
That means that with fs.readFile we can't track progress, for example?Robertoroberts

If it's a big file, then "readFile" will hog memory, because it buffers the entire file contents in memory, and may hang your system. A read stream, by contrast, reads in chunks.

Run this code and observe the memory usage in the Performance tab of Task Manager.

const fs = require('fs');

// Generate a very large file. (Note: this loop ignores write() backpressure,
// so generating the file is itself slow and memory-hungry.)
const file = fs.createWriteStream('./big_file');

for (let i = 0; i <= 1000000000; i++) {
  file.write('Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n');
}

file.end();

// Now try to read the whole file back into memory at once.
fs.readFile('./big_file', (err, data) => {
  if (err) throw err;
  console.log("done !!");
});

In fact, you won't see the "done !!" message: "readFile" can't read the file contents, because they are larger than the maximum size of a single Buffer, so the callback receives an error (which this example throws).

Now, instead of "readFile", use a read stream and monitor the memory usage; a minimal streaming version is sketched after the note below.

Note: the code above is taken from Samer Buna's Node course on Pluralsight.
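
For comparison, here is a minimal streaming version (my sketch, not taken from the course); memory stays near the stream's highWaterMark (64 KB by default for fs streams) no matter how large the file is:

const fs = require('fs');

const stream = fs.createReadStream('./big_file');

stream.on('data', function (chunk) {
  // each chunk is a Buffer of at most highWaterMark bytes
});
stream.on('end', function () { console.log('done !!'); });
stream.on('error', function (err) { throw err; });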

Kloster answered 18/4, 2017 at 8:42 Comment(0)

Another, perhaps less well-known, point: I believe Node is better at cleaning up unused memory after fs.readFile than after fs.createReadStream. You should test this to verify what works best. Also, this has gotten better with every new version of Node (i.e. the garbage collector has become smarter in these kinds of situations).

Muricate answered 30/8, 2012 at 18:31 Comment(0)
