Node.js - High memory usage when using createReadStream and createWriteStream

I was testing streams with Node and set up a program that reads a large file and writes it back out using streams. The problem is that while the program runs, Node's memory usage climbs to 1.3 GB, which is exactly the size of the file being read. It is as if nothing is streamed at all: either the whole file is buffered and written in one go, or the garbage collector never frees the chunk variables. This is the program:

const { createReadStream, createWriteStream } = require('fs');

const readStream = createReadStream('../movie.mp4', {
    highWaterMark: 10000
});
const writeStream = createWriteStream('./copy.mp4', {
    highWaterMark: 10000
});

readStream.on('data', function (chunk) {
    writeStream.write(chunk);
});

readStream.on('end', function () {
    console.log("reading done");
    writeStream.end();
});

writeStream.on('close', function () {
    console.log("Writing done.");
});
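
For reference, here is a minimal sketch (not part of the original program) of how the memory growth can be watched while the copy runs, by logging the process's resident set size once per second:

setInterval(() => {
    // process.memoryUsage().rss is the resident set size in bytes
    const rssMb = Math.round(process.memoryUsage().rss / 1024 / 1024);
    console.log(`RSS: ${rssMb} MB`);
}, 1000).unref(); // unref() so the timer doesn't keep the process alive on its own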

The weird thing is that if I pipe these streams instead, it works as expected and memory usage won't go above 20 MB. Like this:

const { createReadStream, createWriteStream } = require('fs');

const readStream = createReadStream('../movie.mp4', {
    highWaterMark: 10000
});
const writeStream = createWriteStream('./copy.mp4', {
    highWaterMark: 10000
});

readStream.pipe(writeStream);

What could cause such behavior?

Node version: v14.15.4

Repress asked 31/1, 2021 at 21:02

Well, I found the problem. This condition is called backpressure. In my case it happens because the read stream flows much faster than the write stream: the readStream keeps buffering the data it reads in memory until the writeStream has written it out. The solution is to pause the readStream temporarily until the writeStream finishes writing its buffered data, and then resume it so it feeds in more chunks. This is the corrected program:

const { createReadStream, createWriteStream } = require('fs');

const readStream = createReadStream('../movie.mp4', {
    highWaterMark: 10000
});

const writeStream = createWriteStream('./copy.mp4', {
    highWaterMark: 10000
});


readStream.on('data', function (chunk) {
    // According to the docs, write() returns false if the stream wants the
    // calling code to wait for the 'drain' event before writing more data,
    // and true otherwise.
    const result = writeStream.write(chunk);

    if (!result) {
        console.log("BACKPRESSURE");
        readStream.pause();
    }
});

writeStream.on('drain', () => {
    console.log("DRAINED");
    readStream.resume();
});

readStream.on('end', function () {
    console.log("reading done");
    writeStream.end();
});

writeStream.on('close', function () {
    console.log("Writing done.");
});

The 'drain' event is described in the Node.js stream documentation.
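
For completeness, here is a minimal sketch (not part of the original answer) of the same copy written with stream.pipeline(), which is available in Node 14 and handles backpressure and error cleanup internally, much like pipe() does:

const { pipeline } = require('stream');
const { createReadStream, createWriteStream } = require('fs');

// pipeline() pipes the streams together, propagating backpressure the same
// way readStream.pipe(writeStream) does, and invokes the callback once the
// copy finishes or any of the streams errors out.
pipeline(
    createReadStream('../movie.mp4', { highWaterMark: 10000 }),
    createWriteStream('./copy.mp4', { highWaterMark: 10000 }),
    (err) => {
        if (err) {
            console.error("Copy failed:", err);
        } else {
            console.log("Writing done.");
        }
    }
);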

Repress answered 1/2, 2021 at 17:23
Very nice, I was just now reading about backpressure in the official docs: nodejs.org/es/docs/guides/backpressuring-in-streams. Your problem/answer fits my searches precisely. – Hoodoo
