Struggling to understand highWaterMark on Readable stream

I am seeing some behavior with busboy and the highWaterMark property of Node streams that I don't understand. I expected each chunk to be at most highWaterMark bytes, but the chunk size seems unaffected by the highWaterMark setting.

I've set the fileHwm: 5 option in busboy, and my Express route is set up like this:

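const express = require('express');
const busboy = require('connect-busboy'); // assumed: connect-busboy is what provides req.busboy

const app = express();
// Assumed setup (not shown in the original post): fileHwm is passed through to busboy 0.x
app.use(busboy({ fileHwm: 5 }));

app.post('/upload', function(req, res, next) {
  req.pipe(req.busboy); // Pipe it through busboy
  req.busboy.on('file', (fieldname, file, filename) => {
    console.log(`Upload of '${filename}' started`);
    console.log(file);
    file.on('readable', () => {
      let chunk;
      // Drain whatever is currently buffered in the file stream
      while (null !== (chunk = file.read())) {
        console.log(`Received ${chunk.length} bytes of data.`);
      }
    });
  });
  // Reply once busboy has parsed the whole form, so the request does not hang
  req.busboy.on('finish', () => res.end());
});

app.listen(3000); // arbitrary port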

When I log file, it looks OK, and I can see that the highWaterMark property has been set correctly:

FileStream {
  _readableState:
   ReadableState {
     objectMode: false,
     highWaterMark: 5,
...

But the chunks I get out are not 5 bytes as I expected. Instead, I'm seeing:

Received 65395 bytes of data.
Received 65536 bytes of data.
Received 65536 bytes of data.
Received 65536 bytes of data.

So what gives? I would have expected read() to return only 5 bytes at a time. Not that 64 KiB chunks are a bad size; they're fine. I just wish I understood what was happening, and whether it is possible to restrict the buffer size.

Trackman asked on 30 May 2019 at 2:44

Both readable and writable streams maintain internal queues, which they use for similar purposes. In the case of a readable stream, the internal queue contains chunks that have been enqueued by the underlying source, but not yet read by the consumer. In the case of a writable stream, the internal queue contains chunks which have been written to the stream by the producer, but not yet processed and acknowledged by the underlying sink.
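
To connect this to the question: a Node.js Readable keeps the same kind of internal queue, and highWaterMark only decides when that queue counts as "full"; it does not cap the size of what read() hands back. The minimal sketch below assumes Node's built-in stream module; drainSizes and pieceSizes are hypothetical helpers for illustration. It shows that read() with no argument returns everything currently queued, while read(5) returns at most 5 bytes per call.

```javascript
const { Readable } = require('stream');

// Chunk sizes returned by a read() loop with no size argument.
function drainSizes() {
  // A source with highWaterMark 5 whose _read() does nothing; data is pushed by hand.
  const r = new Readable({ highWaterMark: 5, read() {} });
  r.push(Buffer.from('abc'));     // queue: 3 bytes (push() returns true)
  r.push(Buffer.from('defghij')); // queue: 10 bytes (push() returns false, but the data is kept)
  const sizes = [];
  let chunk;
  while (null !== (chunk = r.read())) {
    sizes.push(chunk.length); // read() with no argument returns everything queued at once
  }
  return sizes;
}

// Chunk sizes when asking for at most 5 bytes per call with read(5).
function pieceSizes() {
  const r = new Readable({ highWaterMark: 5, read() {} });
  r.push(Buffer.from('abcdefghijkl')); // 12 bytes in a single chunk
  r.push(null);                        // end of data, so the final short read is allowed
  const sizes = [];
  let chunk;
  while (null !== (chunk = r.read(5))) {
    sizes.push(chunk.length);
  }
  return sizes;
}

console.log(drainSizes()); // [ 10 ]
console.log(pieceSizes()); // [ 5, 5, 2 ]
```

Applied to the question's 'readable' handler, calling file.read(5) instead of file.read() should yield pieces of at most 5 bytes (read(5) returns null while fewer than 5 bytes are buffered and the stream has not ended). The ~64 KiB values in the question are presumably just whatever busboy pushed from the incoming request data, which a bare read() returns in one piece.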

A queuing strategy is an object that determines how a stream should signal backpressure based on the state of its internal queue. The queuing strategy assigns a size to each chunk, and compares the total size of all chunks in the queue to a specified number, known as the high water mark. The resulting difference, high water mark minus total size, is used to determine the desired size to fill the stream’s queue.
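
A minimal sketch of that arithmetic, assuming a runtime that exposes the WHATWG ReadableStream and ByteLengthQueuingStrategy as globals (Node.js 18+ does); desiredSizes is a hypothetical helper. It reads controller.desiredSize inside start(), which runs synchronously during construction, so no consumer reads anything in between.

```javascript
// Assumes global ReadableStream and ByteLengthQueuingStrategy (e.g. Node.js 18+).
function desiredSizes() {
  const sizes = [];
  new ReadableStream(
    {
      start(controller) {
        sizes.push(controller.desiredSize);    // 5 - 0 = 5
        controller.enqueue(new Uint8Array(3));
        sizes.push(controller.desiredSize);    // 5 - 3 = 2
        controller.enqueue(new Uint8Array(3));
        sizes.push(controller.desiredSize);    // 5 - 6 = -1
      },
    },
    // Each chunk's size is its byteLength; the high water mark is 5 bytes
    new ByteLengthQueuingStrategy({ highWaterMark: 5 })
  );
  return sizes;
}

console.log(desiredSizes()); // [ 5, 2, -1 ]
```

The last value goes negative: the second 3-byte chunk is accepted even though it overfills the 5-byte high water mark. The high water mark limits how much is queued before backpressure is signaled, not how large an individual chunk may be.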

For readable streams, an underlying source can use this desired size as a backpressure signal, slowing down chunk generation so as to try to keep the desired size above or at zero. For writable streams, a producer can behave similarly, avoiding writes that would cause the desired size to go negative.
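
Node.js writable streams expose a comparable signal through the return value of write(). This sketch assumes Node's built-in stream module; writeResults and its stalled sink are hypothetical, built only to show the return value flipping once the buffered byte count reaches highWaterMark.

```javascript
const { Writable } = require('stream');

function writeResults() {
  // Hypothetical stalled sink: the write callback is never called,
  // so nothing is ever acknowledged and the buffer only grows.
  const sink = new Writable({
    highWaterMark: 5,
    write(chunk, encoding, callback) {
      // deliberately not calling callback()
    },
  });
  const results = [];
  results.push(sink.write(Buffer.alloc(3))); // 3 bytes buffered, 3 < 5  -> true
  results.push(sink.write(Buffer.alloc(3))); // 6 bytes buffered, 6 >= 5 -> false
  return { results, buffered: sink.writableLength };
}

console.log(writeResults()); // { results: [ true, false ], buffered: 6 }
```

A well-behaved producer stops writing when write() returns false and resumes on the 'drain' event; the second chunk is still accepted, which again shows that highWaterMark is a threshold for signaling rather than a hard size limit.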

Concretely, a queuing strategy for web developer–created streams is given by any JavaScript object with a highWaterMark property. For byte streams the highWaterMark always has units of bytes. For other streams the default unit is chunks, but a size() function can be included in the strategy object which returns the size for a given chunk. This permits the highWaterMark to be specified in arbitrary floating-point units.
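
As an illustration of a size() function with non-byte units, here is a sketch (again assuming a global ReadableStream, as in Node.js 18+) whose strategy measures string chunks by their length, so the high water mark of 10 is expressed in string length rather than bytes or chunks. customSizeDemo is a hypothetical helper.

```javascript
// Assumes a global ReadableStream (e.g. Node.js 18+).
function customSizeDemo() {
  const sizes = [];
  new ReadableStream(
    {
      start(controller) {
        controller.enqueue('hello');         // size 5
        sizes.push(controller.desiredSize);  // 10 - 5 = 5
        controller.enqueue('world!');        // size 6
        sizes.push(controller.desiredSize);  // 10 - 11 = -1
      },
    },
    // Plain strategy object: size() returns each string chunk's length
    { highWaterMark: 10, size(chunk) { return chunk.length; } }
  );
  return sizes;
}

console.log(customSizeDemo()); // [ 5, -1 ]
```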

A simple example of a queuing strategy would be one that assigns a size of one to each chunk, and has a high water mark of three. This would mean that up to three chunks could be enqueued in a readable stream, or three chunks written to a writable stream, before the streams are considered to be applying backpressure. In JavaScript, such a strategy could be written manually as { highWaterMark: 3, size() { return 1; }}, or using the built-in CountQueuingStrategy class, as new CountQueuingStrategy({ highWaterMark: 3 }).
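
Both forms of the three-chunk strategy can be checked side by side. This sketch assumes global ReadableStream and CountQueuingStrategy (available in Node.js 18+); chunksUntilBackpressure is a hypothetical helper that enqueues chunks until desiredSize reaches zero and counts how many fit.

```javascript
// Assumes global ReadableStream and CountQueuingStrategy (e.g. Node.js 18+).
function chunksUntilBackpressure(strategy) {
  let count = 0;
  new ReadableStream(
    {
      start(controller) {
        // Keep enqueuing while the queue still reports room (desiredSize > 0)
        while (controller.desiredSize > 0) {
          controller.enqueue(`chunk ${count}`);
          count += 1;
        }
      },
    },
    strategy
  );
  return count;
}

const manual = chunksUntilBackpressure({ highWaterMark: 3, size() { return 1; } });
const builtIn = chunksUntilBackpressure(new CountQueuingStrategy({ highWaterMark: 3 }));
console.log(manual, builtIn); // 3 3
```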

REF: https://streams.spec.whatwg.org/#high-water-mark

Irony answered on 26 July 2023 at 11:24
