Node.JS Unbounded Concurrency / Stream backpressure over TCP
Asked Answered
K

1

13

As I understand it, one of the consequences of Node's evented IO model is the inability to tell a Node process that is (for example) receiving data over a TCP socket, to block, once you've hooked up your receiving event handlers (or otherwise started listening for data).

If the receiver can't process the incoming data fast enough, "unbounded concurrency" can result, whereby node under-the-hood continues to read data off the socket as fast as it can, scheduling new data events on the event loop instead of block on the socket, until the process eventually runs out of memory and dies.

The receiver can't tell node to slow its reading, which would otherwise allow TCP's inbuilt flow control mechanisms to kick in and indicate to the sender that it needs to slow down.

Firstly, is what I've described so far accurate? Is there something I've missed that allows node to avoid this situation?

One of the much touted features of Node Streams is the automatic handling of backpressure.

AFAIK, the only way a writable stream (of a tcp socket) can tell if it needs to slow down or not is by looking at socket.bufferSize (indicating the amount of data written to the socket but not yet sent). Given that Node at the receiving end always reads as fast as it can, this can only indicate a slow network connection between sender and receiver, and NOT whether the receiver can't keep up.

So secondly, can Node Streams automatic backpressure somehow work in this situation to deal with a receiver that can't keep up?

It also seems that this problem affects browsers receiving data via websockets, for the similar reason that the websockets API doesn't provide a mechanism to tell the browser to slow its reading from the socket.

Is the only solution to this problem for Node (and browsers using websockets) to implement a manual flow control mechanism at the application level, to explicitly tell the sending process to slow down?

Keeter answered 11/8, 2014 at 6:26 Comment(3)
I don't have the time to write a full-fledged answer, but check out the streams documentation and pay attention to the highWaterMark option. This is how the backpressure is handled: node will slow its reading when it fills its incoming buffer; and will also stop sending data if the outgoing buffer is full. Node reads as fast as it can… until it fills up its buffer. It won't read any more data until the buffer is empty again. I'd suggest studying the Stream Buffer Length of this example.Costanzo
Also, "the process eventually runs out of memory and dies": this won't happen because when you implement a transform stream for instance, you must signify that you finished handling a chunk by calling the callback. As long as the callback isn't called, node will not read anymore data (provided that the highWaterMark is reached).Costanzo
@PaulMougel Unless someone implemented their stream subclass wrong ; ) . Your comments seem like enough for an answer to meIntervalometer
M
8

To answer your first question, I believe your understanding is not accurate -- at least not when piping data between streams. In fact, if you read the documentation for the pipe() function you'll see that it explicitly says that it automatically manages the flow so that "destination is not overwhelmed by a fast readable stream."

The underlying implementation of pipe() is taking care of all of the heavy lifting for you. The input stream (a Readable stream) will continue to emit data events until the output stream (a Writable stream) is full. As an aside, if I remember correctly, the stream will return false when you attempt to write data that it cannot currently process. At this point, the pipe will pause() the Readable stream, which will prevent it from emitting further data events. Thus, the event loop isn't going to fill up and exhaust your memory nor is it going to emit events that are simply lost. Instead, the Readable will stay paused until the Writable stream emits a drain event. At that point, the pipe will resume() the Readable stream.

The secret sauce is piping one stream into another, which is managing the back pressure for you automatically. This hopefully answers your second question, which is that Node can and does automatically manage this by simply piping streams.

And finally, there is really no need to implement this manually (unless you are writing a new stream from scratch) since it is already provided for you. :)

Handling all of this is not easy, as admitted on the Node blog post that announced the streams2 API in Node. It's a great resource and certainly provides much more information than I could here. One little gotcha that isn't entirely obvious that you should know however, from the docs here and for backwards compatibility reasons:

If you attach a data event listener, then it will switch the stream into flowing mode, and data will be passed to your handler as soon as it is available.

So just be aware that attaching the data event listener in an attempt to observe something in the stream will fundamentally alter the stream to the old way of doing things. Ask me how I know.

Monzonite answered 29/6, 2015 at 22:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.