In a Cloudflare Worker, why does the faster stream wait for the slower one when using the tee() operator to fetch into R2?

I want to fetch an asset into R2 and at the same time return the response to the client, i.e. stream into R2 and to the client simultaneously.

Related code fragment:

const originResponse = await fetch(request);

// Split the origin response body into two branches: one for R2, one for the client.
const [r2Branch, clientBranch] = originResponse.body!.tee();

// Upload one branch to R2 in the background.
ctx.waitUntil(
    env.BUCKET.put(objectName, r2Branch, {
        httpMetadata: originResponse.headers
    })
);

// Stream the other branch back to the client.
return new Response(clientBranch, originResponse);

I tested the download of a 1GB asset with both a slower and a faster internet connection.

In theory, the outcome (success or failure) of the put to R2 should be the same in both cases, because it is independent of the client's internet connection speed.

However, when I tested both scenarios, the R2 write succeeded with the fast connection and failed with the slower one. That means the ctx.waitUntil() 30-second timeout was exceeded with the slower connection: the R2 put always "failed" whenever the client download took more than 30 seconds.

It seems like the R2 put (the reading of that stream) is backpressured to the speed of the slower consumer, namely the client download.

Is this because otherwise the worker would have to buffer the chunks already read by the faster consumer?

Am I missing something? Could someone confirm or clarify this? Also, could you recommend a working solution for this use case of downloading larger files?

EDIT:

The Cloudflare worker implementation of the tee operation is clarified here: https://community.cloudflare.com/t/why-the-faster-stream-waits-the-slower-one-when-using-the-tee-operator-to-fetch-to-r2/467416

It explains the observed behavior. However, a stable solution to the problem is still missing.

Tabanid answered 26/1, 2023 at 8:40

Cloudflare Workers limits the flow of a tee to the slower stream because otherwise it would have to buffer data in memory.

For example, say you have a 1GB file, the client connection can accept 1MB/s while R2 can accept 100MB/s. After 10 seconds, the client will have only received 10MB. If we allowed the faster stream to go as fast as it could, then it would have accepted all 1GB. However, that leaves 990MB of data which has already been received from the origin and needs to be sent to the client. That data would have to be stored in memory. But, a Worker has a memory limit of 128MB. So, your Worker would be terminated for exceeding its memory limit. That wouldn't be great either!
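
To make the buffering problem concrete, here is a rough, illustrative sketch (the naiveTee helper below is hypothetical, not Workers' actual implementation) of what a tee without backpressure would have to do: every chunk is enqueued into both branches as soon as the origin delivers it, so the backlog of whichever branch is read more slowly piles up in the Worker's memory.

// Illustrative only: a naive manual "tee" that ignores backpressure.
function naiveTee(source: ReadableStream<Uint8Array>):
        [ReadableStream<Uint8Array>, ReadableStream<Uint8Array>] {
    const controllers: ReadableStreamDefaultController<Uint8Array>[] = [];
    const makeBranch = () =>
        new ReadableStream<Uint8Array>({
            start(controller) {
                controllers.push(controller);
            },
        });
    const branches: [ReadableStream<Uint8Array>, ReadableStream<Uint8Array>] =
        [makeBranch(), makeBranch()];

    (async () => {
        const reader = source.getReader();
        for (;;) {
            const { done, value } = await reader.read();
            if (done) break;
            // enqueue() never waits for the consumers, so whichever branch is
            // read more slowly accumulates its backlog in the Worker's memory.
            // This unbounded buffering is what the Workers tee avoids by
            // pacing both branches to the slower consumer.
            for (const controller of controllers) controller.enqueue(value);
        }
        for (const controller of controllers) controller.close();
    })();

    return branches;
}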

With that said, you are running into a bug in the Workers Runtime, which we noticed recently: waitUntil()'s 30-second timeout is intended to start after the response has finished. However, in your case, the 30-second timeout is inadvertently starting when the response starts, i.e. right after headers are sent. This is an unintended side effect of an optimization I made: when Workers detects that you are simply passing through a response body unmodified, it delegates pumping the stream to a different system so that the Worker itself doesn't need to remain in memory. However, this inadvertently means that the waitUntil() timeout kicks in earlier than expected.

This is something we intend to fix. As a temporary work-around, you could write your worker to use streaming APIs such that it reads each chunk from the tee branch and then writes it to the client connection in JavaScript. This will trick the runtime into thinking that you are not simply passing the bytes through, but trying to perform some modifications on them in JavaScript. This forces it to consider your worker "in-use" until the entire stream completes, and the 30-second waitUntil() timeout will only begin at that point. (Unfortunately this work-around is somewhat inefficient in terms of CPU usage since JavaScript is constantly being invoked.)
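
For illustration, here is a minimal sketch of that work-around, assuming the same request, env.BUCKET, objectName, and ctx bindings as in the question. Instead of handing the tee branch directly to the Response, it pumps it chunk by chunk through a TransformStream in JavaScript:

const originResponse = await fetch(request);
const [r2Branch, clientBranch] = originResponse.body!.tee();

// Upload one branch to R2 in the background, as before.
ctx.waitUntil(
    env.BUCKET.put(objectName, r2Branch, {
        httpMetadata: originResponse.headers
    })
);

// Manually copy the client branch into a fresh stream so the runtime sees
// JavaScript touching every chunk rather than a plain pass-through body.
const { readable, writable } = new TransformStream();
ctx.waitUntil((async () => {
    const reader = clientBranch.getReader();
    const writer = writable.getWriter();
    for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        await writer.write(value);
    }
    await writer.close();
})());

return new Response(readable, originResponse);

Wrapping the pump in ctx.waitUntil() is an assumption of this sketch; the essential point, per the answer above, is that reading and writing each chunk in JavaScript keeps the Worker "in-use" until the whole body has been delivered, so the 30-second timeout only starts afterwards.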

Fluctuate answered 26/1, 2023 at 15:13

Comments:
Hi Kenton, thank you for the answer! Can we expect the waitUntil() fix in the near future? As I see from your linked article, the optimization (and thus the bug) is a year old now. – Bayou
We did not notice this bug until relatively recently. I don't know what the timeline is for a fix, but I know the R2 team cares about it. – Fluctuate
