Sending an unbuffered response in Plack

Asked 28/7, 2015 at 0:24 Answered 3/1, 2017 at 20:41

I'm working in a section of a Perl module that creates a large CSV response. The server runs on Plack, on which I'm far from expert.

Currently I'm using something like this to send the response:

$res->content_type('text/csv');
my $body = '';
query_data (
    parameters  => \%query_parameters,
    callback    => sub {
        my $row_object = shift;
        $body .= $row_object->to_csv;
    },
);
$res->body($body);
return $res->finalize;

However, that query_data function is not a fast one and retrieves a lot of records. In there, I'm just concatenating each row into $body and, after all rows are processed, sending the whole response.

I don't like this for two obvious reasons: First, it takes a lot of RAM until $body is destroyed. Second, the user sees no response activity until that method has finished working and actually sends the response with $res->body($body).

I tried to find an answer to this in the documentation without finding what I need.

I also tried calling $res->body($row_object->to_csv) inside the callback, but it seems that only the value from my last call to $res->body gets sent, overriding all the previous ones.

Is there a way to send a Plack response that flushes the content on each row, so the user starts receiving content in real time as the data is gathered and without having to accumulate all data into a variable first?

Thanks in advance for any comments!

Dairen answered 28/7, 2015 at 0:24 Comment(4)

I did not try this, but you should be able to use an object with a getline method. For more detail, post a short but complete example. – Complect 28/7, 2015 at 1:30

Thank you Sinan. Yes, what you mention is correct, I tried a simple object implementing getline and it worked fine, except for the fact that Plack is still buffering the response and doesn't send anything to the browser until ->getline is undefined. Regarding my example, I modified it to make it a little bit more self-explanatory. The only real difference by posting my real code would be a lot of non relevant lines added to the mix. The only thing I'm trying to figure out is how to make Plack send an unbuffered/autoflushed response. – Dairen 28/7, 2015 at 4:22

There is no get_data function in your code. I think you mean query_data. – Anthracite 28/7, 2015 at 7:26

Posting a short but complete example enables you to reduce the work I have to do to try out possible solutions. – Complect 28/7, 2015 at 11:0

You can't use Plack::Response because that class is intended for representing a complete response, and you'll never have a complete response in memory at one time. What you're trying to do is called streaming, and PSGI supports it even if Plack::Response doesn't.

Here's how you might go about implementing it (adapted from your sample code):

my $env = shift;

if (!$env->{'psgi.streaming'}) {
    # do something else...
}

# Immediately start the response and stream the content.
return sub {
    my $responder = shift;
    my $writer = $responder->([200, ['Content-Type' => 'text/csv']]);

    query_data(
        parameters  => \%query_parameters,
        callback    => sub {
            my $row_object = shift;
            $writer->write($row_object->to_csv);
            # TODO: Need to call $writer->close() when there is no more data.
        },
    );
};

Some interesting things about this code:

Instead of returning a Plack::Response object, you can return a sub. This subroutine will be called some time later to get the actual response. PSGI supports this to allow for so-called "delayed" responses.
The subroutine we return gets an argument that is a coderef (in this case, $responder) that should be called and passed the real response. If the real response does not include the "body" (i.e. what is normally the 3rd element of the arrayref), then $responder will return an object that we can write the body to. PSGI supports this to allow for streaming responses.
The $writer object has two methods, write and close, which do exactly what their names suggest. Don't forget to call the close method to complete the response; the above code doesn't show this because where it should be called depends on how query_data and the rest of your code work.
Most servers support streaming like this. You can check $env->{'psgi.streaming'} to be sure that yours does.
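
For completeness, here is a minimal sketch of how the pieces above might fit together, under one assumption that the question does not confirm: that query_data is synchronous, i.e. it calls the callback once per row and returns only after the last row. In that case $writer->close can simply be called right after query_data returns. The My::Row class and the query_data body below are hypothetical stand-ins so the example runs on its own; only the $responder/$writer handling and the psgi.streaming check come from the PSGI interface described above.

```perl
use strict;
use warnings;

# Hypothetical row class standing in for the question's row objects.
package My::Row {
    sub new    { my ($class, @fields) = @_; return bless [@fields], $class }
    sub to_csv { my $self = shift; return join(',', @$self) . "\n" }
}

# Hypothetical stand-in for query_data. ASSUMPTION: it is synchronous,
# calling the callback once per row and returning after the last row.
sub query_data {
    my %args = @_;
    $args{callback}->( My::Row->new($_, "name$_") ) for 1 .. 3;
    return;
}

my $app = sub {
    my $env = shift;

    # Fall back to a buffered response if the server cannot stream.
    unless ($env->{'psgi.streaming'}) {
        my $body = '';
        query_data(
            parameters => {},
            callback   => sub { $body .= shift->to_csv },
        );
        return [200, ['Content-Type' => 'text/csv'], [$body]];
    }

    # Streaming: send status and headers now, then write row by row.
    return sub {
        my $responder = shift;
        my $writer = $responder->([200, ['Content-Type' => 'text/csv']]);

        query_data(
            parameters => {},
            callback   => sub { $writer->write(shift->to_csv) },
        );

        # query_data has returned, so there are no more rows to send.
        $writer->close;
    };
};
```

Saved as a .psgi file, this should be loadable as is, since the assignment to $app is the last statement and so evaluates to the application coderef. If query_data can die partway through, you would probably want to wrap the call in an eval and still call $writer->close, so the connection is not left hanging. If query_data turns out to be asynchronous, close would instead have to be called from whatever hook signals that the data is exhausted.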

Strawworm answered 3/1, 2017 at 20:41 Comment(1)

Amazing! Thanks for this. :-) – Dairen 3/1, 2017 at 20:47

-1

Plack is a toolkit of adapters and middleware for PSGI rather than a server or framework in its own right. Are you using a web application framework on top of it, like Mojolicious or Dancer2, or a server such as Apache or Starman below it? That would affect how the buffering works.

This link shows an example by Plack's author: https://metacpan.org/source/MIYAGAWA/Plack-1.0037/eg/dot-psgi/echo-stream-sync.psgi

Or you can do it easily by using Dancer2 on top of Plack and Starman or Apache: https://metacpan.org/pod/distribution/Dancer2/lib/Dancer2/Manual.pod#Delayed-responses-Async-Streaming
