What is the easiest way in pure Perl to stream from another HTTP resource?

Asked 14/10, 2009 at 17:46 Answered 12/10, 2019 at 1:42

What is the easiest way in Perl (without opening a shell to curl and reading from stdin) to stream from another HTTP resource? I'm assuming here that the HTTP resource I'm reading from is a potentially infinite stream (or just really, really long).

Siclari answered 14/10, 2009 at 17:46 Comment(0)

HTTP::Lite's request method allows you to specify a callback.

The $data_callback parameter, if used, is a way to filter the data as it is received or to handle large transfers. It must be a function reference, and will be passed: a reference to the instance of the http request making the callback, a reference to the current block of data about to be added to the body, and the $cbargs parameter (which may be anything). It must return either a reference to the data to add to the body of the document, or undef.

~~However, looking at the source, there seems to be a bug in sub request in that it seems to ignore the passed callback.~~ It seems safer to use set_callback:

#!/usr/bin/perl

use strict;
use warnings;

use HTTP::Lite;

my $http = HTTP::Lite->new;
$http->set_callback(\&process_http_stream);
$http->http11_mode(1);

$http->request('http://www.example.com/');

sub process_http_stream {
    my ($self, $phase, $dataref, $cbargs) = @_;
    warn $phase, "\n";
    return;
}

Output:

C:\Temp> ht
connect
content-length
done-headers
content
content-done
data
done

It looks like a callback passed to the request method is treated differently:

#!/usr/bin/perl

use strict;
use warnings;

use HTTP::Lite;

my $http = HTTP::Lite->new;
$http->http11_mode(1);

my $count = 0;
$http->request('http://www.example.com/',
    \&process_http_stream,
    \$count,
);

sub process_http_stream {
    my ($self, $data, $times) = @_;
    ++$$times;
    print "$$times====\n$$data\n===\n";
    return;    # undef, per the docs quoted above: nothing is added to the body
}
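For a stream that never ends, you probably don't want the body to pile up in memory at all. Here is a minimal sketch of that idea, relying on the documented contract quoted above (return undef to add nothing to the body). The names count_and_discard and stream_url are made up for illustration, and passing a hash reference as $cbargs is just one possible choice:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Callback in the shape the documentation describes for request():
# ($http, \$chunk, $cbargs). Here $cbargs is assumed to be a hash
# reference used as a running tally.
sub count_and_discard {
    my ($http, $dataref, $stats) = @_;
    $stats->{chunks}++;
    $stats->{bytes} += length $$dataref;
    return;    # undef: do not append this chunk to the stored body
}

# Hypothetical wrapper. HTTP::Lite is loaded lazily so the callback can
# be exercised without the module installed or a network connection.
sub stream_url {
    my ($url) = @_;
    require HTTP::Lite;
    my $http  = HTTP::Lite->new;
    my %stats = (chunks => 0, bytes => 0);
    $http->http11_mode(1);
    my $status = $http->request($url, \&count_and_discard, \%stats);
    defined $status or die "Request to $url failed: $!";
    return \%stats;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $stats = stream_url($ARGV[0]);
    print "$stats->{chunks} chunks, $stats->{bytes} bytes\n";
}
```

Run it with a URL as the first argument. Replace the tallying with whatever per-chunk work you need; as long as the callback returns undef, the chunk should not be kept.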

Kitchenmaid answered 14/10, 2009 at 17:52 Comment(3)

Awesome, that would seem to explain why no matter what I was doing the documents I was returning were 0 bytes. – Siclari 14/10, 2009 at 18:20

Sigh it seems that a callback function given to request is treated differently than one that is set using set_callback and the docs don't explain this correctly. – Stevestevedore 14/10, 2009 at 18:42

LWP has a callback mechanism too. – Kosher 8/4, 2011 at 15:51

Good old LWP allows you to process the result as a stream.

For example, the following registers yourFunc as a content callback. The :read_size_hint parameter asks LWP to pass roughly that many bytes to each call; drop it if you don't care how large each chunk is and just want to process the stream as fast as possible:

use LWP::UserAgent;
...
my $browser  = LWP::UserAgent->new;
my $response = $browser->get($url,
                             ':content_cb'     => \&yourFunc,
                             ':read_size_hint' => $byte_count);
...
sub yourFunc {
    my ($data, $response) = @_;
    # do your magic with $data
    # $response is the HTTP::Response object for the request in progress
}
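One thing to watch with a content callback: chunks arrive at arbitrary byte boundaries, so a line-oriented stream needs a little buffering. Below is a rough sketch of that, assuming newline-delimited data. make_line_callback and stream_lines are hypothetical helper names, and the 4096-byte hint is an arbitrary choice:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a callback that buffers partial lines across chunk boundaries
# and calls $on_line once per complete line (newline removed).
sub make_line_callback {
    my ($on_line) = @_;
    my $pending = '';
    return sub {
        my ($chunk) = @_;    # LWP also passes the response and protocol objects
        $pending .= $chunk;
        # Peel off every complete line currently in the buffer.
        while ($pending =~ s/\A([^\n]*)\n//) {
            $on_line->($1);
        }
        return;
    };
}

# Hypothetical wrapper. LWP::UserAgent is loaded lazily so the splitter
# can be tried without a network connection.
sub stream_lines {
    my ($url, $on_line) = @_;
    require LWP::UserAgent;
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->get(
        $url,
        ':content_cb'     => make_line_callback($on_line),
        ':read_size_hint' => 4096,
    );
    die 'Request failed: ', $response->status_line, "\n"
        unless $response->is_success;
    return $response;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    stream_lines($ARGV[0], sub { print "line: $_[0]\n" });
}
```

Note that a trailing partial line (one with no final newline) is left in the buffer and never emitted in this sketch; handle that after get() returns if your stream can end that way.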

Captivity answered 14/10, 2009 at 18:19 Comment(3)

+1, this may have worked, I didn't get a chance to try it as the other answer worked before I had a chance to implement this. – Siclari 14/10, 2009 at 18:35

Hah, I knew it! I just couldn't find it in the docs so I erased my half-assed answer :) – Jefferson 14/10, 2009 at 18:36

@Jefferson I did not remember this either but note that LWP and LWP::Simple are different beasts. – Stevestevedore 14/10, 2009 at 18:50

Wait, I don't understand. Why are you ruling out a separate process? This:

open my $stream, "-|", "curl", $url or die "Cannot run curl: $!";
while(<$stream>) { ... }

sure looks like the "easiest way" to me. It's certainly easier than the other suggestions here...

Minsk answered 14/10, 2009 at 23:6 Comment(4)

I am not sure about this but won't this block until curl has read the complete response? – Stevestevedore 15/10, 2009 at 1:16

No, curl spits output out as it gets it; it doesn't buffer anything in memory. You can verify yourself by grabbing a large file and watching the process size of curl as it loads. – Minsk 15/10, 2009 at 2:15

Prefer not to create the threads, but otherwise, it's a fine solution. – Siclari 15/10, 2009 at 12:52

Unless you have multiple gigabits of bandwidth available on your box, you will always be I/O limited when pulling from network resources. The CPU work involved in spawning a process is unmeasurable noise. I strongly suspect you're prematurely optimizing. – Minsk 15/10, 2009 at 22:21

Event::Lib will give you an easy interface to the fastest asynchronous IO method for your platform.

IO::Lambda is also quite nice for creating fast, responsive, IO applications.

Celebrate answered 14/10, 2009 at 17:54 Comment(1)

I didn't know about that module. It looks great! – Celebrate 15/10, 2009 at 12:1

Here is the version I ended up using, via Net::HTTP.

It is basically a copy of the example from the Net::HTTP documentation (perldoc):

use Net::HTTP;

my $s = Net::HTTP->new(Host => "www.example.com") || die $@;
$s->write_request(GET => "/somestreamingdatasource.mp3");
my ($code, $mess, %h) = $s->read_response_headers;
while (1) {
  my $buf;
  my $n = $s->read_entity_body($buf, 4096);
  die "read failed: $!" unless defined $n;
  last unless $n;
  print STDERR "got $n bytes\n";
  print STDOUT $buf;
}

Meiny answered 12/10, 2019 at 1:42 Comment(0)
