The site http://openbook.etoro.com/#/main/ has a live feed that is generated by JavaScript via keep-alive XHR requests, which receive answers from the server as gzip-compressed JSON strings.
I want to capture the feed into a file.
The usual approach (WWW::Mech..) is probably not viable, because it would require reverse-engineering all the JavaScript in the page, and simulating the browser is a really hard task. So I'm looking for an alternative solution.
My idea is to use a man-in-the-middle tactic: the browser does its work, and I capture the communication via a Perl proxy dedicated only to this task.
I'm able to catch the initial communication, but not the feed itself. The proxy itself is working, because the feed keeps running in the browser; only my filters don't work.
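Since the feed arrives as gzip-compressed JSON, whatever ends up in the file will need to be decompressed and parsed at some point. A minimal sketch in Perl using only the core modules IO::Uncompress::Gunzip and JSON::PP; the sub name `decode_gzip_json` is hypothetical, and it assumes it is handed one complete response body rather than a partial chunk:

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
use JSON::PP ();

# Hypothetical helper: decompress one complete response body and parse
# the JSON inside it into a Perl data structure (hash or array ref).
# Dies if decompression fails or the resulting text is not valid JSON.
sub decode_gzip_json {
    my ($gz_bytes) = @_;
    my $json_text;
    # gunzip's default Transparent mode copies non-gzip input unchanged
    gunzip( \$gz_bytes => \$json_text )
        or die "gunzip failed: $GunzipError\n";
    # decode_json expects UTF-8 encoded bytes, which is what gunzip yields
    return JSON::PP::decode_json($json_text);
}
```

For example, `decode_gzip_json($body)` on a captured body returns a reference that can be handed straight to Data::Dumper. Because IO::Uncompress::Gunzip defaults to Transparent mode, a body that was not actually compressed is passed through and parsed as plain JSON.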
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;
use HTTP::Proxy::BodyFilter::simple;
use Data::Dumper;
use strict;
use warnings;
my $proxy = HTTP::Proxy->new(
    port => 3128, max_clients => 100, max_keep_alive_requests => 100
);
my $hfilter = HTTP::Proxy::HeaderFilter::simple->new(
    sub {
        my ( $self, $headers, $message ) = @_;
        print STDERR "headers", Dumper($headers);
    }
);
my $bfilter = HTTP::Proxy::BodyFilter::simple->new(
    filter => sub {
        my ( $self, $dataref, $message, $protocol, $buffer ) = @_;
        print STDERR "dataref", Dumper($dataref);
    }
);
$proxy->push_filter( response => $hfilter );    # header dumper
$proxy->push_filter( response => $bfilter );    # body dumper
$proxy->start;
Firefox is configured to use the above proxy for all communication.
The feed keeps running in the browser, so the proxy is passing the data through to it. (When I stop the proxy, the feed stops too.) At random moments (I can't figure out when), I get the following error:
[Tue Jul 10 17:13:58 2012] (42289) ERROR: Getting request failed: Client closed
Can anybody show me how to construct the correct HTTP::Proxy filter to Dumper all communication between the browser and the server, regardless of keep-alive XHR?
In the combo box near the top, select the packet that starts a request, then choose Analyze → Follow TCP Stream to see the text representation of an HTTP request/response pair. – Corabella