Transparently Handling GZip Encoded content with WWW::Mechanize

I am using WWW::Mechanize and currently handling HTTP responses with the 'Content-Encoding: gzip' header in my code by first checking the response headers and then using IO::Uncompress::Gunzip to get the uncompressed content.

However, I would like to do this transparently, so that WWW::Mechanize methods like form(), links() etc. parse and work on the uncompressed content. Since WWW::Mechanize is a subclass of LWP::UserAgent, I would prefer to use the LWP::UserAgent handlers to do this.

While I have been partly successful (I can print the uncompressed content for example), I am unable to do this transparently in a way that I can call

$mech->forms();

In summary: How do I "replace" the content inside the $mech object so that from that point onwards, all WWW::Mechanize methods work as if the Content-Encoding never happened?

I would appreciate your attention and help. Thanks

Roar answered 17/5, 2009 at 9:50 Comment(0)
P
8

WWW::Mechanize::GZip, I think.

Pentheam answered 17/5, 2009 at 11:0 Comment(1)
Thanks! Wonder how I missed it - I did search CPAN :) – Roar
C
3

It looks to me like you can replace it by calling the $res->content( $bytes ) method on the response object.

By the way, I found this stuff by looking at the source of LWP::UserAgent, then HTTP::Response, then HTTP::Message.
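For illustration, here is a minimal sketch of that replacement, assuming a gzip-encoded response. It builds a fake HTTP::Response locally instead of making a real request, and the helper name inflate_response is hypothetical, not part of LWP or WWW::Mechanize.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build a fake gzip-encoded response so the sketch runs without a network.
my $html = '<html><body><form action="/s"><input name="q"></form></body></html>';
my $gz;
gzip(\$html => \$gz) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $gz,
);

# Hypothetical helper: swap the compressed body for the plain one, in place.
sub inflate_response {
    my ($response) = @_;
    my $encoding = lc( $response->header('Content-Encoding') // '' );
    return $response unless $encoding eq 'gzip';

    my $raw = $response->content;
    my $plain;
    gunzip(\$raw => \$plain) or die "gunzip failed: $GunzipError";

    $response->content($plain);                    # replace the body
    $response->remove_header('Content-Encoding');  # body is no longer encoded
    $response->header('Content-Length' => length $plain);
    return $response;
}

inflate_response($res);
print $res->content, "\n";

# Possible hook-up (not run here; assumes a WWW::Mechanize object in $mech):
#   $mech->add_handler(response_done => sub { inflate_response($_[0]); return });
```

Registering the helper as a response_done handler should make the replacement happen before WWW::Mechanize parses the page, but that part is untested in this sketch.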

Consecration answered 17/5, 2009 at 10:51 Comment(3)
Yes - it works. Thanks. Will use it when I want to do more than gunzip content. For now I'll just use the module suggested by Fayland. – Roar
Be careful, WWW::Mechanize::GZip looks quite buggy (see #6874576). Sorry, I do not fully understand the replace method you're speaking about: can you give some example code, please? – Illumine
@jettero: Did you mean "$res->decoded_content()"? In any case, I voted your answer up because I didn't even think to check for that. So I found it when I searched for "Encoding" in perldoc HTTP::Response. Thanks! – Incarcerate
L
0

It is built into LWP::UserAgent and thus WWW::Mechanize. One MAJOR caveat that may save you some hair-pulling:

To debug, make sure you check $@ for an error after the call to decoded_content.

$html = $r->decoded_content;
die $@ if $@;

Better yet, look through the source of HTTP::Message and make sure all the support packages it loads are installed.

In my case, decoded_content returned undef while content held the raw binary, and I went on a wild goose chase. UserAgent sets the error flag when decoding fails, but Mechanize just ignores it (it doesn't check for it or log it as its own error or warning).

In my case $@ said: "Can't find IO/HTML.pm .." The module was loaded inside an eval, so the failure was silent.
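A minimal sketch of that check, again using a locally built gzip response rather than a live request. It tests whether the return value is defined instead of relying on $@ alone, and also shows the raise_error option of decoded_content, which croaks instead of returning undef. The sample HTML is made up for the example.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Fake gzip-encoded response, so no network access is needed.
my $html = q{<html><body><a href="/next">next</a></body></html>};
my $gz;
gzip(\$html => \$gz) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=UTF-8',
        'Content-Encoding' => 'gzip',
    ],
    $gz,
);

# Default behaviour: undef on failure, with the reason left in $@.
my $decoded = $res->decoded_content;
die 'decode failed: ' . ( $@ || 'unknown error' ) unless defined $decoded;

# Alternative: ask decoded_content to croak, and trap it with eval.
my $strict = eval { $res->decoded_content( raise_error => 1 ) };
die "decode failed: $@" unless defined $strict;

print $decoded, "\n";
```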

After diving into the source, I found that the built-in decoding process is long and meticulous, covering just about every scenario and making tons of guesses (thank you, Gisle!).

If you are paranoid, explicitly set the default header to be sent with every request when calling new():

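    use WWW::Mechanize; use HTTP::Headers; use HTTP::Message;
    my $browser = WWW::Mechanize->new(    # direct method call, not indirect "new Class"
        default_headers => HTTP::Headers->new('Accept-Encoding' => scalar HTTP::Message::decodable()),
    );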
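
To see what that header will advertise on a given machine, a small sketch like the following may help. HTTP::Message::decodable() returns a list of encodings in list context and a comma-separated string in scalar context. The result depends on which decompression modules are installed, so no particular output is assumed here.

```perl
use strict;
use warnings;
use HTTP::Message;

# List context: one entry per encoding this installation can decode.
my @encodings = HTTP::Message::decodable();

# Scalar context: the same entries joined into a header-ready string.
my $header_value = HTTP::Message::decodable();

print "Accept-Encoding: $header_value\n";
print "gzip supported\n" if grep { $_ eq 'gzip' } @encodings;
```

If gzip is missing from the list, decoded_content is unlikely to handle gzip responses on that installation.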
Loud answered 5/9, 2013 at 9:56 Comment(0)
