How do I use and debug WWW::Mechanize?
D

5

5

I am very new to Perl and I am learning on the fly while I try to automate some projects for work. So far it has been a lot of fun.

I am working on generating a report for a customer. I can get this report from a web page I have access to. First, I need to fill in a form with my user name and password, choose a server from a drop-down list, and log in. Second, I need to click a link to the report section. Third, I need to fill in a form to create the report.

Here is what I have written so far:

my $mech = WWW::Mechanize->new();
my $url = 'http://X.X.X.X/Console/login/login.aspx';

$mech->get( $url );

$mech->submit_form(
     form_number => 1,
     fields      =>{
        'ctl00$ctl00$cphVeriCentre$cphLogin$txtUser'  => 'someone',
        'ctl00$ctl00$cphVeriCentre$cphLogin$txtPW'    => '12345',
        'ctl00$ctl00$cphVeriCentre$cphLogin$ddlServers'  => 'Live',
     button => 'Sign-In'
   },   
);
die unless ($mech->success);

$mech->dump_forms();

I don't understand why, but when I look at what dump_forms() outputs after this, I see the form from the first login page, while I believe I should have reached the next page after a successful login.

Could a cookie be affecting my login attempt?

Is there anything else I am doing wrong?

I appreciate your help, Yaniv

Derosa answered 9/6, 2009 at 7:54 Comment(0)
D
6

This is several months after the fact, but I resolved the same issue based on a similar question I asked. See "Is it possible to automate postback from the client side?" for more info.

I used Python's mechanize instead of Perl, but the same principle applies.

Summarizing my earlier response:

ASP.NET pages need a hidden parameter called __EVENTTARGET in the form, which won't exist when you use mechanize normally.

When visited by a normal user, there is a __doPostBack('foo') function on these pages that gives the relevant value to __EVENTTARGET via a JavaScript onclick event on each of the links, but since mechanize doesn't run JavaScript you'll need to set these values yourself.

The Python solution is below, but it shouldn't be too tough to adapt it to Perl.

def add_event_target(form, target):
    # Creates a new __EVENTTARGET control and sets it to the value specified.
    # .NET doesn't generate this in mechanize for some reason -- I suspect it
    # is normally generated by JavaScript or some user-agent thing?
    form.new_control('hidden', '__EVENTTARGET', attrs=dict(name='__EVENTTARGET'))
    form.set_all_readonly(False)
    form["__EVENTTARGET"] = target
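
As a library-independent illustration of the same idea, the sketch below builds the POST body a browser would send once __doPostBack has run. It uses only the Python standard library. The helper name `with_postback`, the target `ctl00$lnkReports`, and the `__VIEWSTATE` value are made up for the example; `__EVENTARGUMENT` is included because the ASP.NET script sets it together with `__EVENTTARGET`.

```python
from urllib.parse import urlencode


def with_postback(fields, target, argument=""):
    """Return a copy of `fields` with the two postback fields set.

    `with_postback` is a hypothetical helper for illustration, not part
    of mechanize.
    """
    data = dict(fields)  # copy so the caller's dict is left untouched
    data["__EVENTTARGET"] = target
    data["__EVENTARGUMENT"] = argument
    return data


# Placeholder values; a real page supplies its own __VIEWSTATE.
form_fields = {"__VIEWSTATE": "abc123"}
post_data = with_postback(form_fields, "ctl00$lnkReports")
body = urlencode(post_data)
print(body)
```

The encoded body is what would go into the POST request in place of the link click.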
Disarrange answered 17/8, 2009 at 20:7 Comment(3)
I tried this, but for my problem it still doesn't work. Firebug showed that yet another param (__EVENTARGUMENT) is also passed. I added both that and the __EVENTTARGET one, but they seem to be ignored -- I always get the same results (I need these for paging -- accessing subsequent pages). - Astri
@HD: Did you reach your page by means of a form submission? (For example, a "Search for Widgets" function that gave you a paged list of results.) If you got to the page using a form, it might be much trickier, since you'll have that long __VIEWSTATE to cope with, too. If you could send a link I might give it a look. - Disarrange
@HD: I also found on some sites that you need to purge some extraneous form values as well. You can see all the field values using select_form, and purge them by whatever means your preferred language uses to delete array elements (for example .pop() in Python). Unfortunately, trial and error is the only way I've found to locate these elements. - Disarrange
T
2

You can only mechanize stuff that you know. Before you write any more code, I suggest you use a tool like Firebug and inspect what is happening in your browser when you do this manually.

Of course there might be cookies that are used. Or maybe you forgot a hidden form parameter? Only you can tell.

EDIT:

  • WWW::Mechanize should take care of cookies without any further intervention.
  • You should always check whether the methods you called were successful. Does the first get() work?
  • It might be useful to take a look at the server logs to see what is actually requested and what HTTP status code is sent as a response.
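
A 200 status alone may not prove that a login worked, since a site can answer a rejected login by serving the login form again. One rough check, sketched here in Python with only the standard library, is to look for a password field in the returned HTML. The names `PasswordFieldFinder` and `looks_like_login_page` and the sample HTML are invented for illustration; a real page may need a more specific test.

```python
from html.parser import HTMLParser


class PasswordFieldFinder(HTMLParser):
    """Set `found` when the page contains an <input type="password">."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        input_type = (dict(attrs).get("type") or "").lower()
        if tag == "input" and input_type == "password":
            self.found = True


def looks_like_login_page(html):
    """Return True if the HTML still contains a password input."""
    finder = PasswordFieldFinder()
    finder.feed(html)
    return finder.found


# Made-up pages standing in for the response body after the submit.
login_html = '<form><input type="text" name="user"><input type="password" name="pw"></form>'
report_html = '<h1>Reports</h1><a href="report.aspx">Create report</a>'
print(looks_like_login_page(login_html))
print(looks_like_login_page(report_html))
```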
Thereunder answered 9/6, 2009 at 9:38 Comment(7)
Thanks, Manni. I have Firebug, but I am not sure exactly what to look for. Where do I check for cookies? I looked at all the params and there are no hidden ones. - Derosa
Look at the tab labeled "Net". It will reveal all HTTP headers sent by the server, including any cookies. - Thereunder
This is what I got from my code: GET X.X.X.X/Console/login/login.aspx Accept-Encoding: gzip, x-gzip, deflate User-Agent: libwww-perl/5.822 (no content) HTTP/1.1 200 OK Cache-Control: private Connection: close Date: Mon, 08 Jun 2009 15:08:32 GMT Server: Microsoft-IIS/6.0 Content-Length: 14720 Content-Type: text/html; charset=utf-8 Client-Date: Mon, 08 Jun 2009 15:08:32 GMT Client-Peer: X.X.X.X:80 Client-Response-Num: 1 - Derosa
Link: <../server.css>; rel="stylesheet"; type="text/css" Refresh: 6010; URL=../login/login.aspx?logoff=true Set-Cookie: ASP.NET_SessionId=ivz5k045r4en4eehrn3yed55; path=/; HttpOnly Title: Console - Login X-AspNet-Version: 2.0.50727 X-Meta-CODE-LANGUAGE: C# X-Meta-GENERATOR: Microsoft Visual Studio .NET 7.1 X-Meta-Vs-DefaultClientScript: JavaScript X-Meta-Vs-TargetSchema: schemas.microsoft.com/intellisense/ie5 X-Powered-By: ASP.NET - Derosa
That's not exactly easy to read, but it definitely DOES say something about a cookie containing a session ID. - Thereunder
I know that I input the correct values, since I see this in the dump: ctl00$ctl00$cphserver$cphLogin$txtUser=someone (text) ctl00$ctl00$cphserver$cphLogin$txtPW= (password) ctl00$ctl00$cphserver$cphLogin$ddlServers= Live (option) [* Live| Test] ctl00$ctl00$cphserver$cphLogin$btnSignIn=Sign-In (submit) ctl00$ctl00$cphserver$cphLogin$btnConfigure=Configure (submit) - Derosa
OK, I can see that my user agent is not the same. My code sends: User-Agent: libwww-perl/5.822. I tried to use this: $mech->agent_alias( 'Windows Mozilla' ); but it doesn't make a difference in the dump. Why? - Derosa
L
2

If you are on Windows, use Fiddler to see what data is being sent when you perform this process manually, and then use Fiddler to compare it to the data captured when performed by your script.

In my experience, a web debugging proxy like Fiddler is more useful than Firebug when inspecting form posts.
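
Once both request bodies have been captured, the comparison itself can be automated. The sketch below is one way to do it in Python with only the standard library, assuming the bodies are URL-encoded form posts copied out of the proxy. The function name `diff_form_posts` and both sample bodies are invented for the example, and repeated field names are collapsed to their last value.

```python
from urllib.parse import parse_qsl


def diff_form_posts(browser_body, script_body):
    """Compare two URL-encoded POST bodies field by field.

    Returns (missing, extra, changed): fields only the browser sent,
    fields only the script sent, and fields whose values differ.
    """
    browser = dict(parse_qsl(browser_body, keep_blank_values=True))
    script = dict(parse_qsl(script_body, keep_blank_values=True))
    missing = sorted(browser.keys() - script.keys())
    extra = sorted(script.keys() - browser.keys())
    changed = sorted(
        name for name in browser.keys() & script.keys()
        if browser[name] != script[name]
    )
    return missing, extra, changed


# Made-up captures; paste the real bodies from the proxy instead.
browser_body = "user=someone&server=Live&__VIEWSTATE=abc&btnSignIn=Sign-In"
script_body = "user=someone&server=Test&__VIEWSTATE=abc"
missing, extra, changed = diff_form_posts(browser_body, script_body)
print("missing from script:", missing)
print("only in script:", extra)
print("different values:", changed)
```

Fields that show up as missing from the script are the first candidates to add to the automated request.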

Landside answered 10/5, 2012 at 11:47 Comment(0)
F
1

I have found it very helpful to use the Wireshark utility when writing web automation with WWW::Mechanize. It will help you in a few ways:

  1. See whether your HTTP request was successful or not.
  2. See the reason for a failure at the HTTP level.
  3. Trace the exact data that you pass to the server and see what you receive back.

Just set an HTTP filter for the network traffic and start your Perl script.

Fantasm answered 19/3, 2013 at 3:30 Comment(0)
D
0

The very short gist of aspx pages is that they hold all of the local session information in a couple of variables prefixed by "__" in the general aspxform. Usually this is a top-level form and all form elements will be part of it, but I guess that can vary by implementation.

For the particular implementation I was dealing with I needed to worry about 2 of these state variables, specifically:

__VIEWSTATE
__EVENTVALIDATION.

Your goal is to make sure that these variables are included in whatever form you submit. They may be part of the main aspxform mentioned above, while you are probably submitting a different form.

When a browser loads an aspx page, a piece of JavaScript passes this session information along as part of the ASP server/client interaction. We don't have that luxury with Perl's WWW::Mechanize, so you will need to post these values yourself by adding them to the form data you send.

In the case that I just solved I basically did this:

use strict;
use warnings;
use WWW::Mechanize;

my $browser = WWW::Mechanize->new();

# fetch the login page to get the initial session variables
my $login_page = 'http://www.example.com/login.aspx';
$browser->get($login_page);

# very short way to find the fields so you can add them to your post
my $viewstate  = ( $browser->find_all_inputs( type => 'hidden', name => '__VIEWSTATE' ) )[0]->value;
my $validation = ( $browser->find_all_inputs( type => 'hidden', name => '__EVENTVALIDATION' ) )[0]->value;

# post back the form data you need along with the session variables
my $response = $browser->post(
    $login_page,
    [
        username          => 'user',
        password          => 'password',
        __VIEWSTATE       => $viewstate,
        __EVENTVALIDATION => $validation,
    ]
);

# finally get back the content of the POST response and make sure it looks right
print $response->content();
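
The same "collect the hidden state fields, then post them back" step can be sketched without any HTTP library. The Python example below uses only the standard library and a made-up HTML snippet in place of a fetched page. The class name `HiddenFieldParser` and all field values are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


# Stand-in for the HTML of a fetched login page (values are invented).
page = """
<form method="post" action="login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="/wEWAgK" />
  <input type="text" name="username" />
  <input type="password" name="password" />
</form>
"""

parser = HiddenFieldParser()
parser.feed(page)

# Merge the state fields with the credentials before posting.
post_data = dict(parser.hidden, username="user", password="password")
body = urlencode(post_data)
print(body)
```

Collecting every hidden input, rather than two named ones, also picks up any additional state fields a particular page happens to use.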
Devlin answered 10/5, 2012 at 1:48 Comment(0)
