How to for loop in casperjs

I am trying to click a 'next' button N times and grab the page source each time. I understand that I can run an arbitrary function on the remote website, so instead of click() I just call the remote function nextPage(). How do I run the following an arbitrary number of times?

var casper = require('casper').create();

casper.start('http://www.example.com', function() {

    this.echo(this.getHTML());
    this.echo('-------------------------');

    var numTimes = 4, count = 2;

    casper.repeat(numTimes, function() {
        this.thenEvaluate(function() {
            nextPage(++count);
        });

        this.then(function() {
            this.echo(this.getHTML());
            this.echo('-------------------------');
        });
    });

});

'i' here is an index I tried to use in a JavaScript for loop.

So tl;dr: I want to click 'next', print the page source, click 'next', print the page source, click 'next'... and continue that N times.

Sumerian answered 16/9, 2013 at 18:51 Comment(0)

First, you can pass a value to the remote page context (i.e. to the thenEvaluate function) like this:

    this.thenEvaluate(function(remoteCount) {
        nextPage(remoteCount);
    }, ++count);
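
To see why the value has to be passed as an argument, here is a rough simulation that runs in plain Node.js. It is only a sketch of the idea, not CasperJS's actual implementation: fakeEvaluate and the stubbed nextPage are hypothetical stand-ins, and the assumption is that only the function's source text and JSON-serializable arguments reach the page context.

```javascript
// Simulation only: approximates how a function handed to evaluate()
// is run in a separate context. This is NOT CasperJS's real code.
var vm = require('vm');

// Hypothetical stand-in for casper.evaluate(fn, arg1, arg2, ...).
function fakeEvaluate(fn) {
    var args = Array.prototype.slice.call(arguments, 1);
    // Only the function's source and JSON-serializable arguments cross over.
    var source = '(' + fn.toString() + ').apply(null, ' + JSON.stringify(args) + ')';
    // The "remote page": a fresh global scope that only knows nextPage().
    var remotePage = vm.createContext({
        nextPage: function (n) { return 'page ' + n; }
    });
    return vm.runInContext(source, remotePage);
}

var count = 2;

// Works: count is passed as an argument and arrives as remoteCount.
var passed = fakeEvaluate(function (remoteCount) {
    return nextPage(remoteCount);
}, ++count);

// Fails: the outer count variable does not exist in the remote scope.
var closureError = null;
try {
    fakeEvaluate(function () {
        return nextPage(++count);
    });
} catch (e) {
    closureError = e.name;
}

console.log(passed);        // page 3
console.log(closureError);  // ReferenceError
```

In a real page the second failure tends to surface as a remote page error rather than an exception in your script, which is why it can look as if nextPage never fires.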

However, Casper#repeat might not be a good function to use here as the loop would NOT wait for each page load and then capture the content.

You may prefer event-based chaining instead.

The work-flow of the code would be:

  1. Have a global variable (or at least a variable accessible to the functions mentioned below) to store the count and the limit.

  2. Listen to the load.finished event, grab the HTML there, and then call the next page.

A simplified version of the code could be:

var casper = require('casper').create();

var limit = 5, count = 1;

casper.on('load.finished', function (status) {
    if (status !== 'success') {
        this.echo("Failed to load page.");
    }
    else {
        this.echo(this.getHTML());
        this.echo('-------------------------');
    }

    if (++count > limit) {
        this.echo("Finished!");
    }
    else {
        this.evaluate(function(remoteCount) {
            nextPage(remoteCount);
            // [Edit: the line below was added later]
            console.log(remoteCount);
            return remoteCount;
        }, count);
    }
});

casper.start('http://www.example.com').run();
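
To make the control flow of the listener above easier to follow, here is a small simulation that runs in plain Node.js without CasperJS. It is a sketch under stated assumptions: fakeCasper, currentPage, captured and the local nextPage are hypothetical stand-ins, and the page load is simulated synchronously, whereas a real load is asynchronous.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical stand-in for the casper object.
var fakeCasper = new EventEmitter();
var captured = [];     // stands in for the echoed page sources
var currentPage = 1;   // which page the fake "browser" is showing

// Stand-in for the remote nextPage(): switch page, then report a finished load.
// A real page load is asynchronous; it is synchronous here for simplicity.
function nextPage(n) {
    currentPage = n;
    fakeCasper.emit('load.finished', 'success');
}

var limit = 5, count = 1;

fakeCasper.on('load.finished', function (status) {
    if (status === 'success') {
        captured.push(currentPage);   // stands in for this.getHTML()
    }
    if (++count > limit) {
        console.log('Finished! Captured pages: ' + captured.join(', '));
    } else {
        nextPage(count);
    }
});

// Stand-in for casper.start(...).run(): the first load starts the chain.
fakeCasper.emit('load.finished', 'success');
// prints "Finished! Captured pages: 1, 2, 3, 4, 5"
```

With limit = 5 and count = 1, the initial page plus four follow-up pages are captured, so the limit counts the total number of pages rather than the number of nextPage calls.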

NOTE: If your pages run a heavy load of JS processes etc., you may also want to add a wait before calling nextPage:

this.wait( 
   1000, // in ms
   function () {
        this.evaluate(function(remoteCount) {
            nextPage(remoteCount);
        }, count);
   }
);     

[EDIT ADDED] The following event listeners will help you debug.

// helps trace the page's console.log output
casper.on('remote.message', function(msg) { 
    console.log('[Remote Page] ' + msg); 
}); 

// Print out all the error messages from the web page 
casper.on("page.error", function(msg, trace) { 
    casper.echo("[Remote Page Error] " + msg, "ERROR"); 
    casper.echo("[Remote Error trace] " + JSON.stringify(trace, undefined, 4)); 
});
Equuleus answered 17/9, 2013 at 3:58 Comment(4)
Thank you VERY much for your code sudipto. I am ALMOST there, just one strange problem. It works for every page but the second one. It seems as if the nextPage function gets a null for the first iteration INSIDE the evaluate function? I'm not sure what's going on. But here's the code: pastebin.com/QJvA2nap and here's the output: pastebin.com/kKZHiLKM - Sumerian
OK. First add these 2 event listeners: // helps trace the page's console.log output casper.on('remote.message', function(msg) { console.log('[Remote Page] ' + msg); }); // Print out all the error messages from the web page casper.on("page.error", function(msg, trace) { casper.echo("[Remote Page Error] " + msg, "ERROR"); casper.echo("[Remote Error trace] " + JSON.stringify(trace, undefined, 4)); }); - Equuleus
The above 2 event listeners will listen to remote page errors and to console.log calls made through JS in the remote page (you can call this from the evaluate function). Now, in the evaluate function, just before the line return remoteCount; add the line console.log(remoteCount);. This will show the value received directly from the page. If this also shows the same problem, we need to dig deeper. - Equuleus
So it appears that the errors are the same as those from the console in a browser, but it looks like the page I'm trying to access does a refresh to generate a unique identifier for the session (or day, or user, or browser, not sure) and appends it to the URL. This reload/refresh causes your code to be unable to call nextPage(), because, I assume, it doesn't exist on the initial load, not sure. Either way, by iterating from 0 instead of 1, I can get all the data I need. THANK YOU VERY MUCH FOR YOUR HELP! - Sumerian

You could try using Casper#repeat

This should do, for the most part, what you want:

var numTimes = 10, count = 1;

casper.repeat(numTimes, function() {
    this.thenEvaluate(function(count) {
        nextPage(count);
    }, ++count);

    this.then(function() {
        this.echo(this.getHTML());
        this.echo('-------------------------');
    });
});
Besnard answered 16/9, 2013 at 19:29 Comment(3)
Thank you VERY much for your help. I've tried to adapt your code to my script, but I still can't get it to go to the different pages; it outputs the same page each time. It appears that nextPage(++count) does not fire. However, nextPage(5) does fire. It appears that I can't pass variables to the thenEvaluate function; I've been trying for an hour to find out how. Maybe this is my lack of JavaScript knowledge, but no combination seems to work for me. - Sumerian
@hedix: the variable count used inside evaluate should be present in the remote page scope or passed as a parameter through the evaluate function. - Equuleus
@Equuleus Yeah, it looks like I forgot to pass it through as an argument. Thanks for pointing that out. - Besnard
var global_page_links = [];

casper.then(function(){
    for(var i=1; i<=5; i++){    
        // you just add all your links to array, and use it in casper.each()
        global_page_links.push(YOUR_LINK);
    }

    this.each(global_page_links, function(self, link) {
        if (link){
            self.thenOpen(link, function() {
                console.log("OPENED: "+this.getCurrentUrl());
                // do here what you need, evaluate() etc.
            });
        }
    });
});

This answers the question of how to use for() in CasperJS to open several links.
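
As an illustration of filling the array in the loop above, here is one possible way to build the links in plain JavaScript. It is only a sketch: buildPageLinks is a hypothetical helper, and both the base URL and the '?page=N' pattern are assumptions that need to be replaced with whatever scheme the target site actually uses.

```javascript
// Hypothetical helper: the '?page=N' URL pattern is an assumption,
// substitute the scheme your target site really uses.
function buildPageLinks(baseUrl, numPages) {
    var links = [];
    for (var i = 1; i <= numPages; i++) {
        links.push(baseUrl + '?page=' + i);
    }
    return links;
}

var global_page_links = buildPageLinks('http://www.example.com/list', 5);

console.log(global_page_links.length);  // 5
console.log(global_page_links[0]);      // http://www.example.com/list?page=1
```

The resulting array can then be handed to this.each() exactly as in the answer above.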

Widen answered 1/9, 2016 at 14:26 Comment(0)
