Is there a way to trigger a scroll event with HtmlUnit, or is it not possible at all?

I am currently learning HtmlUnit in order to scrape websites. Everything went smoothly until I encountered a dynamic page (as an example, I am using the Pinterest website) on which elements are added on the fly as the user scrolls down.

I have tried several approaches that would trigger scrolling in a real browser; they are shown below. Before going further, note that I already have the following configuration set:

    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());

Let's say I want to get all my followers on Pinterest. I navigate to that page, but it initially shows only 24 of them, so I want to scroll down, which triggers an Ajax call to the server that retrieves the next set of followers.
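
Conceptually, the page is doing plain batch-by-batch loading: each scroll to the bottom asks the server for the next batch, starting after the items already shown, until a short or empty batch signals the end. The minimal Java sketch below models only that loop. `fetchBatch` and the in-memory follower list are hypothetical stand-ins for Pinterest's Ajax endpoint, whose URL and parameters are not known from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the "load more on scroll" pattern. fetchBatch is a HYPOTHETICAL
// stand-in for the page's Ajax call; the real endpoint is not shown here.
class InfiniteScrollSketch {
    static final int PAGE_SIZE = 24; // the page initially shows 24 followers

    // Hypothetical server: returns followers [offset, offset + limit) from a fixed list.
    static List<String> fetchBatch(List<String> all, int offset, int limit) {
        int from = Math.min(offset, all.size());
        int to = Math.min(offset + limit, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    // Keeps "scrolling" (requesting the next batch) until a batch comes back short.
    static List<String> loadAll(List<String> all) {
        List<String> collected = new ArrayList<>();
        while (true) {
            List<String> batch = fetchBatch(all, collected.size(), PAGE_SIZE);
            collected.addAll(batch);
            if (batch.size() < PAGE_SIZE) {
                return collected; // short or empty batch: no more followers
            }
        }
    }

    public static void main(String[] args) {
        List<String> followers = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            followers.add("user" + i);
        }
        // Three requests: 24 + 24 + 12 followers
        System.out.println(loadAll(followers).size());
    }
}
```

In a real scraper, the scroll event is only the trigger; what actually loads the data is the request inside `fetchBatch`.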

1) Plain JavaScript or jQuery code to trigger a scroll on the window:

    ScriptResult sr = followersPage.executeJavaScript("window.scrollBy(0, 1000);");
    // jQuery version (scrollTop takes a single position argument)
    // ScriptResult sr = followersPage.executeJavaScript("$(window).scrollTop(1000);");
    // Also tried on html/body, with animation (inner quotes must be single quotes inside the Java string)
    // ScriptResult sr = followersPage.executeJavaScript("$('html, body').animate({ scrollTop: $(document).height() }, 1000);");
    webClient.waitForBackgroundJavaScript(10000);
    followersPage = (HtmlPage) sr.getNewPage();

=> When I check the distance to the top, it is 0, and the resulting page is the same as the original page. While debugging in Eclipse, stepping over the line where the JavaScript is executed moves straight to the next line without any delay. If I run some other JavaScript, for example:

    ScriptResult sr = followersPage.executeJavaScript("$('div.GridItems').html('new content');");

the debugger pauses for about half a second on that line, which suggests that the JavaScript is actually executed.

2) Move the focus from one follower anchor to the next (I chose the anchor because it is part of the focus order when you press the TAB key):

    HtmlDivision gridItems = followersPage.getFirstByXPath("//div[contains(concat(' ',@class,' '),' GridItems ')]");
    // A leading "." keeps the search relative to gridItems; a bare "//" searches the whole page
    List<HtmlDivision> items = (List<HtmlDivision>) gridItems.getByXPath(".//div[@class='item ']");
    for (HtmlDivision item : items) {
        // ".//" finds this item's own anchor instead of the first anchor on the page
        HtmlAnchor a = item.getFirstByXPath(".//a[@class='userWrapper']");
        a.focus();
        webClient.waitForBackgroundJavaScript(1000);
    }
    followersPage = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();

Again, no scrolling occurred; the resulting page remains the same as the original.
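
Separately from scrolling, an XPath expression that starts with `//` selects from the document root even when it is evaluated on a context node such as `gridItems` or `item`. In a loop like attempt 2, `item.getFirstByXPath("//a[@class='userWrapper']")` would therefore return the same first anchor on every iteration, so the focus never moves. The sketch below shows the difference with the JDK's standard `javax.xml.xpath` API rather than HtmlUnit, on made-up markup standing in for the follower grid; it assumes HtmlUnit evaluates XPath against the context node the same way, as XPath 1.0 specifies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// "//a" ignores the context node and searches the whole document,
// while ".//a" searches only below the context node.
class XPathContextDemo {
    // Made-up stand-in for the follower grid markup
    static final String GRID =
        "<div class='GridItems'>"
        + "<div class='item '><a class='userWrapper' href='/alice/'/></div>"
        + "<div class='item '><a class='userWrapper' href='/bob/'/></div>"
        + "</div>";

    // For each item, returns the href of the first anchor matched by anchorXPath.
    static String[] hrefsPerItem(String anchorXPath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(GRID.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(
            ".//div[@class='item ']", doc.getDocumentElement(), XPathConstants.NODESET);
        String[] hrefs = new String[items.getLength()];
        for (int i = 0; i < items.getLength(); i++) {
            // Evaluate the anchor expression with the current item as context node
            Element a = (Element) xpath.evaluate(anchorXPath, items.item(i), XPathConstants.NODE);
            hrefs[i] = a.getAttribute("href");
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Absolute path: every item "finds" the first anchor in the document
        System.out.println(String.join(",", hrefsPerItem("//a[@class='userWrapper']")));  // /alice/,/alice/
        // Relative path: each item finds its own anchor
        System.out.println(String.join(",", hrefsPerItem(".//a[@class='userWrapper']"))); // /alice/,/bob/
    }
}
```

Prefixing the expression with `.` keeps the search inside the current item.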

3) Create a button that triggers a scroll on the window:

    HtmlButton scrollButton = (HtmlButton) followersPage.createElement("button");
    scrollButton.setAttribute("type", "button");
    scrollButton.setAttribute("onclick", "window.scrollTo(0,document.body.scrollHeight);");
    gridItems.appendChild(scrollButton);
    followersPage = scrollButton.click();

Unfortunately, it did not work.

I have tried many other methods, but with no success so far.

I have read many related articles on this topic, including here on Stack Overflow, and it seems that nobody has managed to make scrolling work with HtmlUnit, since most of those questions remain unanswered. That is why I am wondering whether this feature has ever worked.

Has anyone managed to scroll a page (a simple page, no Ajax)? Has anyone managed to scroll a page in a way that triggers an Ajax call?

Noted answered 6/5, 2014 at 10:5
I also have the same issue. Did you find any solution? - Enough

I would suggest using CasperJS instead of HtmlUnit in this situation. I tried to open Pinterest with HtmlUnit and got:

    runtimeError: message=[Property 0 not found.] sourceName=[https://s.pinimg.com/webapp/js/vendor-react-d20f99c48b5d58e4821c.js] line=[1] lineSource=[null] lineOffset=[0]

so it looks like HtmlUnit's JavaScript support is not good enough for this site, even in the latest version, 2.31.

Here is demo code using CasperJS:

    var utils = require('utils')
    var fs = require('fs')
    var system = require('system')

    var casper = require('casper').create({
        verbose: true,
        logLevel: 'debug',
        localToRemoteUrlAccessEnabled: true,
        webSecurityEnabled: false,
        plainTextAllContent: false,
        // Browser window size used to lay out the page
        viewportSize: {
            width: 1440,
            height: 800
        },
        onError: function(casper, msg, backtrace) {
            utils.dump(backtrace)
        }
    });

    // Cookie header value read from cookie.txt (e.g. copied from a logged-in browser session)
    var cookie = fs.read('cookie.txt').trim()

    casper.on('started', function() {
        // Print page-level JavaScript errors instead of ignoring them
        this.page.onError = function(msg, trace) {
            casper.echo('Error => ' + msg + '\nError trace => ')
            utils.dump(trace)
        }

        // Send browser-like headers plus the cookie with every request
        this.page.customHeaders = {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "zh-CN,en;q=0.5",
            "Accept-Encoding": "gzip, deflate",
            "Connection": "keep-alive",
            "Pragma": "no-cache",
            "Cookie": cookie
        }
    });

    casper.start('https://www.pinterest.com', function() {
        this.then(function() {
            // Wait for the grid (class names are from Pinterest's 2018 markup and may have changed)
            this.waitForSelector('div[class="_wx _2h"]', function() {
                this.echo("waitForSelector 'div[class=_wx _2h]' is done")
                // Scroll down, give the newly requested items 5 seconds to load, then scroll further
                this.scrollTo(0, 1000);
                this.wait(5000, function() {
                    this.scrollTo(0, 2000);
                })
            })
        })
    });

    // Nothing executes until the queued steps are run
    casper.run();

Save the above code to a file, say demo.js, then use the following command to start CasperJS:

    casperjs --engine=slimerjs demo.js

Then you will see the Firefox browser start up visibly and do the scrolling!

Chirurgeon answered 21/6, 2018 at 2:3

© 2022 - 2024 — McMap. All rights reserved.