Process AJAX request in HtmlUnit

I have written a program that scrapes the source of a web page after a button is clicked. It does not scrape the right page, I believe because the click sends an AJAX request and my code does not wait for the response. My code is currently:

import java.io.IOException;
import java.util.logging.Level;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;

public class Htmlunitscraper {

  private static String s = "http://cpdocket.cp.cuyahogacounty.us/SheriffSearch/results.aspx?q=searchType%3dSaleDate%26searchString%3d10%2f21%2f2013%26foreclosureType%3d%27NONT%27%2c+%27PAR%27%2c+%27COMM%27%2c+%27TXLN%27";

  public static String scrapeWebsite() throws IOException {

    // Silence HtmlUnit's logging output
    java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
    System.setProperty("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");

    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage(s);
    final HtmlForm form = page.getForms().get(2);
    // Click the ">" (next page) button and read the resulting page
    final HtmlSubmitInput button = form.getInputByValue(">");
    final HtmlPage page2 = button.click();
    String originalHtml = page2.refresh().getWebResponse().getContentAsString();
    return originalHtml;
  }
}

After referring to this link, I believe I could fix this by calling webClient.waitForBackgroundJavaScript(10000). The only issue is that I do not understand how to do this, because each click of the button gives me an HtmlPage object, not a WebClient object. How can I incorporate this method to fix the problem?

Mullen answered 23/10, 2013 at 19:52

What helped me was using HtmlUnit 2.15 with NicelyResynchronizingAjaxController, and also setting

webClient.getOptions().setThrowExceptionOnScriptError(false);

My full setup is:

    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setCssEnabled(false);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
Timmerman answered 9/10, 2014 at 1:11

I would try setting

webClient.setAjaxController(new NicelyResynchronizingAjaxController());

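This makes AJAX calls that are triggered directly by a user action, such as your button click, run synchronously.

Alternatively, did you try calling webClient.waitForBackgroundJavaScript(10000) after you got the page? It is called on the same WebClient instance you used to load the first page.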

Something like this:

final HtmlPage page2 = button.click();
// Wait up to 10 seconds for background JavaScript (including AJAX callbacks) to finish
webClient.waitForBackgroundJavaScript(10000);
String originalHtml = page2.asXml();
return originalHtml;

Also, please use HtmlUnit 2.13.
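
If a fixed wait proves unreliable, one possible alternative is to poll until the content you expect actually shows up. The sketch below uses only the Java standard library. PollingWait and waitUntil are hypothetical names chosen for illustration, not part of HtmlUnit, and the timeout and poll interval are arbitrary.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HtmlUnit): polls a condition until it holds or a timeout expires. */
final class PollingWait {

    private PollingWait() {}

    /**
     * Re-checks the condition every pollMillis until it returns true
     * or timeoutMillis has elapsed. Returns true if the condition held,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            // The condition is always checked at least once, even with a zero timeout
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(Math.max(1, pollMillis));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With the code from the question, the condition might be something like () -> page2.asXml().contains("some text from the next page"), where the text is a placeholder for whatever marks the AJAX result on your page. If waitUntil returns false, the content did not appear within the timeout.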

Rationalize answered 7/11, 2013 at 10:25