HtmlUnit close all windows memory leak

HtmlUnit does not appear to close windows in the WebClient, which creates a memory leak. I am trying to fetch a page with HtmlUnit and pass it to Jsoup for parsing. I am aware that Jsoup can connect to a page itself, but I need this approach because I have to hold a logged-in session on some sites before parsing them.

Here is the code:

import java.io.IOException;
import java.net.MalformedURLException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitLeakTest {

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {

        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setPrintContentOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(false);

        for(int i = 0; i < 500; i++){
            HtmlPage page = webClient.getPage("http://www.stackoverflow.com");
            Document doc = Jsoup.parse(page.asXml());
            webClient.closeAllWindows();
            System.out.println(i);
            if((i % 5 == 0)){
                System.out.println(i);
            }
        }
    }
}

As this runs, memory usage climbs continually, and in my debugger I can see that all the windows are still referenced by the WebClient and have not been closed.
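
For anyone who wants to watch the growth without a profiler, here is a minimal sketch that prints the used heap every few iterations using only the standard `Runtime` API. `HeapProbe` and its methods are hypothetical names made up for this illustration, and the byte array is just a placeholder for the real page fetch. A single reading proves nothing because the garbage collector runs lazily; what hints at a leak is a figure that keeps climbing over many iterations.

```java
// Hypothetical helper, not part of HtmlUnit: logs used heap so growth across iterations is visible.
final class HeapProbe {

    /** Returns the heap currently in use, in bytes (total allocated minus free). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats a byte count as whole megabytes (1 MB = 1024 * 1024 bytes). */
    static String toMegabytes(long bytes) {
        return (bytes / (1024 * 1024)) + " MB";
    }

    public static void main(String[] args) {
        for (int i = 0; i < 20; i++) {
            // Placeholder for the real work, e.g. the getPage(...) call in the loop above.
            byte[] work = new byte[1024 * 1024];
            work[0] = 1;
            if (i % 5 == 0) {
                System.out.println(i + ": used heap " + toMegabytes(usedHeapBytes()));
            }
        }
    }
}
```

Dropping the `usedHeapBytes()` print into the existing `if ((i % 5 == 0))` branch should be enough to see whether a change actually flattens the curve.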

I have seen this code around that is supposed to close these windows:

List<WebWindow> windows = webClient.getWebWindows();
for (WebWindow ww : windows) {
    // Cancel pending background JavaScript jobs and stop the window's job manager.
    ww.getJobManager().removeAllJobs();
    ww.getJobManager().shutdown();
}
webClient.closeAllWindows();

But alas it does not and I continue to have the memory leak.

Has anyone experienced this issue?

Cheers

Version info:

HtmlUnit 2.15

java version "1.7.0_51"

Java(TM) SE Runtime Environment (build 1.7.0_51-b13)

Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
Uniformed answered 19/10, 2014 at 13:14 Comment(1)
Hi, I'm currently using HtmlUnit to automate a page. I'm wondering how you found the memory leak. Did you use a tool? If so, I can also check whether a leak is happening on my side. Thanks in advance! Gorham

I have a piece of code very similar to yours, and I've been pulling my hair out for the last 2 days trying to solve this. I tried everything mentioned on the web and could not find a solution, to the point where I started messing around with the code and, suddenly, the leak stopped. I was using a memory analyzer tool, and my program got to the point where it was using 2 GB of RAM (which I had set as the Java heap size in the JVM arguments) before crashing after 20 minutes. Now it has been running for 1 hour and memory usage is stable at 10 MB.

What did I do? I put the WebClient initialization inside the for loop:

// Imports are the same as in the question.
public class HtmlUnitLeakTest {

    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {

        for (int i = 0; i < 500; i++) {
            // Create a fresh client per iteration; it is declared outside the try
            // block so that it is still in scope in finally.
            WebClient webClient = initializeClient();
            try {
                HtmlPage page = webClient.getPage("http://www.stackoverflow.com");
                Document doc = Jsoup.parse(page.asXml());
                webClient.closeAllWindows();
                System.out.println(i);
                if ((i % 5 == 0)) {
                    System.out.println(i);
                }
            } finally {
                // Drop pending background JavaScript jobs, then release the client.
                webClient.getCurrentWindow().getJobManager().removeAllJobs();
                webClient.close();
                System.gc();
            }
        }
    }

    private static WebClient initializeClient() {
        final WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setPrintContentOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.getOptions().setCssEnabled(false);

        return webClient;
    }
}

I know it is a theoretically wrong approach, but I was desperate to get it working, and now it does! If you have already fixed the problem with a better (correct) approach, I would like to hear about it too!
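
The "one client per iteration, always closed" idea can also be written with try-with-resources, which calls `close()` even when the body throws. The sketch below only shows the pattern: `FakeClient` and `PerIterationClientDemo` are hypothetical stand-ins invented for this example, not HtmlUnit classes. As far as I know, `WebClient` implements `AutoCloseable` in HtmlUnit versions that have `close()`, so assuming such a version it could take the stand-in's place, but I have not verified that against every release.

```java
// Hypothetical stand-in for a browser client; it only counts open instances.
final class FakeClient implements AutoCloseable {
    static int openCount = 0;

    FakeClient() {
        openCount++;
    }

    /** Pretends to fetch a page; rejects an empty URL to simulate a failure. */
    String fetch(String url) {
        if (url.isEmpty()) {
            throw new IllegalArgumentException("empty URL");
        }
        return "<html>" + url + "</html>";
    }

    @Override
    public void close() {
        openCount--;
    }
}

final class PerIterationClientDemo {

    /** Fetches each URL with its own client; returns how many fetches succeeded. */
    static int fetchAll(String... urls) {
        int succeeded = 0;
        for (String url : urls) {
            // try-with-resources closes the client before the catch block runs.
            try (FakeClient client = new FakeClient()) {
                client.fetch(url);
                succeeded++;
            } catch (IllegalArgumentException e) {
                System.out.println("skipped: " + e.getMessage());
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        int succeeded = fetchAll("http://example.com/a", "", "http://example.com/b");
        System.out.println("succeeded: " + succeeded + ", still open: " + FakeClient.openCount);
    }
}
```

Compared with an explicit finally block, this keeps the client variable scoped to the loop body and removes the chance of forgetting the cleanup call.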

Queensland answered 26/10, 2016 at 11:32 Comment(0)
/**
 * Returns an immutable list of open web windows (whether they are top level windows or not).
 * This is a snapshot; future changes are not reflected by this list.
 *
 * @return an immutable list of open web windows (whether they are top level windows or not)
 * @see #getWebWindowByName(String)
 * @see #getTopLevelWindows()
 */
public List<WebWindow> getWebWindows() {
    return Collections.unmodifiableList(new ArrayList<>(windows_));
}
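
To illustrate what "snapshot" means in that Javadoc, here is a small stand-alone sketch that uses the same copy-then-wrap pattern on a plain list of strings. `SnapshotDemo` and its methods are hypothetical names for this example only. The point is that a loop over the returned list can close entries on the live list without a `ConcurrentModificationException`, and that the returned list still shows the old contents afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for a window registry; the names here are hypothetical, not HtmlUnit's.
final class SnapshotDemo {
    private final List<String> windows = new ArrayList<>();

    void open(String name) {
        windows.add(name);
    }

    void close(String name) {
        windows.remove(name);
    }

    /** Same pattern as the quoted getWebWindows(): copy first, then wrap as read-only. */
    List<String> getWindows() {
        return Collections.unmodifiableList(new ArrayList<>(windows));
    }

    public static void main(String[] args) {
        SnapshotDemo registry = new SnapshotDemo();
        registry.open("first");
        registry.open("second");

        List<String> snapshot = registry.getWindows();
        // The loop walks the copy, so removing from the live list is safe here.
        for (String name : snapshot) {
            registry.close(name);
        }
        System.out.println("snapshot size: " + snapshot.size());           // prints "snapshot size: 2"
        System.out.println("live size: " + registry.getWindows().size());  // prints "live size: 0"
    }
}
```

So a list obtained before closing says nothing about which windows are open afterwards; it has to be fetched again.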
Erv answered 16/11, 2015 at 9:6 Comment(0)

HtmlUnit 2.15 had a bug in which an onunload script caused the JS engine thread to run again after it had been closed, and the thread was then left running.

So I suggest upgrading to a more recent version (2.27 at the time of writing).

You might also go through all the windows before closing them and remove the onunload handlers.

final List<WebWindow> windows = webClient.getWebWindows();
for (final WebWindow window : windows) {
    ...
}
webClient.closeAllWindows();
Austine answered 9/8, 2017 at 21:16 Comment(0)
