How to create HtmlUnit HTMLPage object from String?
F

4

20

This question was asked once already, but I guess the API has changed and the answers are no longer valid.

URL url = new URL("http://www.example.com");
StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url);
HtmlPage page = HTMLParser.parseHtml(response, new TopLevelWindow("top", new WebClient()));
System.out.println(page.getTitleText());

This can't be done because the TopLevelWindow constructor is protected, and extending or implementing the window just to get around that is ridiculous :)

Does anybody have an idea how to do that? It seems weird to me that it can't be done easily.

Flotow answered 26/5, 2011 at 9:30 Comment(0)
V
25

This code works in GroovyConsole:

@Grapes(
    @Grab(group='net.sourceforge.htmlunit', module='htmlunit', version='2.8')
)

import com.gargoylesoftware.htmlunit.*
import com.gargoylesoftware.htmlunit.html.*

URL url = new URL("http://www.example.com");
StringWebResponse response = new StringWebResponse("<html><head><title>Test</title></head><body></body></html>", url);
WebClient client = new WebClient()
HtmlPage page = HTMLParser.parseHtml(response, client.getCurrentWindow());
System.out.println(page.getTitleText());
Vanburen answered 26/5, 2011 at 10:51 Comment(3)
I don't know what you are looking at, but the constructor is protected. Classes must be "public" in Java, unless they are inner or nested classes... - Flotow
My bad, I was looking at the class declaration, and the subtleties of Groovy/Java made my code work in GroovyConsole. I've edited accordingly with a simple twist. That should work for you now. - Vanburen
In Groovy, due to the dynamic nature of the language, protected and private methods are seen as public... That's a known bug, but I easily forget about it sometimes ;-) - Vanburen
Y
3

With HtmlUnit 2.40, Grooveek's code won't compile; you get "Cannot make a static reference to the non-static method parseHtml(WebResponse, WebWindow) from the type HTMLParser". However, there is now a class HtmlUnitNekoHtmlParser that implements the HTMLParser interface, so the following code works:

StringWebResponse response = new StringWebResponse(
    "<html><head><title>Test</title></head><body></body></html>", 
    new URL("http://www.example.com"));
HtmlPage page = new HtmlUnitNekoHtmlParser().parseHtml(
    response, new WebClient().getCurrentWindow());
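To illustrate the compiler error quoted above without pulling in HtmlUnit, here is a minimal plain-Java sketch. The names PageParser, SimplePageParser and StaticReferenceDemo are hypothetical stand-ins invented for this example, not HtmlUnit types, and the title extraction is deliberately naive. It only shows the language rule: a method declared on an interface without `static` has to be called on an instance of an implementing class.

```java
// Hypothetical stand-ins for illustration only; these are NOT HtmlUnit types.
interface PageParser {
    // Instance method: it can only be invoked on an object, not on the type.
    String parseTitle(String html);
}

class SimplePageParser implements PageParser {
    @Override
    public String parseTitle(String html) {
        // Naive extraction of the text between <title> and </title>.
        int start = html.indexOf("<title>");
        int end = html.indexOf("</title>");
        if (start < 0 || end < start) {
            return "";
        }
        return html.substring(start + "<title>".length(), end);
    }
}

class StaticReferenceDemo {
    static String titleOf(String html) {
        // PageParser.parseTitle(html);
        // The line above would not compile: parseTitle is non-static,
        // so it cannot be referenced through the type name.
        PageParser parser = new SimplePageParser();
        return parser.parseTitle(html);
    }

    public static void main(String[] args) {
        System.out.println(
            titleOf("<html><head><title>Test</title></head><body></body></html>"));
    }
}
```

The same shape applies to the answer's code: the call goes through an instance of the implementing class rather than through the interface name.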
Yardmaster answered 6/7, 2020 at 19:40 Comment(1)
.parseHtml(...) has not been available since HtmlUnit 2.43.0 - Galumph
L
2

There is some sample code in the FAQ: https://htmlunit.sourceforge.io/faq.html#HowToParseHtmlString

e.g.

final String htmlCode = "<html>"
        + "  <head>"
        + "    <title>Title</title>"
        + "  </head>"
        + "  <body>"
        + "    content..."
        + "  </body>"
        + "</html> ";
try (WebClient webClient = new WebClient(browserVersion)) {
    final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
    // work with the html page
}
Levan answered 17/8, 2022 at 9:43 Comment(0)
C
0
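String htmlAsString = "<body />";
// The original snippet used webClient without declaring it; create one here.
WebClient webClient = new WebClient();
StringWebResponse response = new StringWebResponse(htmlAsString, new URL("your url"));
HtmlPage htmlPage = new HtmlPage(response, webClient.getCurrentWindow());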
Crave answered 16/8, 2022 at 21:54 Comment(0)
