Java GUI to display webpages and return HTML

I need a workflow like below:

// load xyz.com in the browser window
// the browser is live, meaning users can interact with it
browser.load("http://www.google.com");

// return the HTML of the initially loaded page
String page = browser.getHTML();

// after some time
// user might have navigated to a new page, get HTML again
String newpage = browser.getHTML();

I am surprised to see how hard this is to do with Java GUIs such as JavaFX (http://lexandera.com/2009/01/extracting-html-from-a-webview/) and Swing.

Is there some simple way to get this functionality in Java?

Hyperdulia answered 18/11, 2013 at 15:31 Comment(4)
Did you take a look at WebKit embedded in the JavaFX runtime? – Moneywort
Yes, it is difficult to get the HTML out from JavaFX (lexandera.com/2009/01/extracting-html-from-a-webview). – Hyperdulia
@moeb the link you provide is for an Android WebView, not for JavaFX as zenbeni suggests. – Heavyarmed
I don't know if this may be useful, but you can check this link out: #14273950 – Ropy

Here is a contrived example using JavaFX that prints the HTML content to System.out. It should not be too complicated to adapt it into a getHtml() method. (I have tested it with JavaFX 8, but it should work with JavaFX 2 too.)

The code prints the HTML content every time a new page finishes loading.

Note: I have borrowed the printDocument code from this answer.

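import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

import javafx.application.Application;
import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.concurrent.Worker;
import javafx.concurrent.Worker.State;
import javafx.scene.Scene;
import javafx.scene.web.WebEngine;
import javafx.scene.web.WebView;
import javafx.stage.Stage;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

public class TestFX extends Application {

    @Override
    public void start(Stage stage) throws Exception {
        try {
            final WebView webView = new WebView();
            final WebEngine webEngine = webView.getEngine();

            Scene scene = new Scene(webView);

            stage.setScene(scene);
            stage.setWidth(1200);
            stage.setHeight(600);
            stage.show();

            // Called on the JavaFX thread whenever the load worker changes state
            webEngine.getLoadWorker().stateProperty().addListener(new ChangeListener<Worker.State>() {
                @Override
                public void changed(ObservableValue<? extends State> ov, State t, State t1) {
                    if (t1 == Worker.State.SUCCEEDED) {
                        try {
                            printDocument(webEngine.getDocument(), System.out);
                        } catch (Exception e) { e.printStackTrace(); }
                    }
                }
            });

            webView.getEngine().load("http://www.google.com");

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void printDocument(Document doc, OutputStream out) throws IOException, TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = tf.newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        // Flush (but don't close, which would close System.out) so buffered output is written
        writer.flush();
    }

    public static void main(String[] args) {
        launch(args);
    }
}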
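One possible way to turn printDocument into the getHtml() method mentioned above is to serialize the DOM into a String instead of an OutputStream. This is a sketch, not part of the original answer: DocumentSerializer and toHtmlString are hypothetical names. It only uses the JDK's javax.xml.transform API, so it accepts any org.w3c.dom.Document, including the one returned by WebEngine.getDocument().

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper: DOM -> String, using only the JDK transformer API
final class DocumentSerializer {
    private DocumentSerializer() {}

    static String toHtmlString(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // No <?xml ...?> header and no indentation, so the output is predictable
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }
}
```

Inside the SUCCEEDED branch you could then write `String html = DocumentSerializer.toHtmlString(webEngine.getDocument());` and store or return the result instead of printing it.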
Tallbott answered 21/11, 2013 at 14:18 Comment(7)
Thanks. One question: what exactly is the execution model for code inside changed()? Does it execute in a separate thread from the thread calling load()? – Hyperdulia
No, everything in the code above is executed on the JavaFX Thread. Note, however, that load does not load the page; it only asks the WebEngine to schedule a page loading task. The WebEngine then uses a background thread to actually load the page in order to avoid blocking the UI. Once the loading is done, the WebEngine calls the changed methods on the JavaFX Thread. See the javadoc for more details about the threading model. – Tallbott
Thanks. I want a sequencing between load and print, something like the following: load a page, wait till print is complete, load another page, wait till print is complete, <repeat>. How can I do that? – Hyperdulia
@Hyperdulia you can load the next page within the changed method: printDocument(...); webView.getEngine().load(getNextPageUrl()); where getNextPageUrl is a simple method that, for example, returns the items of an array and increments the index each time it's called. Something like: private String[] pages = ...; private int i; private String getNextPageUrl() { return pages[++i]; }. Don't have time to write a complete example now, sorry. – Tallbott
Hi, thanks for the solution. I tried this. This displays the same HTML that I loaded. How can I get the final HTML which gets rendered after executing all the scripts? – Colettacolette
@Colettacolette I suggest you ask a separate question with the details of your problem. – Tallbott
@Tallbott I have asked the question here. Can you please help? #61321438 – Colettacolette
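The sequencing discussed in these comments (load a page, capture its HTML, only then load the next one) can be sketched as a small helper that hands out URLs one at a time. PageSequence, start and onPageLoaded are hypothetical names, and the JavaFX wiring is only described in the comment because it needs a running JavaFX toolkit. Using a queue instead of an array index avoids off-by-one and out-of-bounds mistakes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.function.Consumer;

/**
 * Hypothetical helper for "load, capture HTML, load next" sequencing.
 * Intended wiring (needs JavaFX, not shown): call webEngine.load(seq.start()),
 * then in the SUCCEEDED branch of the load-worker listener:
 *     String next = seq.onPageLoaded(html);
 *     if (next != null) webEngine.load(next);
 * so each load starts only after the previous page's HTML was handed off.
 */
final class PageSequence {
    private final Deque<String> pending;      // URLs not yet loaded (must be non-null)
    private final Consumer<String> htmlSink;  // receives each page's HTML in order

    PageSequence(Consumer<String> htmlSink, String... urls) {
        this.pending = new ArrayDeque<>(Arrays.asList(urls));
        this.htmlSink = htmlSink;
    }

    /** The first URL to load, or null if no URLs were given. */
    String start() {
        return pending.poll();
    }

    /** Passes the finished page's HTML to the sink and returns the next URL, or null when done. */
    String onPageLoaded(String html) {
        htmlSink.accept(html);
        return pending.poll();
    }
}
```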

Below you will find a SimpleBrowser component which is a Pane containing a WebView.

Source code at gist.

Sample usage:

final SimpleBrowser browser = new SimpleBrowser()
          .useFirebug(true);

// ^ useFirebug(true) option - will enable Firebug Lite which can be helpful for 
// | debugging - i.e. to inspect a DOM tree or to view console messages 

Scene scene = new Scene(browser);

browser.load("http://stackoverflow.com", new Runnable() {
    @Override
    public void run() {
        System.out.println(browser.getHTML());
    }
});

browser.getHTML() is called inside a Runnable because the web page first has to download and render. Calling this method before the page has loaded returns an empty page, so passing it in a Runnable that runs once loading succeeds is a simple way I came up with to wait for the page to load.

import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.concurrent.Worker;
import javafx.scene.layout.*;
import javafx.scene.web.WebEngine;
import javafx.scene.web.WebView;

public class SimpleBrowser extends Pane {
    protected final WebView webView = new WebView();
    protected final WebEngine webEngine = webView.getEngine();

    protected boolean useFirebug;

    // Callback from the most recent load() call; run each time a page finishes loading
    private Runnable onLoad;

    public WebView getWebView() {
        return webView;
    }

    public WebEngine getEngine() {
        return webView.getEngine();
    }

    public SimpleBrowser load(String location) {
        return load(location, null);
    }

    public SimpleBrowser load(String location, Runnable onLoad) {
        // Replace the callback instead of adding another listener, so repeated
        // load() calls don't pile up listeners that re-run old callbacks
        this.onLoad = onLoad;
        webEngine.load(location);
        return this;
    }

    public String getHTML() {
        return (String) webEngine.executeScript("document.getElementsByTagName('html')[0].innerHTML");
    }

    public SimpleBrowser useFirebug(boolean useFirebug) {
        this.useFirebug = useFirebug;
        return this;
    }

    public SimpleBrowser() {
        this(false);
    }

    public SimpleBrowser(boolean useFirebug) {
        this.useFirebug = useFirebug;

        getChildren().add(webView);

        webView.prefWidthProperty().bind(widthProperty());
        webView.prefHeightProperty().bind(heightProperty());

        // Register the load listener once; it is called on the JavaFX thread
        webEngine.getLoadWorker().stateProperty().addListener(new ChangeListener<Worker.State>() {
            @Override
            public void changed(ObservableValue<? extends Worker.State> ov, Worker.State t, Worker.State t1) {
                if (t1 == Worker.State.SUCCEEDED) {
                    // SimpleBrowser.this.useFirebug reads the field, not the constructor parameter
                    if (SimpleBrowser.this.useFirebug) {
                        webEngine.executeScript("if (!document.getElementById('FirebugLite')){E = document['createElement' + 'NS'] && document.documentElement.namespaceURI;E = E ? document['createElement' + 'NS'](E, 'script') : document['createElement']('script');E['setAttribute']('id', 'FirebugLite');E['setAttribute']('src', 'https://getfirebug.com/' + 'firebug-lite.js' + '#startOpened');E['setAttribute']('FirebugLite', '4');(document['getElementsByTagName']('head')[0] || document['getElementsByTagName']('body')[0]).appendChild(E);E = new Image;E['setAttribute']('src', 'https://getfirebug.com/' + '#startOpened');}");
                    }
                    if (onLoad != null) {
                        onLoad.run();
                    }
                }
            }
        });
    }
}

Demo Browser:

import javafx.application.Application;
import javafx.event.ActionEvent;
import javafx.event.EventHandler;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.TextField;
import javafx.scene.layout.HBox;
import javafx.scene.layout.Priority;
import javafx.scene.layout.VBox;
import javafx.stage.Stage;

public class FXBrowser {
    public static class TestOnClick extends Application {

        @Override
        public void start(Stage stage) throws Exception {
            try {
                // final so the anonymous classes below can capture it (required before Java 8)
                final SimpleBrowser browser = new SimpleBrowser()
                    .useFirebug(true);

                final TextField location = new TextField("http://stackoverflow.com");

                Button go = new Button("Go");

                go.setOnAction(new EventHandler<ActionEvent>() {
                    @Override
                    public void handle(ActionEvent arg0) {
                        browser.load(location.getText(), new Runnable() {
                            @Override
                            public void run() {
                                System.out.println("---------------");
                                System.out.println(browser.getHTML());
                            }
                        });
                    }
                });

                HBox toolbar = new HBox();
                toolbar.getChildren().addAll(location, go);

                toolbar.setFillHeight(true);

                // VBoxBuilder is deprecated in JavaFX 8 and removed in later versions
                VBox vBox = new VBox();
                vBox.getChildren().addAll(toolbar, browser);
                vBox.setFillWidth(true);

                Scene scene = new Scene(vBox);

                stage.setScene(scene);
                stage.setWidth(1024);
                stage.setHeight(768);
                stage.show();

                VBox.setVgrow(browser, Priority.ALWAYS);

                browser.load("http://stackoverflow.com");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) {
            launch(args);
        }
    }
}
Jackquelin answered 22/11, 2013 at 1:37 Comment(0)

There is not a simple solution. In fact, there might not even be a solution at all short of building your own browser.

The key issue is interaction. If you only want to display content, then JEditorPane and many third-party libraries make that a more attainable goal. If you really need a user interacting with a webpage, then either:

  • Have the user use a normal browser to interact
  • Build a GUI that makes calls to web services/URLs to do the interaction, but the display is up to you.

As for returning the HTML, it sounds like you are trying to capture history or refresh the page. In either case, it sounds like you are using the wrong technology. Either modify the original site, or add some JavaScript in the browser with Greasemonkey or something similar.
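For the display-only case, a Swing sketch of the question's workflow with JEditorPane might look like the following. EditorPaneBrowser, load, loadHtml and getHTML are hypothetical names. JEditorPane only renders basic HTML and runs no JavaScript, so this fits static pages, not interactive sites. Also note that for HTML, setPage may finish loading on a background thread (see the JEditorPane.setPage javadoc), so getHTML() right after load() can see an incomplete page.

```java
import java.io.IOException;
import java.net.URL;
import javax.swing.JEditorPane;
import javax.swing.event.HyperlinkEvent;

/** Hypothetical display-only browser: basic HTML, clickable links, no JavaScript. */
final class EditorPaneBrowser {
    final JEditorPane pane = new JEditorPane();

    EditorPaneBrowser() {
        pane.setEditable(false);          // hyperlink events only fire when not editable
        pane.setContentType("text/html"); // installs HTMLEditorKit
        // When the user clicks a link, navigate to it (the "user navigated" step)
        pane.addHyperlinkListener(e -> {
            if (e.getEventType() == HyperlinkEvent.EventType.ACTIVATED && e.getURL() != null) {
                try {
                    pane.setPage(e.getURL());
                } catch (IOException ex) {
                    ex.printStackTrace();
                }
            }
        });
    }

    /** Loads a page from a URL; for HTML this may complete asynchronously. */
    void load(URL url) throws IOException {
        pane.setPage(url);
    }

    /** Shows an HTML string directly (parsed synchronously). */
    void loadHtml(String html) {
        pane.setText(html);
    }

    /** HTML of the current document, as re-serialized by HTMLEditorKit (not the original bytes). */
    String getHTML() {
        return pane.getText();
    }
}
```

In a real application the pane would be wrapped in a JScrollPane inside a JFrame and used on the Swing event dispatch thread.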

Pamphleteer answered 20/11, 2013 at 18:4 Comment(1)
Of course it is doable, Selenium does it. Not only that, Selenium can capture screenshots of the rendered page as you (or it) interact with the page. – Borne

You may want to look at djproject. But you will possibly find JavaFX easier to use.

Intramural answered 21/11, 2013 at 14:31 Comment(0)

Depending on details of your project that I don't know, this is either genius or moronic, but you could use a real browser instead and instrument it with Selenium WebDriver. I'm only suggesting this because, judging from the other answer, you seem to be going down a difficult path.

There's another question about extracting HTML with WebDriver here. It's about Python, but WebDriver has a Java API as well.

Horned answered 21/11, 2013 at 17:3 Comment(0)

I was able to get the HTML after the page's scripts had executed. I put an alert statement in the page's JavaScript so that it runs after the HTML is loaded, used the webEngine.setOnAlert method to detect when that alert fired, and then printed the HTML. I got the correct response. Below is the code.

HTML (page script)

alert("ready");

JavaFX application

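import javafx.event.EventHandler;
import javafx.scene.web.WebEngine;
import javafx.scene.web.WebEvent;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// The page script calls alert("ready") once its own work is done;
// the JavaFX side treats that alert as the signal to read the final DOM.
webEngine.setOnAlert(new EventHandler<WebEvent<String>>() {
    @Override
    public void handle(WebEvent<String> event) {
        if ("ready".equals(event.getData())) {
            System.out.println("HTML Ready");
            WebEngine engine = (WebEngine) event.getSource();
            // outerHTML of the root element reflects changes made by the page's scripts
            String html = (String) engine.executeScript("document.documentElement.outerHTML");
            org.jsoup.nodes.Document doc = Jsoup.parse(html);
            Element image = doc.getElementById("canvasImage");
            if (image != null) {
                System.out.println(image.attr("src"));
            }
        }
    }
});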
Colettacolette answered 28/4, 2020 at 5:14 Comment(0)
