how to get html content from a webview?
Asked Answered
D

13

145

Which is the simplest method to get html code from a webview? I have tried several methods from stackoverflow and google, but can't find an exact method. Please mention an exact way.

public class htmldecoder extends Activity implements OnClickListener,TextWatcher
{
TextView txturl;
Button btgo;
WebView wvbrowser;
TextView txtcode;
ImageButton btcode;
LinearLayout llayout;
int flagbtcode;
public void onCreate(Bundle savedInstanceState)
{
            super.onCreate(savedInstanceState);
                setContentView(R.layout.htmldecoder);

    txturl=(TextView)findViewById(R.id.txturl);

    btgo=(Button)findViewById(R.id.btgo);
    btgo.setOnClickListener(this);

    wvbrowser=(WebView)findViewById(R.id.wvbrowser);
    wvbrowser.setWebViewClient(new HelloWebViewClient());
    wvbrowser.getSettings().setJavaScriptEnabled(true);
    wvbrowser.getSettings().setPluginsEnabled(true);
    wvbrowser.getSettings().setJavaScriptCanOpenWindowsAutomatically(true);
    wvbrowser.addJavascriptInterface(new MyJavaScriptInterface(),"HTMLOUT");
    //wvbrowser.loadUrl("http://www.google.com");
    wvbrowser.loadUrl("javascript:window.HTMLOUT.showHTML('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");


    txtcode=(TextView)findViewById(R.id.txtcode);
    txtcode.addTextChangedListener(this);

    btcode=(ImageButton)findViewById(R.id.btcode);
    btcode.setOnClickListener(this);

    }

public void onClick(View v)
{
    if(btgo==v)
    {
        String url=txturl.getText().toString();
        if(!txturl.getText().toString().contains("http://"))
        {
            url="http://"+url;
        }
        wvbrowser.loadUrl(url);
        //wvbrowser.loadData("<html><head></head><body><div style='width:100px;height:100px;border:1px red solid;'></div></body></html>","text/html","utf-8");
    }
    else if(btcode==v)
    {
        ViewGroup.LayoutParams params1=wvbrowser.getLayoutParams();
        ViewGroup.LayoutParams params2=txtcode.getLayoutParams();
        if(flagbtcode==1)
        {
            params1.height=200;
            params2.height=220;
            flagbtcode=0;
            //txtcode.setText(wvbrowser.getContentDescription());
        }
        else
        {
            params1.height=420;
            params2.height=0;
            flagbtcode=1;
        }
        wvbrowser.setLayoutParams(params1);
        txtcode.setLayoutParams(params2);

    }
}

public class HelloWebViewClient extends WebViewClient {
    @Override
    public boolean shouldOverrideUrlLoading(WebView view, String url) {

        view.loadUrl(url);
        return true;
    }
    /*@Override
    public void onPageFinished(WebView view, String url)
    {
        // This call inject JavaScript into the page which just finished loading. 
        wvbrowser.loadUrl("javascript:window.HTMLOUT.processHTML('<head>'+document.getElementsByTagName('html')[0].innerHTML+'</head>');");
    }*/

}
class MyJavaScriptInterface
{
    @SuppressWarnings("unused")
    public void showHTML(String html)
    {

        txtcode.setText(html);
    }
}

public void afterTextChanged(Editable s) {
    // TODO Auto-generated method stub

}

public void beforeTextChanged(CharSequence s, int start, int count,
        int after) {
    // TODO Auto-generated method stub

}

public void onTextChanged(CharSequence s, int start, int before, int count) {
    wvbrowser.loadData("<html><div"+txtcode.getText().toString()+"></div></html>","text/html","utf-8");

}

}
Deferment answered 20/11, 2011 at 10:36 Comment(0)
G
116

Actually this question has many answers. Here are 2 of them :

  • This first is almost the same as yours, I guess we got it from the same tutorial.

public class TestActivity extends Activity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.webview);
        final WebView webview = (WebView) findViewById(R.id.browser);
        webview.getSettings().setJavaScriptEnabled(true);
        webview.addJavascriptInterface(new MyJavaScriptInterface(this), "HtmlViewer");

        webview.setWebViewClient(new WebViewClient() {
            @Override
            public void onPageFinished(WebView view, String url) {
                webview.loadUrl("javascript:window.HtmlViewer.showHTML" +
                        "('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
            }
        });

        webview.loadUrl("http://android-in-action.com/index.php?post/" +
                "Common-errors-and-bugs-and-how-to-solve-avoid-them");
    }

    class MyJavaScriptInterface {

        private Context ctx;

        MyJavaScriptInterface(Context ctx) {
            this.ctx = ctx;
        }

        public void showHTML(String html) {
            new AlertDialog.Builder(ctx).setTitle("HTML").setMessage(html)
                    .setPositiveButton(android.R.string.ok, null).setCancelable(false).create().show();
        }

    }
}

This way your grab the html through javascript. Not the prettiest way but when you have your javascript interface, you can add other methods to tinker it.


  • An other way is using an HttpClient like there.

The option you choose also depends, I think, on what you intend to do with the retrieved html...

Guthry answered 20/11, 2011 at 11:41 Comment(14)
when execute this line webview.loadUrl("javascript:window.HtmlViewer.showHTML" + "('<head>'+document.getElementsByTagName('html')[0].innerHTML+'</head>');"); the program act like function finish(),and stop that activity.why?how to solve it?Deferment
any idea about this?alternative methods?Deferment
webview.addJavascriptInterface Only works on Jelly Beans and lower version.Least
Two important changes to the above code for Jellybean and later: 1. Remove "window." from the webview.loadUrl line - the javascript interface is attached differently when targeting Jellybean. 2. Put @JavascriptInterface before "public void showHTML" - this is necessary since it's a security risk not to only allow certain methods to be called.Irrespective
And if I would like to use this in a widget? Somebody can help me? #20827204Expressage
Still doesn't work for me (5.1.1).. When I'm adding MyJavaScriptInterface (with @Irrespective hints) when I'm clicking something on loaded page system asks me to choose browser. When I remove this, it won't ask me again.Lithography
this works for me, the web view was hiding the html, now solve it, thanksDobb
I used this on Android 6.0. Furthermore I added @JavascriptInterface in front of showHTML as already mentioned. Works like a charm!Pants
Is there any way to get this HTML formated properly? (not in one line)Ringster
This solution is not deterministic. On webpages with own js code executing javascript wigh loadUrl might not work. On andorid 4.4.2 loadUrl with javascript is not working even if web page is pure html. My final solution was load web page with OkHttp and then pass it to WebView.loadDataWithBaseURL(...). Setting valid url WebView will download all dependencies linked in html.Butters
Here I enabled the remote debugging, it showed Uncaught ReferenceError: HtmlViewer is not defined, no matter with or without @JavascriptInterfaceSclaff
Don't forget to add INTERNET permission (I did)!Doriadorian
why not use: document.documentElement instead of document.getElementsByTagName('html')[0]Martyrology
how to access <html inside <frame? In Google Chrome for PC I can inspect all content, even html from iframes but how to do for Android with WebView and JS? In onPageFinished I started a task with which reruns each 3 seconds to call innerHTML again, I tried document.getElementsByTagName('html')[1].innerHTML but it's undefined, only works for 0 indexFurey
F
91

In KitKat and above, you could use evaluateJavascript method on webview

wvbrowser.evaluateJavascript(
        "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
         new ValueCallback<String>() {
            @Override
            public void onReceiveValue(String html) {
                Log.d("HTML", html); 
                // code here
            }
    });

See this answer for more examples

Fingerprint answered 16/8, 2015 at 22:30 Comment(8)
this is by far the easiest solution to use hereCaught
FYI - Requires API 19.Forgiving
Remember to put this into the onPageFinished method.Pants
@Forgiving How to achieve this below API 19?Beggar
@PratikSaluja Look at other answers too. This is just a concise way to do it in API 19 and up. There are other ways to do it.Fingerprint
@AkashKurianJose 🙄 That doesn't answer my problem. In fact, that's the indirect diversion to another topic. It's common we are here always looking for answers and questions that need not to remind. Anyways, thanks, I got the answer from some other post.Beggar
@PratikSaluja extremely sorry if my comment conveyed the wrong idea. The answer with most upvotes here is much older than my own answer and would probably work for you. Didn't mean anything beyond that. Very glad that you found the answer by looking elsewhere BTW.Fingerprint
how to access <html inside <frame? In Google Chrome for PC I can inspect all content, even html from iframes but how to do for Android with WebView and JS? In onPageFinished I started a task with which reruns each 3 seconds to call innerHTML again, I tried document.getElementsByTagName('html')[1].innerHTML but it's undefined, only works for 0 indexFurey
O
43

For android 4.2 and above, don't forget to add @JavascriptInterface to all javascript functions.

Overcheck answered 27/3, 2013 at 15:35 Comment(1)
Works for android 4.2 and ABOVE.Pants
G
10

Android WebView is just another render engine that render HTML contents downloaded from a HTTP server, much like Chrome or FireFox. I don't know the reason why you need get the rendered page (or screenshot) from WebView. For most of situation, this is not necessary. You can always get the raw HTML content from HTTP server directly.

There are already answers posted talking about getting the raw stream using HttpUrlConnection or HttpClient. Alternatively, there is a very handy library when dealing with HTML content parse/process on Android: JSoup, it provide very simple API to get HTML contents form HTTP server, and provide an abstract representation of HTML document to help us manage HTML parsing not only in a more OO style but also much easily:

// Single line of statement to get HTML document from HTTP server.
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();

It is handy when, for example, you want to download HTML document first then add some custom css or javascript to it before passing it to WebView for rendering. Much more on their official web site, worth to check it out.

Godmother answered 7/3, 2012 at 21:45 Comment(2)
Not usefull when you need javascript enabled browser to get HTML . e.g twitter.comGreet
In my case, I use WebView to let users edit the HTML page and then need to get result of the editing.Haily
S
8
with(webView) {
    settings.javaScriptEnabled = true
    webViewClient = object : WebViewClient() {
        override fun onPageFinished(view: WebView?, url: String?) {
            view?.evaluateJavascript("document.documentElement.outerHTML") {
                val html = it.replace("\\u003C", "<")
            }
        }
    }
}
Scrimshaw answered 25/5, 2022 at 14:49 Comment(0)
S
6

One touch point I found that needs to be put in place is "hidden" away in the Proguard configuration. While the HTML reader invokes through the javascript interface just fine when debugging the app, this works no longer as soon as the app was run through Proguard, unless the HTML reader function is declared in the Proguard config file, like so:

-keepclassmembers class <your.fully.qualified.HTML.reader.classname.here> {
    public *; 
}

Tested and confirmed on Android 2.3.6, 4.1.1 and 4.2.1.

Swayne answered 13/1, 2013 at 19:17 Comment(0)
S
4

Android will not let you do this for security concerns. An evil developer could very easily steal user-entered login information.

Instead, you have to catch the text being displayed in the webview before it is displayed. If you don't want to set up a response handler (as per the other answers), I found this fix with some googling:

URL url = new URL("https://stackoverflow.com/questions/1381617");
URLConnection con = url.openConnection();
Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
Matcher m = p.matcher(con.getContentType());
/* If Content-Type doesn't match this pre-conception, choose default and 
 * hope for the best. */
String charset = m.matches() ? m.group(1) : "ISO-8859-1";
Reader r = new InputStreamReader(con.getInputStream(), charset);
StringBuilder buf = new StringBuilder();
while (true) {
  int ch = r.read();
  if (ch < 0)
    break;
  buf.append((char) ch);
}
String str = buf.toString();

This is a lot of code, and you should be able to copy/paster it, and at the end of it str will contain the same html drawn in the webview. This answer is from Simplest way to correctly load html from web page into a string in Java and it should work on Android as well. I have not tested this and did not write it myself, but it might help you out.

Also, the URL this is pulling is hardcoded, so you'll have to change that.

Spagyric answered 18/2, 2012 at 18:31 Comment(0)
Q
1

Why not get the html first then pass it to the web view?

private String getHtml(String url){
    HttpGet pageGet = new HttpGet(url);

    ResponseHandler<String> handler = new ResponseHandler<String>() {
        public String handleResponse(HttpResponse response) throws ClientProtocolException, IOException {
            HttpEntity entity = response.getEntity();
            String html; 

            if (entity != null) {
                html = EntityUtils.toString(entity);
                return html;
            } else {
                return null;
            }
        }
    };

    pageHTML = null;
    try {
        while (pageHTML==null){
            pageHTML = client.execute(pageGet, handler);
        }
    } catch (ClientProtocolException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    return pageHTML;
}

@Override
public void customizeWebView(final ServiceCommunicableActivity activity, final WebView webview, final SearchResult mRom) {
    mRom.setFileSize(getFileSize(mRom.getURLSuffix()));
    webview.getSettings().setJavaScriptEnabled(true);
    WebViewClient anchorWebViewClient = new WebViewClient()
    {

        @Override
        public void onPageStarted(WebView view, String url, Bitmap favicon) {
            super.onPageStarted(view, url, favicon);

            //Do what you want to with the html
            String html = getHTML(url);

            if( html!=null && !url.equals(lastLoadedURL)){
                lastLoadedURL = url;
                webview.loadDataWithBaseURL(url, html, null, "utf-8", url);
            }
}

This should roughly do what you want to do. It is adapted from Is it possible to get the HTML code from WebView and shout out to https://stackoverflow.com/users/325081/aymon-fournier for his answer.

Quade answered 6/3, 2012 at 19:55 Comment(1)
HttpClient was deprecated in API Level 22 and removed in API Level 23. So the classes mentioned in your code cannot be imported in the java files.Dulse
T
1

I would suggest instead of trying to extract the HTML from the WebView, you extract the HTML from the URL. By this, I mean using a third party library such as JSoup to traverse the HTML for you. The following code will get the HTML from a specific URL for you

public static String getHtml(String url) throws ClientProtocolException, IOException {
        HttpClient httpClient = new DefaultHttpClient();
        HttpContext localContext = new BasicHttpContext();
        HttpGet httpGet = new HttpGet(url);
        HttpResponse response = httpClient.execute(httpGet, localContext);
        String result = "";

        BufferedReader reader = new BufferedReader(
            new InputStreamReader(
                response.getEntity().getContent()
            )
        );

        String line = null;
        while ((line = reader.readLine()) != null){
            result += line + "\n";
        }
        return result;
    }
Throb answered 17/3, 2012 at 18:34 Comment(2)
suppose the url obtain is reached by posting data . this method will fail.Aristarchus
Also what about cookies?Vittoria
L
0

try using HttpClient as Sephy said:

public String getHtml(String url) {
    HttpClient vClient = new DefaultHttpClient();
    HttpGet vGet = new HttpGet(url);
    String response = "";    

    try {
        ResponseHandler<String> vHandler = new BasicResponseHandler();
        response = vClient.execute(vGet, vHandler);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return response;
}
Laure answered 13/2, 2012 at 2:47 Comment(5)
can you show a simple working example.i cannot implement your code in sephy's exampleFlooded
this method will get the html source of the given url. i.e. getHtml(google.com); will get you the source of google main pageLaure
its ok.is there any option to get webview source. THANKSFlooded
This somewhat didn't work for me. I did not get any content from a test site which content had been "hello world".Eshelman
@ChristoperHans This seems to be an alternative to WebView, not really getting the content from WebView?Faction
P
0

Its Simple to implement Just need javasript methods in your html to get value of html content. As Above your code some changes to be need.

  public class htmldecoder extends Activity implements OnClickListener,TextWatcher
    {
    Button btsubmit; // this button in your xml file
    WebView wvbrowser;
    public void onCreate(Bundle savedInstanceState)
    {
                super.onCreate(savedInstanceState);
                    setContentView(R.layout.htmldecoder);



        btsubmit=(Button)findViewById(R.id.btsubmit);
        btsubmit.setOnClickListener(this);

        wvbrowser=(WebView)findViewById(R.id.wvbrowser);
        wvbrowser.setWebViewClient(new HelloWebViewClient());
        wvbrowser.getSettings().setJavaScriptEnabled(true);
        wvbrowser.getSettings().setPluginsEnabled(true);
        wvbrowser.getSettings().setJavaScriptCanOpenWindowsAutomatically(true);
        MyJavaScriptInterface myinterface=new MyJavaScriptInterface();
        wvbrowser.addJavascriptInterface(myinterface,"interface");
        webView.loadUrl("file:///android_asset/simple.html");  //use one html file for //testing put your html file in assets. Make sure that you done JavaScript methods to get //values for html content in html file . 
   }
   public void onClick(View v)
{
    if(btsubmit==v)
    {

        webView.loadUrl("javascript:showalert()");// call javascript method.  
        //wvbr
    }
}

final class MyJavaScriptInterface {



        MyJavaScriptInterface() {

        }

        public void sendValueFromHtml(String value) {
           System.out.println("Here is the value from html::"+value);
        }

    }

}

Your Javascript in html

 <script type="text/javascript">
    //<![CDATA[
    var n1;
    function callme(){
    n1=document.getElementById("FacadeAL").value;
    }
    function showalert(){
     window.interface.sendValueFromHtml(n1);// this method calling the method of interface which //you attached to html file in android. // & we called this showalert javasript method on //submmit buttton click of android. 
    }
    //]]>
    </script>

& Make sure you calling callme like below in html

<input name="FacadeAL" id="FacadeAL" type="text" size="5" onblur="callme()"/>
Hope this will help you.

Poss answered 11/3, 2012 at 6:40 Comment(5)
what does this means & Make sure you calling callme like below in html.Did you mean to place the input tag below the script in html file ? Thank YouDeferment
no dude you have to call javasript method callme() onblur of input type text in html tag.Poss
then where to add this input tag.is this button visible?Deferment
this code work like ,when loading activity there is a text box in the webview and the typed text shows at the text box.But i want the html code in the webview.Deferment
can u help me to sort out this problem? Thank you very muchDeferment
B
0

I suggest to try out some Reflection approach, if you have time to spend on the debugger (sorry but I didn't have).

Starting from the loadUrl() method of the android.webkit.WebView class:

http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/2.2_r1.1/android/webkit/WebView.java#WebView.loadUrl%28java.lang.String%2Cjava.util.Map%29

You should arrive on the android.webkit.BrowserFrame that call the nativeLoadUrl() native method:

http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/2.2_r1.1/android/webkit/BrowserFrame.java#BrowserFrame.nativeLoadUrl%28java.lang.String%2Cjava.util.Map%29

The implementation of the native method should be here:

http://gitorious.org/0xdroid/external_webkit/blobs/a538f34148bb04aa6ccfbb89dfd5fd784a4208b1/WebKit/android/jni/WebCoreFrameBridge.cpp

Wish you good luck!

Butch answered 15/3, 2012 at 15:42 Comment(0)
C
-3

above given methods are for if you have an web url ,but if you have an local html then you can have also html by this code

AssetManager mgr = mContext.getAssets();
             try {
InputStream in = null;              
if(condition)//you have a local html saved in assets
                            {
                            in = mgr.open(mFileName,AssetManager.ACCESS_BUFFER);
                           }
                            else if(condition)//you have an url
                            {
                            URL feedURL = new URL(sURL);
                  in = feedURL.openConnection().getInputStream();}

                            // here you will get your html
                 String sHTML = streamToString(in);
                 in.close();

                 //display this html in the browser or web view              


             } catch (IOException e) {
             // TODO Auto-generated catch block
             e.printStackTrace();
             }
        public static String streamToString(InputStream in) throws IOException {
            if(in == null) {
                return "";
            }

            Writer writer = new StringWriter();
            char[] buffer = new char[1024];

            try {
                Reader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));

                int n;
                while ((n = reader.read(buffer)) != -1) {
                    writer.write(buffer, 0, n);
                }

            } finally {

            }

            return writer.toString();
        }
Corporator answered 20/11, 2011 at 10:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.