android - get Text out of webview

In my application, I display an EPUB's HTML files in a WebView using EPUBLIB. I want to add bookmark functionality to my EPUB reader. For that, I want to fetch the text from the WebView that is showing a page from my EPUB's HTML file, and then use that text in my bookmark activity to show the user what they have bookmarked. How can I achieve this?

Irrational answered 6/3, 2012 at 7:50 Comment(0)

Getting the plain-text content out of a WebView is rather hard. The Android WebView classes don't offer it directly, but JavaScript does, and Android offers a way for JavaScript to pass the information back to your Java code.

Before I go into the details, note that if your HTML structure is simple, you might be better off just parsing the data manually.
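
For the manual route, a minimal sketch might look like the class below. It is only an illustration under stated assumptions: the markup is simple and well formed (as EPUB XHTML chapters usually are), only a handful of entities need decoding, and regular expressions are good enough, which they are not for arbitrary HTML. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: crude HTML-to-text for simple, well-formed chapter markup. */
final class SimpleHtmlText {
    // Drop <script> and <style> blocks entirely, including their contents.
    private static final Pattern SCRIPT_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // <br> and closing block-level tags become line breaks so paragraphs stay apart.
    private static final Pattern BLOCK_BREAK =
            Pattern.compile("(?i)<br\\s*/?>|</(p|div|h[1-6]|li)\\s*>");
    // Any remaining tag is removed.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    private SimpleHtmlText() {}

    static String toPlainText(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll("");
        text = BLOCK_BREAK.matcher(text).replaceAll("\n");
        text = TAG.matcher(text).replaceAll("");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
        text = text.replace("&lt;", "<").replace("&gt;", ">")
                   .replace("&quot;", "\"").replace("&nbsp;", " ")
                   .replace("&amp;", "&");
        // Collapse runs of spaces/tabs, squeeze blank lines, and trim the ends.
        return text.replaceAll("[ \\t]+", " ").replaceAll("\\s*\\n\\s*", "\n").trim();
    }
}
```

For anything beyond simple chapters, a real HTML parser is the safer choice; the JavaScript approach below lets the browser engine do that work instead.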

That said, here is what you do:

  1. Enable JavaScript.
  2. Add your own JavaScript interface class, to allow the JavaScript to communicate with your Android code. (If you target API 17 or higher, every method exposed to JavaScript must be annotated with @JavascriptInterface.)
  3. Register your own WebViewClient, overriding onPageFinished to inject a bit of JavaScript.
  4. In the JavaScript, read the element.innerText of the tag and pass it to your JavaScript interface.

To clarify, I'll post a working (but very rough) code example below. It displays a WebView at the top and a TextView with the text-based contents at the bottom.

package test.android.webview;

import android.app.Activity;
import android.os.Bundle;
import android.webkit.JavascriptInterface;
import android.webkit.WebView;
import android.webkit.WebViewClient;
import android.widget.TextView;

public class WebviewTest2Activity extends Activity {
    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        WebView webView = (WebView) findViewById(R.id.webView);
        TextView contentView = (TextView) findViewById(R.id.contentView);

        /* An instance of this class will be registered as a JavaScript interface */ 
        class MyJavaScriptInterface 
        { 
            private TextView contentView;

            public MyJavaScriptInterface(TextView aContentView)
            {
                contentView = aContentView;
            }

            // @JavascriptInterface is required when targeting API 17+; without it the method is
            // not visible to JavaScript. The method runs on a background thread of the WebView,
            // which is why the TextView is updated through post().
            @JavascriptInterface
            @SuppressWarnings("unused") 
            public void processContent(String aContent) 
            { 
                final String content = aContent;
                contentView.post(new Runnable() 
                {    
                    public void run() 
                    {          
                        contentView.setText(content);        
                    }     
                });
            } 
        } 

        webView.getSettings().setJavaScriptEnabled(true); 
        webView.addJavascriptInterface(new MyJavaScriptInterface(contentView), "INTERFACE"); 
        webView.setWebViewClient(new WebViewClient() { 
            @Override 
            public void onPageFinished(WebView view, String url) 
            { 
                view.loadUrl("javascript:window.INTERFACE.processContent(document.getElementsByTagName('body')[0].innerText);"); 
            } 
        }); 

        webView.loadUrl("http://shinyhammer.blogspot.com");
    }
}

Using the following main.xml:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical" >

    <WebView
        android:id="@+id/webView"
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="0.5" />

    <TextView
        android:id="@+id/contentView"
        android:layout_width="match_parent"
        android:layout_height="0dp"
        android:layout_weight="0.5" />

</LinearLayout>
Krissie answered 6/3, 2012 at 9:33 Comment(10)
Can you explain this line in detail? view.loadUrl("javascript:window.INTERFACE.processContent(document.getElementsByTagName('body')[0].innerText);");Irrational
It's step 4 from the explanation. From left to right, it (a) loads a URL that (b) simply injects some JavaScript that (c) calls the processContent() method of the custom JavaScript interface object INTERFACE, registered from the Android code, passing (d) the innerText property of the body element of the page currently showing. If you have specific questions, ask away!Krissie
As a side note, I deliberately included an example you can copy and paste into a new Android project to test it out. If you are new to this stuff, simply stepping through the source might be enlightening. It is fairly complex stuff, as two different techniques (Android WebView customization, JavaScript fiddling) come together.Krissie
Thank you. :) It really helped me, and the example worked as you said :)Irrational
@Krissie thanks, though a question here: will there be two instances of the HTML string, i.e. the one originally contained in the WebView and the other passed by JS to the interface?Skyros
@rohit - how did you implement the bookmark then? From here you can extract the text from the WebView. I think when you save the bookmark, you save some part of the text that is bookmarked. Later you search for it in the given spine item and then use some JavaScript method to take you to that particular offset, i.e. from text to offset. Can you share how you achieved the "search" and "text to offset" parts? The innerText gives all the text of the current spine item. Is there any JavaScript method that gives only the text displayed in the current view (e.g. the content of only page no. 2)?Jeffcott
Could anyone please give me a solution for newer APIs (above API 17)? It's not working on KitKat.Carboloy
For the benefit of others: the processContent(...) method in Paul-Jan's answer works only if the @JavascriptInterface annotation is specified on the method when your target SDK version is >= 17, as per developer.android.com/guide/webapps/…Engram
I/chromium: [INFO:CONSOLE(1)] "Uncaught TypeError: window.INTERFACE.processContent is not a function", source: (1) ....I get this error in Android 6.0. What am I missing?Uncommunicative
Figured out the issue: just add @JavascriptInterface above processContent(...) for KitKat and above, as mvsagar said.Uncommunicative
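
The bookmarking idea from the comments (save a piece of the bookmarked text, then search for it again in the spine item) could be sketched in plain Java roughly as follows. This rests on assumptions: the chapter text is whatever plain text you extracted (for example through innerText), and the TextBookmark class and its fields are hypothetical, not part of EPUBLIB or the WebView API.

```java
/**
 * Hypothetical bookmark: a chapter (spine item) index, a snippet of its plain text,
 * and the character offset where the snippet was found when the bookmark was saved.
 */
final class TextBookmark {
    final int chapterIndex;
    final String snippet;
    final int offset;

    TextBookmark(int chapterIndex, String snippet, int offset) {
        if (snippet.isEmpty()) throw new IllegalArgumentException("snippet must not be empty");
        this.chapterIndex = chapterIndex;
        this.snippet = snippet;
        this.offset = offset;
    }

    /** Creates a bookmark from up to maxLength characters of chapterText starting at offset. */
    static TextBookmark create(int chapterIndex, String chapterText, int offset, int maxLength) {
        int end = Math.min(chapterText.length(), offset + maxLength);
        return new TextBookmark(chapterIndex, chapterText.substring(offset, end), offset);
    }

    /**
     * Finds the snippet again: the saved offset if the text there is unchanged,
     * otherwise the occurrence closest to it, or -1 if the snippet no longer appears.
     */
    int locate(String chapterText) {
        if (chapterText.startsWith(snippet, offset)) return offset;
        int best = -1;
        for (int i = chapterText.indexOf(snippet); i >= 0; i = chapterText.indexOf(snippet, i + 1)) {
            if (best < 0 || Math.abs(i - offset) < Math.abs(best - offset)) best = i;
        }
        return best;
    }
}
```

The snippet doubles as the label shown in the bookmark list. Turning the recovered offset back into a scroll position inside the WebView would still need JavaScript, as the comment above points out.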

Java:

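    // evaluateJavascript requires API 19 (KitKat) or higher. It must be called on the UI thread,
    // the callback also runs on the UI thread, and the result arrives JSON-encoded
    // (a returned string comes wrapped in quotes, with special characters escaped).
    wvbrowser.evaluateJavascript(
        "(function() { return ('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>'); })();",
        new ValueCallback<String>() {
            @Override
            public void onReceiveValue(String html) {
                Log.d("HTML", html);
                // code here
            }
        });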

Kotlin:

web_browser.evaluateJavascript("(function() { return ('<html>'+document.getElementsByTagName('span')[0].innerText+'</html>'); })();") { html ->
    Toast.makeText(this@Your_activity, html, Toast.LENGTH_SHORT).show()
    // code here
}
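
In practice the value handed to onReceiveValue is the JSON encoding of the script's result, so a returned string arrives in double quotes with escapes such as \n, and characters like < may show up as \u003C. The hypothetical helper below is a minimal sketch of decoding such a JSON string literal in plain Java; on Android itself, org.json's JSONTokener is likely a simpler alternative.

```java
/**
 * Hypothetical helper: decodes the JSON string literal that evaluateJavascript
 * passes to onReceiveValue when the script returns a string.
 * Returns null for the JSON literal null (typically what a script returning nothing yields).
 */
final class JsResult {
    private JsResult() {}

    static String decodeJsonString(String json) {
        if (json == null || json.equals("null")) return null;
        int last = json.length() - 1;
        if (last < 1 || json.charAt(0) != '"' || json.charAt(last) != '"') {
            throw new IllegalArgumentException("Not a JSON string literal: " + json);
        }
        StringBuilder out = new StringBuilder(last);
        for (int i = 1; i < last; i++) {
            char c = json.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (++i >= last) throw new IllegalArgumentException("Dangling escape in: " + json);
            char esc = json.charAt(i);
            switch (esc) {
                case '"': case '\\': case '/': out.append(esc); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'u': // four hex digits follow, e.g. 003C for '<'
                    if (i + 4 >= last) throw new IllegalArgumentException("Short escape in: " + json);
                    out.append((char) Integer.parseInt(json.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Bad escape \\" + esc + " in: " + json);
            }
        }
        return out.toString();
    }
}
```

Inside onReceiveValue, something like String text = JsResult.decodeJsonString(html); would then give the plain string.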
Bonefish answered 17/2, 2017 at 9:34 Comment(2)
NB: both methods still work on KitKat+; evaluateJavascript is just preferred because it has a callback, which makes it easier to work asynchronously (especially if you need a return value)...Sabbatarian
The parentheses after the return statement, return (...), worked for me. Anyway, it works like a charm!Bassorilievo

The solution above gets the text through the innerText property, which returns all the text in the WebView. The solution I propose below extracts only the text from the part of the WebView that is visible on screen.

Step 1: This requires JavaScript, so first enable JavaScript and register the JavaScript interface:

webView.getSettings().setJavaScriptEnabled(true);

webView.addJavascriptInterface(new IJavascriptHandler(getActivity().getApplicationContext()), "Android"); //if your class extends Fragment

or

webView.addJavascriptInterface(new IJavascriptHandler(this), "Android"); //if your class extends Activity

Step 2: Create the JavaScript interface as an inner class.

final class IJavascriptHandler {

    Context mContext;

    IJavascriptHandler(Context c) {
        mContext = c;
    }

    // When targeting API 17 or higher, @JavascriptInterface is mandatory on every method exposed to JavaScript.
    @JavascriptInterface
    public void processContent(String aContent) {
        // This method is called from the JavaScript injected in Step 4 (on a background thread).
        Log.d("IJavascriptHandler", "The content of the current page is: " + aContent);
    }
}

Step 3: Now add the JavaScript function. You write the function as a string and then load it into the page. The function returns the text found in the rectangle described by its parameters, so you need two strings: one that defines the JavaScript function and one that calls it.

The string that defines the JavaScript function:

String javaScriptToExtractText = "function getAllTextInColumn(left,top,width,height){"
                +   "if(document.caretRangeFromPoint){"
                +   "var caretRangeStart = document.caretRangeFromPoint(left, top);"
                +   "var caretRangeEnd = document.caretRangeFromPoint(left+width-1, top+height-1);"
                +   "} else {"
                +   "return null;"
                +   "}"
                +   "if(caretRangeStart == null || caretRangeEnd == null) return null;"
                +   "var range = document.createRange();"
                +   "range.setStart(caretRangeStart.startContainer, caretRangeStart.startOffset);"
                +   "range.setEnd(caretRangeEnd.endContainer, caretRangeEnd.endOffset);"
                +   "return range.toString();};";

The string that calls the above function:

String javaScriptFunctionCall = "getAllTextInColumn(0,0,100,100)";

//The parameters here are 0,0 for the left and top offsets and 100,100 for the width and height, so it extracts the text inside that area (viewport coordinates in CSS pixels).

Step 4: Now load the two JavaScript strings above.

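webView.loadUrl("javascript:" + javaScriptToExtractText);
//this defines the function in the page (do this after the page has finished loading, e.g. in onPageFinished).

webView.loadUrl("javascript:window.Android.processContent(" + javaScriptFunctionCall + ");");
//this calls the defined function and passes its result to processContent().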
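
The two strings from Steps 3 and 4 could also be built by a small helper like the hypothetical one below. It is a sketch that assumes the getAllTextInColumn function from Step 3 has already been loaded and that the interface is registered under the name "Android". Because caretRangeFromPoint works in viewport (CSS pixel) coordinates, passing window.innerWidth and window.innerHeight from inside the page covers the whole visible area without converting Android's physical pixels, which differ from CSS pixels by the screen density.

```java
import java.util.Locale;

/** Hypothetical helper that builds the JavaScript strings used in Steps 3 and 4. */
final class VisibleTextScript {
    private VisibleTextScript() {}

    /** Call for an explicit rectangle, e.g. getAllTextInColumn(0,0,100,100). */
    static String call(int left, int top, int width, int height) {
        // Locale.ROOT keeps the digits ASCII regardless of the device locale.
        return String.format(Locale.ROOT, "getAllTextInColumn(%d,%d,%d,%d)", left, top, width, height);
    }

    /** Call covering the whole visible viewport, sized by the page itself. */
    static String callForViewport() {
        return "getAllTextInColumn(0,0,window.innerWidth,window.innerHeight)";
    }

    /** The javascript: URL that passes the call's result to the "Android" interface. */
    static String processContentUrl(String functionCall) {
        return "javascript:window.Android.processContent(" + functionCall + ");";
    }
}
```

For example, webView.loadUrl(VisibleTextScript.processContentUrl(VisibleTextScript.callForViewport())) would send the currently visible text to processContent().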

Enjoy.

Jeffcott answered 20/8, 2014 at 7:50 Comment(0)

The only thing that comes to mind in this case is to use JavaScript. A quick search turned up android.webkit.WebView.addJavascriptInterface.

You should study addJavascriptInterface, which should ultimately help you solve the problem.

Fadeout answered 6/3, 2012 at 9:3 Comment(3)
I don't know much about JS, HTML, etc. Can you point me to a good tutorial I can follow? :)Irrational
Looking at the answer given by Paul-Jan, I see that I was on the right track. If you follow his instructions, you should be able to make it work. I suggest you do some research: the internet is full of JavaScript and HTML tutorials, and these days these skills are a MUST for a developer.Fadeout
:D Yeah, I've already started searching. Thank you very much for pointing me in the right direction.Irrational

Why don't you fetch the text with EPUBLIB from the book directly?

You got that HTML with the help of EPUBLIB, didn't you? How did you put it into the WebView? I see no example.

Animatism answered 6/3, 2012 at 9:34 Comment(6)
Yeah, you are right. I got the HTML file as a string, but with all the HTML tags, which I must pass to the WebView. I only want some part of it, say only the 3rd paragraph of that string. I couldn't do that with your method, right?Irrational
You can just parse that out. First determine the position of the first <p>. Then take a substring() of the text from that tag. Repeat until the n'th tag is found. Then determine the end of the paragraph and take a final substring().Animatism
That's what Paul answered, in a different and easier way. Your method would be helpful for developers like me who don't know much about JS, but given that HTML, JS and CSS are among the most important technologies today and Android offers such good support for adding JS to your Java code, we should make use of that. That's my personal opinion :)Irrational
Even if you use the JavaScript interface, you -only- get the innerText and you still have to parse the paragraph out. So why not do it right away?Animatism
I don't know, but maybe there are some methods in JS that will give me the text of a <p> tag directly.Irrational
Hi... maybe you could use a simple Android library to get that right away... :)Pedagogics
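
The indexOf()/substring() approach suggested in the comments could be sketched as below. It assumes lowercase XHTML tags (as used in EPUB content documents) and no nested <p> elements; ParagraphExtractor is a hypothetical name, and the result is still HTML, so inline tags such as <b> remain.

```java
/**
 * Hypothetical helper: returns the inner HTML of the n-th (1-based) <p> element,
 * or null if the chapter has fewer paragraphs.
 */
final class ParagraphExtractor {
    private ParagraphExtractor() {}

    static String nthParagraph(String html, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        int start = -1;
        int from = 0;
        int found = 0;
        while (found < n) {
            start = html.indexOf("<p", from);
            if (start < 0) return null;
            from = start + 2;
            // Count only "<p>" and "<p ...>", not tags like "<pre>" or "<param>".
            char next = from < html.length() ? html.charAt(from) : '>';
            if (next == '>' || Character.isWhitespace(next)) found++;
        }
        int contentStart = html.indexOf('>', start) + 1; // first character after the opening tag
        if (contentStart == 0) return null;
        int end = html.indexOf("</p>", contentStart);
        return end < 0 ? null : html.substring(contentStart, end);
    }
}
```

As for getting a paragraph directly in JavaScript: document.getElementsByTagName('p') returns the paragraphs in document order, so document.getElementsByTagName('p')[2].innerText is the text of the third one, which could be passed back through the JavaScript interface or evaluateJavascript.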
