Selendroid as a web scraper

I intend to create an Android application that performs a headless login to a website and then scrapes some content from the subsequent page while maintaining the logged-in session.

I first used HtmlUnit in a normal Java project and it worked just fine. But I later found out that HtmlUnit is not compatible with Android.

Then I tried the Jsoup library, sending an HTTP POST request to the login form. But the resulting page does not load completely, since Jsoup does not execute JavaScript.
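For reference, the form-POST login described above amounts to roughly the following. This is only a sketch written with plain java.net classes rather than Jsoup; the class name, the method names, and the field names "email" and "pass" are assumptions for illustration. It shows how a cookie store keeps the session between requests, but like Jsoup it does not run any JavaScript on the page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormLoginSketch {

    // Percent-encodes one key or value for a form body.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds an application/x-www-form-urlencoded body from the form fields.
    public static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    // Installs a process-wide cookie store so that later HttpURLConnection
    // requests send back the session cookie set by the login response.
    public static void enableSessionCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    // Posts the login form and returns the HTTP status code.
    // loginUrl is whatever URL the site's login form submits to.
    public static int postLogin(String loginUrl, Map<String, String> fields) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(loginUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        // Hypothetical field names and values; no network request is made here.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("email", "user@example.com");
        fields.put("pass", "s3cret word");
        System.out.println(encodeForm(fields));
    }
}
```

To actually log in, one would call enableSessionCookies() once and then postLogin(...) with the form's real action URL; subsequent requests made through HttpURLConnection would then carry the session cookie.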

I was then advised to look at Selendroid, which is actually an Android test automation framework. What I really need, though, is an HTML parser that supports JavaScript and runs on Android. I find Selendroid quite difficult to understand; I can't even figure out which of these dependencies to use:

  • selendroid-client
  • selendroid-standalone
  • selendroid-server

With Selenium WebDriver, the code would be as simple as the following. But can somebody show me a similar code example for Selendroid as well?

    WebDriver driver = new FirefoxDriver();
    driver.get("https://mail.google.com/");

    driver.findElement(By.id("email")).sendKeys(myEmail);
    driver.findElement(By.id("pass")).sendKeys(pass);

    // Click on 'Sign In' button
    driver.findElement(By.id("signIn")).click();

Also:

  1. Which dependencies should I add to my build.gradle file?
  2. Which Selendroid libraries should I import?
Neruda answered 5/5, 2015 at 16:41 Comment(0)

Unfortunately, I didn't get Selendroid to work. But I found a workaround for scraping dynamic content that uses just Android's built-in WebView with JavaScript enabled.

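// Assumes this code runs inside an Activity (so `this` is a valid Context)
// and that urlToLoad is a String field holding the page to scrape.
mWebView = new WebView(this);
mWebView.getSettings().setJavaScriptEnabled(true);
mWebView.addJavascriptInterface(new HtmlHandler(), "HtmlHandler");

mWebView.setWebViewClient(new WebViewClient() {
   @Override
   public void onPageFinished(WebView view, String url) {
       super.onPageFinished(view, url);

       // Compare strings with equals(), not ==
       if (url.equals(urlToLoad)) {
           // Pass the HTML source to the HtmlHandler
           view.loadUrl("javascript:HtmlHandler.handleHtml(document.documentElement.outerHTML);");
       }
   }
});

// Start loading the page; onPageFinished fires once it has loaded
mWebView.loadUrl(urlToLoad);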

The JavaScript expression document.documentElement.outerHTML returns the full HTML of the loaded page. The retrieved HTML string is then passed to the handleHtml method of the HtmlHandler class.

class HtmlHandler {
    @JavascriptInterface
    @SuppressWarnings("unused")
    public void handleHtml(String html) {
        // Called from JavaScript on a background thread, not the UI thread.
        // Scrape the content here.
    }
}

You may use a library like Jsoup to scrape the necessary content from the html String.

Neruda answered 26/8, 2015 at 18:24 Comment(1)
This solution works, but it fails when I try it on a website with multiple redirects. Even though I compare the URLs, the redirects take the page to and through the same URL. I have tried using counters, but I cannot tell exactly when the page is fully loaded. - Kcal
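One possible way to approach the redirect problem raised in this comment is to treat the page as loaded only once no further onPageFinished call has arrived for a short quiet period. This is a debounce-style suggestion, not something tested by the answer's author. The sketch below shows only the bookkeeping, in plain Java; the class name, the method names, and the 500 ms quiet period are all assumptions.

```java
public class PageSettleTracker {
    private final long quietMillis;
    private long lastFinishedAt = -1; // -1 means no page has finished yet

    public PageSettleTracker(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    // Call this from WebViewClient.onPageFinished with the current time.
    public void onPageFinished(long nowMillis) {
        lastFinishedAt = nowMillis;
    }

    // True once at least quietMillis have passed since the last finished load.
    public boolean isSettled(long nowMillis) {
        return lastFinishedAt >= 0 && nowMillis - lastFinishedAt >= quietMillis;
    }

    public static void main(String[] args) {
        PageSettleTracker tracker = new PageSettleTracker(500);
        tracker.onPageFinished(1000); // first redirect finishes
        tracker.onPageFinished(1200); // second redirect finishes
        System.out.println(tracker.isSettled(1400)); // only 200 ms of quiet so far
        System.out.println(tracker.isSettled(1700)); // 500 ms of quiet have passed
    }
}
```

On Android, the check could be scheduled with Handler.postDelayed after each onPageFinished call, using SystemClock.uptimeMillis() as the clock, and the HTML handed to HtmlHandler only when isSettled returns true. The right quiet period depends on the site and would need tuning.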

I have never used Selendroid, so I'm not really sure about this, but while searching the net I found this example. Based on it, I suppose that your code translated from Selenium to Selendroid would be:

Translation code (in my opinion)

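// JUnit 4 and Selenium imports. The Selendroid classes (SelendroidLauncher,
// SelendroidConfiguration, SelendroidCapabilities, SelendroidDriver) must also
// be imported; their packages depend on the Selendroid version you use.
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

public class MobileWebTest {
  private SelendroidLauncher selendroidServer = null;
  private WebDriver driver = null;

  // Placeholder credentials; replace with your own.
  private final String myEmail = "user@example.com";
  private final String pass = "password";

  @Test
  public void doTest() {
     driver.get("https://mail.google.com/");

     // sendKeys() and click() return void, so their results cannot be assigned
     driver.findElement(By.id("email")).sendKeys(myEmail);
     driver.findElement(By.id("pass")).sendKeys(pass);

     // Click on the 'Sign In' button
     driver.findElement(By.id("signIn")).click();

     // The driver is closed in stopSelendroidServer(), so no quit() here
  }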

  @Before
  public void startSelendroidServer() throws Exception {
    if (selendroidServer != null) {
      selendroidServer.stopSelendroid();
    }

    SelendroidConfiguration config = new SelendroidConfiguration();

    selendroidServer = new SelendroidLauncher(config);
    selendroidServer.launchSelendroid();

    DesiredCapabilities caps = SelendroidCapabilities.android();

    driver = new SelendroidDriver(caps);
  }

  @After
  public void stopSelendroidServer() {
    if (driver != null) {
      driver.quit();
    }
    if (selendroidServer != null) {
      selendroidServer.stopSelendroid();
    }
  }
}

What you have to add to your project

It seems that you have to add the Selendroid standalone JAR file to your project. If you are unsure how to add an external JAR to an Android project, see this question: How can I use external JARs in an Android project?

Here you can download the jar file: jar file

It also seems that adding the standalone JAR alone is not enough. You should also add the selendroid-client JAR file that matches the version of the standalone JAR you are using.

You can download it from here: client jar file

I hope this helps!

Galleywest answered 21/8, 2015 at 16:18 Comment(6)
Is it necessary to start/stop the Selendroid server to use the driver? - Slippage
As I said in my answer, I have never used Selendroid; I just collected all the information and put it together in an answer, so I can't confirm that what I say is really true. But it looks like it is necessary, as the official page says: Run the selendroid-standalone server. Here is the source where I saw this, with a video demo: selendroid.io/mobileWeb.html - Galleywest
This seems to throw the error: Error:Execution failed for task ':app:preDexDebug'. > com.android.ide.common.process.ProcessException: org.gradle.process.internal.ExecException: Process 'command '/Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/bin/java'' finished with non-zero exit value 134 - Slippage
Try these questions: #29755660, #29721331, #30854388 - Galleywest
I have tried all of these solutions with no success. - Slippage
@Steve Maybe you should ask a separate question about this problem. Searching SO for the full error you got turns up a lot of questions, but with different numbers where it says: finished with non-zero exit value 134. Maybe someone has had the same problem as you. Anyway, I'll keep searching in case I find a solution for you. - Galleywest

I would suggest WebdriverIO, since you want to use JavaScript. It runs on Node.js, so it is easy to require other packages to scrape the HTML.

Appium is also an alternative but it's more focused on front-end testing.

Retrogressive answered 21/8, 2015 at 9:15 Comment(2)
Are you sure WebdriverIO can be used for Android web scraping? - Slippage
WebdriverIO can handle Android for sure. - Retrogressive
