Download file using HtmlUnit
S

6

12

I am trying to download an XLS file from a website. When I click the download link, a JavaScript confirm box appears. I handle it like this:

    ConfirmHandler okHandler = new ConfirmHandler(){
            public boolean handleConfirm(Page page, String message) {
                return true;
            }
        };
    webClient.setConfirmHandler(okHandler);

The page has a link to download the file:

<a href="./my_file.php?mode=xls&amp;w=d2hlcmUgc2VsbElkPSd3b3JsZGNvbScgYW5kIHN0YXR1cz0nV0FJVERFTEknIGFuZCBkYXRlIDw9IC0xMzQ4MTUzMjAwICBhbmQgZGF0ZSA%2BPSAtMTM1MDgzMTU5OSA%3D" target="actionFrame" onclick="return confirm('Do you want do download XLS file?')"><u>Download</u></a>

I click the link using

HtmlPage x = webClient.getPage("http://working.com/download");
HtmlAnchor anchor = (HtmlAnchor) x.getFirstByXPath("//a[@target='actionFrame']");
anchor.click();

The handleConfirm() method is executed, but I have no idea how to save the file stream from the server. I tried to inspect the stream with the code below.

anchor.click().getWebResponse().getContentAsString();

But the result is the same as page x. Does anyone know how to capture the stream from the server? Thank you.

Staffordshire answered 21/10, 2012 at 12:38 Comment(2)
anchor.click() will return a page. That should contain your XLS file. - Chief
See my answer to a similar question at https://mcmap.net/q/1007526/-htmlunit-download-file - Empoison
S
10

I found a way to get the InputStream using a WebWindowListener. Inside webWindowContentChanged(WebWindowEvent event), I put the code below.

InputStream xls = event.getWebWindow().getEnclosedPage().getWebResponse().getContentAsStream();

Once I have xls, I can save the file to my hard disk.
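
As a minimal sketch of that last step, assuming the InputStream has already been obtained from the response, the JDK's Files.copy can write it to disk. The class name StreamSaver and the target path are hypothetical and chosen only for illustration; they are not part of HtmlUnit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of HtmlUnit.
final class StreamSaver {

    // Copies everything from 'in' to 'target', replacing an existing file,
    // and returns the number of bytes written. The stream is closed afterwards.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Inside the listener this could be called as StreamSaver.save(xls, Paths.get("my_file.xls")), where the file name is again only an example.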

Staffordshire answered 22/10, 2012 at 23:56 Comment(2)
I am downloading a CSV file. Can you please explain what event is, and when you call click on the anchor? I don't have a confirmation box for downloading the file. - Tailwind
@Tailwind This works exactly as described above, except that instead of getContentAsStream() you'll use getContentAsString(); then you can String.split("\n") to get the individual lines, and split each line again with String.split(",") to get the fields on that line. - Ddt
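
The splitting described in the comment above could be sketched as follows. This is a deliberately naive version under the assumption that no field contains a quoted comma or an embedded line break; SimpleCsv is a hypothetical name, and the input string stands in for whatever getContentAsString() returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: naive CSV splitting, no support for quoted fields.
final class SimpleCsv {

    // Splits the text into rows on line breaks, then each row into fields on commas.
    static List<String[]> parse(String content) {
        List<String[]> rows = new ArrayList<>();
        for (String line : content.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // A limit of -1 keeps trailing empty fields such as "a,,".
            rows.add(line.split(",", -1));
        }
        return rows;
    }
}
```

For real-world CSV with quoting, a dedicated CSV parser is probably the safer choice.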
L
9

I built this based on your post. Note: you can change the content-type condition to download only specific types of file (e.g. application/octet-stream, application/pdf, etc.).

package net.s4bdigital.export.main;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

import com.gargoylesoftware.htmlunit.ConfirmHandler;
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.WebWindowEvent;
import com.gargoylesoftware.htmlunit.WebWindowListener;
import com.gargoylesoftware.htmlunit.util.NameValuePair;

public class HtmlUnitDownloadFile {

    protected String baseUrl;
    protected static WebDriver driver;

    @Before
    public void openBrowser() {
        baseUrl = "http://localhost/teste.html";
        driver = new CustomHtmlUnitDriver();
        ((HtmlUnitDriver) driver).setJavascriptEnabled(true);

    }


    @Test
    public void downloadAFile() throws Exception {

        driver.get(baseUrl);
        driver.findElement(By.linkText("click to Downloadfile")).click();

    }

    public class CustomHtmlUnitDriver extends HtmlUnitDriver {

        // HtmlUnitDriver passes the WebClient it uses to this hook;
        // override it to register the handlers on that client.
        @Override
        protected WebClient modifyWebClient(WebClient client) {

            // Accept every JavaScript confirm() dialog.
            ConfirmHandler okHandler = new ConfirmHandler() {
                public boolean handleConfirm(Page page, String message) {
                    return true;
                }
            };
            client.setConfirmHandler(okHandler);

            client.addWebWindowListener(new WebWindowListener() {

                public void webWindowOpened(WebWindowEvent event) {
                    // Nothing to do.
                }

                public void webWindowContentChanged(WebWindowEvent event) {

                    WebResponse response = event.getWebWindow().getEnclosedPage().getWebResponse();
                    System.out.println(response.getLoadTime());
                    System.out.println(response.getStatusCode());
                    System.out.println(response.getContentType());

                    List<NameValuePair> headers = response.getResponseHeaders();
                    for (NameValuePair header : headers) {
                        System.out.println(header.getName() + " : " + header.getValue());
                    }

                    // Change or add conditions for the content types that you
                    // would like to receive as a file.
                    if (response.getContentType().equals("text/plain")) {
                        getFileResponse(response, "target/testDownload.war");
                    }
                }

                public void webWindowClosed(WebWindowEvent event) {
                    // Nothing to do.
                }
            });

            return client;
        }
    }

    public static void getFileResponse(WebResponse response, String fileName) {

        // try-with-resources closes both streams, even if copying fails
        try (InputStream inputStream = response.getContentAsStream();
             OutputStream outputStream = new FileOutputStream(new File(fileName))) {

            // write the inputStream to the FileOutputStream
            int read;
            byte[] bytes = new byte[1024];

            while ((read = inputStream.read(bytes)) != -1) {
                outputStream.write(bytes, 0, read);
            }

            System.out.println("Done!");

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}
Landside answered 18/8, 2014 at 18:7 Comment(6)
I'm sorry, but I don't get it: where or how exactly are you keeping the reference to the WebClient in the modifyWebClient method? Thanks. - Horizon
selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/… Anudeep Samaiya, it is a method of the superclass. We can override it to add a handler for the download confirm window. But you need to change the expected content type for your case. - Landside
I have faced one problem: it downloads the file, but not completely. Only half of the file's content is there. - Culottes
@Culottes I never faced it, but I have a clue: have you already verified the "Content-Length" header in the HTTP response in your specific case? Is it correct? - Landside
@EduardoFabricio Can you guide me on how to verify the 'Content-Length' header? Actually, I'm trying to download a file that has 101 rows when I download it manually or using the Chrome driver, but when I try with HtmlUnit it has only 70 rows. - Culottes
I've tried everything. Quite incredible that this worked. - Grimalkin
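
One way to approach the Content-Length question raised in the comments is to count the bytes while copying and compare the total with the header value. The sketch below is only an illustration: CountingCopy and its method names are hypothetical, and the header value is assumed to be taken from the getResponseHeaders() list that the answer already prints. If the header is missing (for example with chunked transfer) there is nothing to compare, and if the body was compressed in transit the declared length may not match the decoded byte count.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of HtmlUnit.
final class CountingCopy {

    // Copies 'in' to 'out' and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Returns true when the header is absent or unparseable (nothing to check)
    // or when the declared length equals the number of bytes copied.
    static boolean matchesContentLength(String contentLengthHeader, long bytesCopied) {
        if (contentLengthHeader == null) {
            return true;
        }
        try {
            return Long.parseLong(contentLengthHeader.trim()) == bytesCopied;
        } catch (NumberFormatException e) {
            return true;
        }
    }
}
```

A mismatch would suggest the stream was cut short or read before the response was complete, which is worth ruling out before looking elsewhere.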
T
3

There's an easier way if you're not into wrapping HtmlUnit with Selenium. Simply provide HtmlUnit's WebClient with the extended WebWindowListener.

You could also use Apache Commons IO for easy stream copying.

WebClient webClient = new WebClient();
webClient.addWebWindowListener(new WebWindowListener() {
    public void webWindowOpened(WebWindowEvent event) { }

    public void webWindowContentChanged(WebWindowEvent event) {
        // The response of the page that was just loaded into the window.
        WebResponse response = event.getWebWindow().getEnclosedPage().getWebResponse();
        // Change or add conditions for the content types that you would
        // like to receive as a file.
        if (response.getContentType().equals("text/plain")) {
            try (InputStream in = response.getContentAsStream();
                 OutputStream out = new FileOutputStream("downloaded_file")) {
                IOUtils.copy(in, out);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

    }

    public void webWindowClosed(WebWindowEvent event) {}
});
Theretofore answered 30/9, 2015 at 21:37 Comment(1)
How do I get response in the webWindowContentChanged method? - Sumatra
L
2
 final WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setTimeout(2000);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        webClient.waitForBackgroundJavaScript(2000);

        //get General page
        final HtmlPage page = webClient.getPage("http://your");

        //get Frame
        final HtmlPage frame = (HtmlPage) page.getFrameByName("Frame").getEnclosedPage();

        webClient.setConfirmHandler(new ConfirmHandler() {
            public boolean handleConfirm(Page page, String message) {
                return true;
            }
        });

        //get element file
        final DomElement file = frame.getElementByName("File");

        final InputStream xls =  file.click().getWebResponse().getContentAsStream();

        assertNotNull(xls);
    }
Laband answered 22/6, 2017 at 14:14 Comment(0)
G
2

Expanding on Roy's answer, here's my solution to this problem:

public static void prepareForDownloadingFile(WebClient webClient, File output) {
    webClient.addWebWindowListener(new WebWindowListener() {

        public void webWindowOpened(WebWindowEvent event) {
        }

        public void webWindowContentChanged(WebWindowEvent event) {
            Page page = event.getNewPage();
            FileOutputStream fos = null;
            InputStream is = null;
            if (page != null && page instanceof UnexpectedPage) {
                try {
                    fos = new FileOutputStream(output);
                    UnexpectedPage uPage = (UnexpectedPage) page;
                    is = uPage.getInputStream();
                    IOUtils.copy(is, fos);
                    webClient.removeWebWindowListener(this);
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    try {
                        if (fos != null)
                            fos.close();
                        if (is != null)
                            is.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }

        }

        public void webWindowClosed(WebWindowEvent event) {
        }
    });
}

I felt there were enough differences to make it a new answer:
- Doesn't have a magic variable (response)
- Closes the InputStream and FileOutputStream
- Looks for UnexpectedPage to determine we're not on an HTML page
- Downloads one file after being registered, then removes itself
- Doesn't require knowing the ContentType

Calling this once before, for example, clicking a button that initiates a download, will download that file.

Goliard answered 25/8, 2018 at 2:54 Comment(0)
E
-1

Figure out the download URLs and collect them in a List. From each download URL we can get the entire file using this code.

    try {
        String path = "your destination path";
        // getHrefAttribute() is defined on HtmlAnchor, so the XPath must select <a> elements
        List<HtmlAnchor> downloadfiles = (List<HtmlAnchor>) page.getByXPath("the tag you want to scrape");
        if (downloadfiles.isEmpty()) {
            System.out.println("No items found !");
        } else {
            for (HtmlAnchor htmlItem : downloadfiles) {
                String downloadUrl = htmlItem.getHrefAttribute();

                Page invoicePdf = client.getPage(downloadUrl);
                if (invoicePdf.getWebResponse().getContentType().equals("application/pdf")) {
                    System.out.println("creating PDF:");
                    // try-with-resources closes both streams after copying
                    try (InputStream in = invoicePdf.getWebResponse().getContentAsStream();
                         OutputStream out = new FileOutputStream(path + "file name")) {
                        IOUtils.copy(in, out);
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
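
One caveat: an href attribute is often relative, like the ./my_file.php link in the question, and a relative string would likely not be accepted by client.getPage as-is. A minimal sketch, assuming the URL of the page that contains the link is known, resolves the href against it with the JDK's java.net.URI. HrefResolver is a hypothetical name used only for illustration.

```java
import java.net.URI;

// Hypothetical helper, not part of HtmlUnit.
final class HrefResolver {

    // Resolves a possibly relative href against the URL of the page it came from.
    // An href that is already absolute is returned unchanged.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }
}
```

For example, HrefResolver.resolve("http://working.com/download", "./my_file.php?mode=xls") yields http://working.com/my_file.php?mode=xls. Note that URI.create throws IllegalArgumentException for strings with illegal characters such as unescaped spaces, so real-world hrefs may need cleaning first.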
Eure answered 7/11, 2017 at 1:11 Comment(0)
