How to convert an HTML file to PDF using wkhtmltopdf in Java

I want to convert an HTML file into a PDF file using wkhtmltopdf. wkhtmltopdf is the best option for me as it renders the HTML file using WebKit. The problem is that I want to do the same using Java but wkhtmltopdf does not provide any Java API.

I can use Runtime.exec() or ProcessBuilder to fork a new process from Java and create the PDF output using wkhtmltopdf in that process. But, as I am developing a web-based application, I am not allowed to create so many new processes on the server.

Is there any other way so that I can use wkhtmltopdf? I really want to use it as it's giving me the exact output.

Or, is there any other open source browser engine that provides a Java API that can render my HTML page just like wkhtmltopdf?

Whitman answered 10/2, 2015 at 22:0 Comment(0)

Remember that the system running your Java code must have wkhtmltopdf installed for anything I'm saying here to work. Go to www.wkhtmltopdf.org and download the version you need.

I know this is old and by now you've certainly figured this out, but if you don't want to use the JNI or JNA to do this you can do it pretty simply through .exec calls on your system.

Here is a class that does exactly what you want without having to fuss with JNI or JNA:

import java.io.IOException;

import org.apache.commons.io.IOUtils; // assumed: IOUtils is the Apache Commons IO class

public class MegaSimplePdfGenerator {

    public void makeAPdf() throws InterruptedException, IOException {
        String command = "wkhtmltopdf http://www.google.com /Users/Shared/output.pdf"; // Desired command

        Process wkhtml = Runtime.getRuntime().exec(command); // Start process
        IOUtils.copy(wkhtml.getErrorStream(), System.err); // Copy the process's stderr to the console

        wkhtml.waitFor(); // Wait for the process to exit
    }
}

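You MUST somehow read from one of the process's output streams for the process to run reliably. That can be its standard output (getInputStream()) or its standard error (getErrorStream()); if nothing reads them, the process can stall once the pipe buffer fills up. In this case, since I'm just writing to a file, I went ahead and connected the errorStream from the wkhtml process to System.err.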
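The same idea can be expressed with ProcessBuilder, which the question also mentions. This is only a minimal sketch, assuming wkhtmltopdf is on the PATH and Java 9 or later; the class and method names are made up for illustration. inheritIO() hands the child's stdout and stderr to the JVM's own, so nothing has to be copied by hand, and treating any non-zero exit code as a failure is my assumption.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper for illustration; not part of wkhtmltopdf
final class ProcessBuilderPdfGenerator {

    // Passing the arguments as a list avoids the whitespace splitting
    // that Runtime.exec(String) applies to a single command string
    static List<String> buildCommand(String source, String destination) {
        return List.of("wkhtmltopdf", source, destination);
    }

    static void makeAPdf(String source, String destination)
            throws IOException, InterruptedException {
        Process wkhtml = new ProcessBuilder(buildCommand(source, destination))
                .inheritIO() // child's stdout/stderr go straight to the JVM's console
                .start();

        int exitCode = wkhtml.waitFor(); // Block until wkhtmltopdf exits
        if (exitCode != 0) {
            // Assumption: any non-zero exit code means the conversion failed
            throw new IOException("wkhtmltopdf exited with code " + exitCode);
        }
    }
}
```

Calling ProcessBuilderPdfGenerator.makeAPdf("http://www.google.com", "/Users/Shared/output.pdf") would then mirror the class above, with the added benefit that paths containing spaces survive intact.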

How to use only streams!

If you want the source HTML to come from a stream and/or the destination PDF to be written to a stream then you would use a '-' for the "URI" instead of a regular string.

Example: wkhtmltopdf - - or wkhtmltopdf /Users/Shared/somefile.html -

You can then capture the input and output streams and write and read as needed.

If you are only connecting to a single stream then you don't need to use threads and you won't get a scenario where the streams are waiting on each other endlessly.
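For the single-stream case, a sketch could look like the following. It assumes Java 9 or later (for InputStream.transferTo) and wkhtmltopdf on the PATH; the class and method names are hypothetical. The source is a regular file or URL, only the PDF comes back over stdout, and stderr is redirected to the JVM's own stderr so the calling thread has exactly one stream to read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;

// Hypothetical helper for illustration; not part of wkhtmltopdf
final class SingleStreamPdfGenerator {

    // "-" as the destination makes wkhtmltopdf write the PDF to stdout
    static List<String> buildCommand(String sourceUri) {
        return List.of("wkhtmltopdf", sourceUri, "-");
    }

    static void writePdf(String sourceUri, OutputStream destination)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(sourceUri));
        // Send the child's stderr to the JVM's stderr so only stdout is left to read
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process wkhtml = builder.start();

        wkhtml.getOutputStream().close(); // Nothing is sent on stdin in this variant

        try (InputStream pdf = wkhtml.getInputStream()) {
            pdf.transferTo(destination); // Copies until wkhtmltopdf closes stdout
        }

        int exitCode = wkhtml.waitFor();
        if (exitCode != 0) {
            // Assumption: any non-zero exit code means the conversion failed
            throw new IOException("wkhtmltopdf exited with code " + exitCode);
        }
    }
}
```

Because only one pipe is read by hand, no extra threads are involved here.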

However if you are using a stream for BOTH the HTML source AND the PDF Destination, then you MUST use Threads for the process to ever complete.

NOTE: Remember that the process's OutputStream (the one feeding HTML to wkhtmltopdf) must be flushed and closed for wkhtmltopdf to start building the PDF and streaming the results!

Example:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.io.IOUtils; // assumed: IOUtils is the Apache Commons IO class

public class StreamBasedPdfGenerator {
    public void makeAPdfWithStreams() throws InterruptedException, IOException {
        // Start by setting up file streams
        File destinationFile = new File("/Users/Shared/output.pdf");
        File sourceFile = new File("/Users/Shared/pdfPrintExample.html");

        FileInputStream fis = new FileInputStream(sourceFile);
        FileOutputStream fos = new FileOutputStream(destinationFile);

        String command = "wkhtmltopdf - -"; // Read HTML from stdin, write PDF to stdout

        Process wkhtml = Runtime.getRuntime().exec(command); // Start process

        // Forward wkhtmltopdf's stderr to the console
        Thread errThread = new Thread(() -> {
            try {
                IOUtils.copy(wkhtml.getErrorStream(), System.err);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        // Connect the HTML source stream to wkhtmltopdf
        Thread htmlReadThread = new Thread(() -> {
            try {
                IOUtils.copy(fis, wkhtml.getOutputStream());
                wkhtml.getOutputStream().flush();
                wkhtml.getOutputStream().close();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        // Connect the PDF stream from wkhtmltopdf to the destination file stream
        Thread pdfWriteThread = new Thread(() -> {
            try {
                IOUtils.copy(wkhtml.getInputStream(), fos);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });

        // Do NOT call run(); start() is what lets them all go at the same time.
        errThread.start();
        pdfWriteThread.start();
        htmlReadThread.start();

        wkhtml.waitFor(); // Wait for the process to exit

        // Let the copy threads finish before closing the files
        errThread.join();
        pdfWriteThread.join();
        htmlReadThread.join();
        fis.close();
        fos.close();
    }
}

Streams are great when you're running this on a web server and want to avoid creating temporary HTML or PDF files. You can simply stream the result back by capturing the PDF output and writing it to the HTTP response stream.
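As a rough sketch of that idea, the helper below takes HTML that is already in memory and writes the PDF to whatever OutputStream it is given. It assumes Java 9 or later, wkhtmltopdf on the PATH, and HTML that is meant to be read as UTF-8; the class and method names are hypothetical. The command is a parameter of pipe() only so the plumbing can be tried with another program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper for illustration; not part of wkhtmltopdf
final class InMemoryPdfGenerator {

    // Feeds `input` to the command's stdin and copies its stdout to `destination`
    static void pipe(List<String> command, byte[] input, OutputStream destination)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectError(ProcessBuilder.Redirect.INHERIT); // stderr goes to the JVM's stderr
        Process process = builder.start();

        // Write stdin on its own thread so it can never block the stdout reader below
        Thread writer = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input);
            } catch (IOException e) {
                // The child may exit before reading everything; the exit code check reports that
            }
        });
        writer.start();

        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(destination); // Copies until the child closes stdout
        }
        writer.join();

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            // Assumption: any non-zero exit code means the conversion failed
            throw new IOException("command exited with code " + exitCode);
        }
    }

    // Assumption: the HTML is sent as UTF-8 and declares its charset accordingly
    static void htmlToPdf(String html, OutputStream destination)
            throws IOException, InterruptedException {
        pipe(List.of("wkhtmltopdf", "-", "-"), html.getBytes(StandardCharsets.UTF_8), destination);
    }
}
```

In a servlet, destination would typically be the response's output stream, with the content type set to application/pdf before calling htmlToPdf. The destination stream is left open so the caller stays in charge of closing it.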

I hope this helps somebody!

Thoughtout answered 22/6, 2017 at 18:52 Comment(1)
Solved this a few years ago in exactly this way. I thought there should be a better way, so I haven't updated my solution here. - Whitman

Give htmltopdf-java a try. It uses the native libraries generated by wkhtmltopdf, so you should expect the same result with more control over the flow.

(I am the author of this library)

Stilbestrol answered 21/5, 2018 at 9:0 Comment(2)
Hi Ben, I've tried your library and it works great when integrated with a Java application. However, I need a version that would work on Android, and I couldn't manage to generate the native library (libwkhtmltox) for the Android platform. I tried to recompile the wkhtmltopdf source code, but it appears the lib uses QtWebKit internally, and as far as I know there is no support for it on Android. Do you have any ideas/tips? As far as I could see in your source code, there is support for win/mac/linux, but not for Android. Thanks a lot! - Irreligion
While the library is thread-safe, it unfortunately cannot perform conversions concurrently. Because wkhtmltopdf uses Qt behind the scenes to render webpages, there is a single thread which performs such rendering across a single process. Therefore, at this point, it is only possible to perform one conversion at a time per process. - Wagshul

wkhtmltopdf has a C API. You can then use JNI for Java to C communication.

Edit: There's a Java wrapper as well: wkhtmltopdf-wrapper.

Magulac answered 3/9, 2015 at 15:47 Comment(0)
