How to create a preview image from a Microsoft Office document using Java

I am currently working with Microsoft Office documents: Word (doc, docx), PowerPoint (ppt, pptx), and Excel (xls, xlsx).

I would like to create a preview image from the first page of each document.

Only PowerPoint documents can be handled with the Apache POI library.

I cannot find a solution for the other types.

My idea is to convert the document to PDF (step 1) and then convert the PDF to an image (step 2).

For step 2 (PDF to image), there are many free Java libraries, e.g. PDFBox. It works fine with my dummy PDF file.
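Whichever library handles step 2, in Java the rendered page usually ends up as a `java.awt.image.BufferedImage` (PDFBox's renderer returns one, for example). As a minimal sketch of the last mile, the helper below scales such an image down to a preview that fits inside a bounding box while keeping the aspect ratio. The class name `PreviewScaler`, the method `fitInside`, and the box sizes are illustrative assumptions and not part of any library mentioned here.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper: not part of PDFBox, Apache POI, or JODConverter.
final class PreviewScaler {

    private PreviewScaler() {
    }

    /**
     * Scales the rendered page so it fits inside maxWidth x maxHeight,
     * keeping the aspect ratio. Images that already fit are not upscaled.
     */
    static BufferedImage fitInside(BufferedImage page, int maxWidth, int maxHeight) {
        if (maxWidth <= 0 || maxHeight <= 0) {
            throw new IllegalArgumentException("Bounding box must be positive");
        }
        // Use the smaller of the two ratios so both dimensions fit; cap at 1.0.
        double scale = Math.min(1.0, Math.min(
                (double) maxWidth / page.getWidth(),
                (double) maxHeight / page.getHeight()));
        int width = Math.max(1, (int) Math.round(page.getWidth() * scale));
        int height = Math.max(1, (int) Math.round(page.getHeight() * scale));

        BufferedImage preview = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = preview.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return preview;
    }
}
```

A caller would pass the image produced in step 2, for example `PreviewScaler.fitInside(pageImage, 200, 200)`, and write the result with `javax.imageio.ImageIO.write`.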

However, I have a problem with step 1.

My documents may contain text in several styles, tables, images, or other objects. Here is a sample image of the first page of a Word document:

[Image: sample first page of a Word document]

Which open-source Java library can do this?

I have tried the following libraries:

JODConverter - The output looks fine, but it requires OpenOffice.

docx4j - I'm not sure whether it works with non-OOXML formats (doc, xls, ppt), or whether it is really free. Here is the example code:

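import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.docx4j.Docx4J;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutDocx4j.pdf";
// try-with-resources closes both streams even if the conversion fails
try (InputStream is = new FileInputStream(new File(inputWordPath));
     OutputStream os = new FileOutputStream(new File(outputPDFPath))) {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);
    Docx4J.toPDF(wordMLPackage, os);
} catch (Exception e) {
    e.printStackTrace();
}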

The output looks OK, but the generated PDF contains "## Evaluation Use only ##".

xdocreport - The generated PDF does not contain the images.

String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutXDOCReport.pdf";
InputStream is = new FileInputStream(new File(inputWordPath));
XWPFDocument document = new XWPFDocument(is);
PdfOptions options = PdfOptions.create();
OutputStream out = new FileOutputStream(new File(outputPDFPath));
PdfConverter.getInstance().convert(document, out, options);

I cannot find a suitable library for this task.

  • Do you have any suggestions?

  • Can I convert a document (docx, doc, xlsx, xls) to an image directly?

  • Is the docx4j conversion feature really free?

  • How can I remove "## Evaluation Use only ##" from the PDF generated by docx4j?

  • Can docx4j work with non-OOXML documents?

  • Can I convert only the first page to PDF?

  • Can I set the PDF page size to fit the converted document content?

  • Are there any libraries and example code for converting a document to PDF or to an image?

Thekla answered 27/11, 2017 at 11:15 Comment(1)
Yes, LibreOffice, with or without JODConverter, is a good way to go. You do have to live with installing LibreOffice, but it's free. - Protecting
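As a rough sketch of the "without JODConverter" route mentioned in the comment above, LibreOffice can be driven from Java through its command-line interface. The example relies only on the `soffice` options `--headless`, `--convert-to`, and `--outdir`. The class name `SofficeCli`, its method names, the two-minute timeout, and the assumption that `soffice` is on the `PATH` are all illustrative. For text documents, converting to `png` this way generally renders only the first page, but verify that against your LibreOffice version and file types.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around the LibreOffice command line; not part of JODConverter.
final class SofficeCli {

    private SofficeCli() {
    }

    /** Builds the argument list, e.g. soffice --headless --convert-to png --outdir out report.docx */
    static List<String> buildCommand(String executable, String targetFormat, Path outputDir, Path input) {
        return List.of(
                executable,
                "--headless",
                "--convert-to", targetFormat,
                "--outdir", outputDir.toString(),
                input.toString());
    }

    /** LibreOffice keeps the base name and swaps the extension: report.docx -> report.png */
    static Path expectedOutput(Path input, String targetFormat, Path outputDir) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + "." + targetFormat);
    }

    /** Runs the conversion and returns the path of the converted file. */
    static Path convert(Path input, String targetFormat, Path outputDir)
            throws IOException, InterruptedException {
        // Assumes "soffice" is resolvable on the PATH; pass an absolute path otherwise.
        Process process = new ProcessBuilder(buildCommand("soffice", targetFormat, outputDir, input))
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        if (!process.waitFor(2, TimeUnit.MINUTES)) {
            process.destroyForcibly();
            throw new IOException("soffice timed out converting " + input);
        }
        // Check for the output file rather than trusting the exit code alone.
        Path output = expectedOutput(input, targetFormat, outputDir);
        if (!Files.exists(output)) {
            throw new IOException("soffice did not produce " + output);
        }
        return output;
    }
}
```

Calling `SofficeCli.convert(Paths.get("document.docx"), "png", Paths.get("out"))` would then be expected to leave `out/document.png` behind. Starting a new office process per file is slow, which is one reason to prefer a long-running office manager such as the one JODConverter provides.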

If you can afford having a LibreOffice (or Apache OpenOffice) installation, JODConverter should do the trick just fine (and for free).

Note that the latest version of JODConverter available in the Maven Central Repository offers a feature called Filters that lets you convert only the first page easily, and it supports conversion to PNG out of the box. Here's a quick example of how to do so:

// Create an office manager using the default configuration.
// The default port is 2002. Note that when an office manager
// is installed, it will be the one used by default when
// a converter is created.
final LocalOfficeManager officeManager = LocalOfficeManager.install(); 
try {

    // Start an office process and connect to the started instance (on port 2002).
    officeManager.start();

    final File inputFile = new File("document.docx");
    final File outputFile = new File("document.png");

    // Create a page selector filter in order to
    // convert only the first page.
    final PageSelectorFilter selectorFilter = new PageSelectorFilter(1);

    LocalConverter
      .builder()
      .filterChain(selectorFilter)
      .build()
      .convert(inputFile)
      .to(outputFile)
      .execute();
} finally {
    // Stop the office process
    LocalOfficeUtils.stopQuietly(officeManager);
}

As for your question:

Can I set size of pdf to fit with converted document content?

If you can do it using LibreOffice or Apache OpenOffice without JODConverter, then you can do it with JODConverter. You just have to find out how it can be done programmatically, and then create a filter to use with JODConverter.

I won't go into details here since you may choose another way, but if you need further assistance, just ask on the Gitter Community of the project.

Monochromatic answered 29/11, 2017 at 13:47 Comment(3)
Hi, I'm trying to get a feel for the library by trial and error, and in the above sample code (which I originally got from your GitHub wiki), I can't figure out what the OfficeUtils class is. - Garnet
Sorry, I didn't see your comment until just now. This post predates the new jodconverter-online module, and I missed this part when I updated the post. The class OfficeUtils has been renamed LocalOfficeUtils. - Monochromatic
@Monochromatic does this only work for .docx/.odt files, or also for Excel/PDF/PowerPoint files? - Hyphen

You can try the GroupDocs.Conversion Cloud SDK for Java. Its free plan provides 50 free credits per month, and it supports conversion of all common file formats.

Sample DOCX to Image stream conversion code:

// Get App Key and App SID from https://dashboard.groupdocs.cloud/
ConvertApi apiInstance = new ConvertApi(AppSID,AppKey);
try {

    ConvertSettings settings = new ConvertSettings();

    settings.setStorageName(Utils.MYStorage);
    settings.setFilePath("conversions\\password-protected.docx");
    settings.setFormat("jpeg");

    DocxLoadOptions loadOptions = new DocxLoadOptions();
    loadOptions.setPassword("password");
    loadOptions.setHideWordTrackedChanges(true);
    loadOptions.setDefaultFont("Arial");

    settings.setLoadOptions(loadOptions);

    JpegConvertOptions convertOptions = new JpegConvertOptions();
    convertOptions.setFromPage(1);
    convertOptions.setPagesCount(1);
    convertOptions.setGrayscale(false);
    convertOptions.setHeight(1024);
    convertOptions.setQuality(100);
    convertOptions.setRotateAngle(90);
    convertOptions.setUsePdf(false);
    settings.setConvertOptions(convertOptions);

    // Setting OutputPath to an empty string returns the output as a document stream
    settings.setOutputPath("");

    // convert to specified format
    File response = apiInstance.convertDocumentDownload(new ConvertDocumentRequest(settings));
    System.out.println("Document converted successfully: " + response.length());
} catch (ApiException e) {
    System.err.println("Exception while calling ConvertApi:");
    e.printStackTrace();
}

I am a developer evangelist at Aspose.

Overeager answered 18/10, 2019 at 4:43 Comment(0)

The solution by @sbraconnier, adapted to newer versions of JODConverter, with direct in-memory handling:

import org.jodconverter.core.document.DefaultDocumentFormatRegistry;
import org.jodconverter.core.office.OfficeException;
import org.jodconverter.local.LocalConverter;
import org.jodconverter.local.office.LocalOfficeManager;
import org.jodconverter.local.filter.PagesSelectorFilter;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class Office {
    // Create an office manager using the default configuration.
    // The default port is 2002. Note that when an office manager
    // is installed, it will be the one used by default when
    // a converter is created.
    public static final LocalOfficeManager officeManager = LocalOfficeManager.install();
    static {
        // Start an office process and connect to the started instance (on port 2002).
        try {
            officeManager.start();
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    officeManager.stop();
                } catch (OfficeException e) {
                    //AL.warn(e);
                }
            }));
        } catch (OfficeException e) {
            //AL.warn(e);
        }
    }

    /**
     * @param inputFile document.docx
     * @return document.png preview image bytes.
     */
    public static byte[] createPreview(InputStream inputFile) throws OfficeException {
        final ByteArrayOutputStream outputFile = new ByteArrayOutputStream();

        // Create a page selector filter in order to
        // convert only the first page.
        final PagesSelectorFilter selectorFilter = new PagesSelectorFilter(1);

        LocalConverter
                .builder()
                .filterChain(selectorFilter)
                .build()
                .convert(inputFile)
                .to(outputFile)
                .as(DefaultDocumentFormatRegistry.PNG)
                .execute();
        return outputFile.toByteArray();
    }
}
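Because `createPreview` returns raw bytes, a caller may want a cheap sanity check before storing or serving them. The sketch below tests for the eight-byte PNG signature using only the standard library. The class `PngCheck` and the method `looksLikePng` are hypothetical names and are not part of JODConverter.

```java
import java.util.Arrays;

// Hypothetical helper for validating the bytes returned by a preview conversion.
final class PngCheck {

    // Every PNG file starts with these eight bytes.
    private static final byte[] PNG_SIGNATURE = {
            (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    private PngCheck() {
    }

    /** Returns true if the data is non-null and begins with the PNG signature. */
    static boolean looksLikePng(byte[] data) {
        return data != null
                && data.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }
}
```

A caller could run `PngCheck.looksLikePng(Office.createPreview(in))` and treat a `false` result as a failed conversion. This only inspects the header, so it does not prove the image decodes correctly.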

Hyphen answered 19/5, 2024 at 10:27 Comment(0)
