I am currently working with Microsoft Office documents: Word (doc, docx), PowerPoint (ppt, pptx), and Excel (xls, xlsx).
I would like to create a preview image from the first page of each document.
So far, PowerPoint is the only type I can render, using the Apache POI library.
I cannot find a solution for the other types.
My idea is to convert the document to PDF (step 1) and then convert the PDF to an image (step 2).
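The plan above amounts to choosing a route per file type. Below is a minimal sketch of that decision using only the JDK. The class name `PreviewRoutes` and the enum constants are hypothetical names made up for illustration; they do not come from Apache POI or any other library.

```java
import java.util.Locale;

/** Hypothetical helper: picks a preview route from the file extension. */
final class PreviewRoutes {

    /** Illustrative names only; they are not part of any library. */
    enum Route { RENDER_SLIDE_DIRECTLY, CONVERT_TO_PDF_FIRST, UNSUPPORTED }

    static Route routeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Compare extensions case-insensitively ("Book.XLS" counts as xls).
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "ppt":
            case "pptx":
                return Route.RENDER_SLIDE_DIRECTLY; // slides are drawn without a PDF step
            case "doc":
            case "docx":
            case "xls":
            case "xlsx":
                return Route.CONVERT_TO_PDF_FIRST;  // step 1 (to PDF), then step 2 (to image)
            default:
                return Route.UNSUPPORTED;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Slides.pptx", "Docx.docx", "Book.XLS", "notes.txt"}) {
            System.out.println(name + " -> " + routeFor(name));
        }
    }
}
```

Keeping the decision in one place means the PDF step can later be swapped for a direct renderer for a given type without touching the callers.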
For step 2 (PDF to image), there are many free Java libraries, e.g. PDFBox. It works fine with my dummy PDF file.
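Once a page has been rendered, the result in Java is usually a `BufferedImage`, and shrinking it to preview size needs only the JDK. The sketch below covers just that last part. The PDF rendering itself is not shown, the blank 800x1000 image merely stands in for a rendered first page, and `PreviewScaler`, `scaleToWidth`, and the file name `preview.png` are hypothetical names chosen for illustration.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: scales a rendered page down to a preview image. */
final class PreviewScaler {

    /** Scales the page to maxWidth pixels wide, keeping the aspect ratio. */
    static BufferedImage scaleToWidth(BufferedImage page, int maxWidth) {
        if (page.getWidth() <= maxWidth) {
            return page; // already small enough, nothing to do
        }
        float factor = maxWidth / (float) page.getWidth();
        int height = Math.max(1, Math.round(page.getHeight() * factor));
        BufferedImage preview = new BufferedImage(maxWidth, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = preview.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(page, 0, 0, maxWidth, height, null);
        } finally {
            g.dispose(); // release the graphics context
        }
        return preview;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the rendered first page (assumption: produced elsewhere).
        BufferedImage page = new BufferedImage(800, 1000, BufferedImage.TYPE_INT_RGB);
        BufferedImage preview = scaleToWidth(page, 200);
        ImageIO.write(preview, "png", new File("preview.png"));
        System.out.println(preview.getWidth() + "x" + preview.getHeight()); // prints 200x250
    }
}
```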
However, I have a problem with step 1.
My documents may contain text in several styles, tables, images, or embedded objects. Sample image from the first page of a Word document:
Which open-source Java library can do this task?
I have tried the following libraries:
JODConverter - The output looks fine, but it requires OpenOffice.
docx4j - I'm not sure whether it works with non-OOXML formats (doc, xls, ppt), or whether it is really free. Here is my example code:
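// Needs java.io.*, org.docx4j.Docx4J, org.docx4j.fonts.IdentityPlusMapper,
// org.docx4j.fonts.Mapper, org.docx4j.openpackaging.packages.WordprocessingMLPackage
String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutDocx4j.pdf";
// try-with-resources closes both streams even if the conversion fails
try (InputStream is = new FileInputStream(inputWordPath);
     OutputStream out = new FileOutputStream(outputPDFPath)) {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
    // Map the fonts used in the document to fonts available on this machine
    Mapper fontMapper = new IdentityPlusMapper();
    wordMLPackage.setFontMapper(fontMapper);
    Docx4J.toPDF(wordMLPackage, out);
} catch (Exception e) {
    e.printStackTrace();
}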
The output looks OK, but the generated PDF contains the text "## Evaluation Use only ##".
xdocreport - The generated PDF does not contain the images from the document.
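// Needs java.io.* and org.apache.poi.xwpf.usermodel.XWPFDocument, plus PdfConverter and
// PdfOptions from the xdocreport XWPF-to-PDF converter (the package name depends on the version)
String inputWordPath = "C:\\Users\\test\\Desktop\\TestPDF\\Docx.docx";
String outputPDFPath = "C:\\Users\\test\\Desktop\\TestPDF\\OutXDOCReport.pdf";
// try-with-resources closes both streams, which the original snippet left open
try (InputStream is = new FileInputStream(inputWordPath);
     OutputStream out = new FileOutputStream(outputPDFPath)) {
    XWPFDocument document = new XWPFDocument(is);
    PdfOptions options = PdfOptions.create();
    PdfConverter.getInstance().convert(document, out, options);
}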
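For comparison with the JODConverter attempt, which drives an installed office suite, here is a minimal sketch that calls LibreOffice's command-line converter directly through `ProcessBuilder`. It assumes a LibreOffice (or compatible) `soffice` executable is installed and on the `PATH`, so it has the same external dependency and is not a pure-Java solution. `HeadlessOfficeToPdf` and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical wrapper around "soffice --headless --convert-to pdf". */
final class HeadlessOfficeToPdf {

    /** Builds the command line; assumes "soffice" can be found on the PATH. */
    static List<String> buildCommand(Path input, Path outputDir) {
        return Arrays.asList(
                "soffice",
                "--headless",                       // no GUI
                "--convert-to", "pdf",              // target format
                "--outdir", outputDir.toString(),   // where the PDF is written
                input.toString());
    }

    /** Runs the conversion and returns the exit code of the process. */
    static int convert(Path input, Path outputDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputDir))
                .inheritIO() // show the converter's own messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: HeadlessOfficeToPdf <input-file> <output-dir>");
            return;
        }
        int exitCode = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("exit code: " + exitCode);
    }
}
```

The converter is expected to write a PDF with the input's base name into the output directory. Whether the first page then looks the same as it does in Word depends on the fonts installed on the machine.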
I cannot find a suitable library for the task.
Do you have any suggestions?
Can I convert a document (docx, doc, xlsx, xls) to an image directly?
Is the conversion feature of docx4j really free?
How can I remove "## Evaluation Use only ##" from the PDF generated by docx4j?
Can docx4j work with non-OOXML documents?
Can I convert only the first page to PDF?
Can I set the PDF page size to fit the converted document's content?
Is there any library, with example code, that converts a document to PDF or directly to an image?