Convert docx file into PDF with Java

I am looking for a stable method to convert DOCX files from MS Word into PDF. Until now I have used OpenOffice installed as a listener, but it often hangs. The problem is that we have situations where many users want to convert SXW and DOCX files into PDF at the same time. Is there some other possibility? I tried the examples from this site: https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/ but the output is not good (the converted documents have errors and the layout is noticeably changed).

Here is the source DOCX document: [image]

Here is the document converted with docx4j, with some exception text inside the document. The text in the upper right corner is also missing: [image]

This one is the PDF created with OpenOffice as the converter from DOCX to PDF. Some text in the upper right corner is missing: [image]

Is there some other option to convert docx into pdf with Java?

Tautomerism answered 13/12, 2016 at 10:19 Comment(4)
Not on SO, since you are asking us "to recommend a tool or library". But why not just try to get your OpenOffice setup stable? - Morose
You can use JODConverter (code.google.com/archive/p/jodconverter) or docx4j (docx4java.org/trac/docx4j). - Danialah
JODConverter uses OpenOffice in the background. The problem is that OpenOffice sometimes hangs (crashes) for no apparent reason. I also tried docx4j (see my question). - Tautomerism
That's a 4-year-old article you reference there. These days, the recommended way to do it from docx4j is with Plutext's commercial PDF Converter. You can try that online at converter-eval.plutext.com - Nolasco

There are many ways to do the conversion. One commonly used method is with POI and docx4j:

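import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Collections;
import java.util.List;
import org.apache.log4j.Level;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFont;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

// Place the statements below in a method that declares "throws Exception".
// try-with-resources closes both streams even if the conversion fails.
try (InputStream is = new FileInputStream(new File("your DOCX path"));
     OutputStream out = new FileOutputStream(new File("your output PDF path"))) {
    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);

    // Map a font used in the document to a font that is installed locally.
    Mapper fontMapper = new IdentityPlusMapper();
    PhysicalFont font = PhysicalFonts.getPhysicalFonts().get("Comic Sans MS"); // set your desired font
    fontMapper.getFontMappings().put("Algerian", font);
    wordMLPackage.setFontMapper(fontMapper);

    PdfSettings pdfSettings = new PdfSettings();
    org.docx4j.convert.out.pdf.PdfConversion conversion =
            new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);

    // To turn off the logger
    List<Logger> loggers = Collections.<Logger> list(LogManager.getCurrentLoggers());
    loggers.add(LogManager.getRootLogger());
    for (Logger logger : loggers) {
        logger.setLevel(Level.OFF);
    }

    conversion.output(out, pdfSettings);
    System.out.println("DONE!!");
}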

This works perfectly; I have tried it on multiple DOCX files.
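
Since the question mentions a converter that sometimes hangs while many users convert at once, it may help to run each conversion on a bounded thread pool with a timeout. The following is only a minimal sketch using the JDK's java.util.concurrent classes. The class name ConversionGuard and its parameters are hypothetical and not part of docx4j or OpenOffice, and the conversion call itself (for example the docx4j code above) would be passed in as the job.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: limits parallel conversions and gives up on ones that take too long. */
final class ConversionGuard implements AutoCloseable {
    private final ExecutorService pool;
    private final long timeoutMillis;

    ConversionGuard(int maxParallel, long timeoutMillis) {
        // At most maxParallel jobs run at once; further jobs wait in the pool's queue.
        this.pool = Executors.newFixedThreadPool(maxParallel);
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Runs one conversion job and waits up to timeoutMillis for it.
     * The wait starts at submission, so time spent queued counts too.
     * Returns true if the job finished in time, false if it was cancelled.
     */
    boolean run(Callable<Void> job) throws ExecutionException, InterruptedException {
        Future<Void> future = pool.submit(job);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread; a job that ignores
            // interruption (or an external process) is not killed by this.
            future.cancel(true);
            return false;
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

A caller could create one shared guard, for example new ConversionGuard(4, 60_000), and wrap each request as guard.run(() -> { /* conversion code */ return null; }). A false return means the job was abandoned, so the caller can report an error instead of blocking the user. Exceptions thrown by the job itself surface as ExecutionException.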

Wallop answered 13/12, 2016 at 11:9 Comment(15)
Tried your method but I still get an exception: WARN org.apache.fop.image.loader.batik.PreloaderSVG .preloadImage line 76 - Batik not in class path java.lang.NoClassDefFoundError: org/apache/batik/bridge/UserAgent at org.apache.fop.image.loader.batik.PreloaderSVG.preloadImage(PreloaderSVG.java:69) - Tautomerism
import org.apache.log4j.Level; import org.apache.log4j.LogManager; import org.apache.log4j.Logger; import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings; import org.docx4j.fonts.IdentityPlusMapper; import org.docx4j.fonts.Mapper; import org.docx4j.fonts.PhysicalFont; import org.docx4j.fonts.PhysicalFonts; import org.docx4j.openpackaging.packages.WordprocessingMLPackage; - Wallop
I still get the same malformed PDF as with docx4j. Here it is: s5.postimg.org/ptxrxtfyf/screenshot_1540.jpg - Tautomerism
//To turn off logger List<Logger> loggers = Collections.<Logger> list(LogManager .getCurrentLoggers()); loggers.add(LogManager.getRootLogger()); for (Logger logger : loggers) { logger.setLevel(Level.OFF); } This turns off those messages. - Wallop
I will try to turn off the logging, but the text (upper right corner), the footer etc. are missing in the PDF document... - Tautomerism
Is it an originally created DOCX or a converted one? Please check. - Wallop
If possible, provide the DOCX file. - Wallop
It's a document created in MS Word (Office Professional 2013): s5.postimg.org/63a55ovlz/screenshot_1541.jpg If you want to try, here is my document: drive.google.com/file/d/0B6Z9wNTXyUEeOUtFRVhZeWtnZ3M/… - Tautomerism
Check all the dependencies once more and rebuild the project. It works like a charm! Thank you. - Wallop
Can you please send me a link with all the included libraries? I downloaded the libraries from this site: angelozerr.wordpress.com/2012/12/06/… - Tautomerism
Also, if I download the latest library from docx4java, I can't find the class org.docx4j.convert.out.pdf.PdfConversion. - Tautomerism
The code sample in this answer uses docx4j, not POI :-) - Nolasco
In the most recent docx4j, the export via XSL FO is a separate library, so you'd need that jar and its dependencies. Or use our commercial PDF Converter I recommended in my other comment :-) - Nolasco
Hi JasonPlutext. I have tried your online converter, but in the generated PDF there is no image in the lower left corner: s5.postimg.org/k5w2ko0zr/screenshot_1542.jpg and this is the original document: s5.postimg.org/8utewau4n/screenshot_1543.jpg Any idea? - Tautomerism
Would need to see the source docx. Can you email it to me, or drag it to ndoc.it and paste the resulting link here? - Nolasco
