Convert DOC file to DOCX with Java

I need to use DOCX files (more precisely, the XML contained in them) in a Java application I'm currently developing, but some people in my company still use the DOC format.
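Because the inputs arrive as a mix of DOC and DOCX, it can help to route each file by its signature rather than trusting the extension. Below is a minimal JDK-only sketch. It assumes the usual magic numbers: legacy .doc files are OLE2 compound files (D0 CF 11 E0 A1 B1 1A E1) and .docx files are ZIP packages (PK\3\4). WordFormatSniffer is a hypothetical helper name. Note that .xls/.ppt also use OLE2, and any ZIP (including .xlsx or .odt) passes the second check, so treat the result as a hint only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: guesses whether a Word file is a legacy .doc or an OOXML .docx from its first bytes. */
final class WordFormatSniffer {

    enum WordFormat { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature: legacy .doc (but also .xls, .ppt, ...).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4": .docx (but also any other ZIP-based file).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    private WordFormatSniffer() {}

    /** Classifies header bytes by magic number only; the result is a likely format, not a guarantee. */
    static WordFormat detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return WordFormat.DOC;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return WordFormat.DOCX;
        }
        return WordFormat.UNKNOWN;
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    static WordFormat detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

With this, files detected as DOC could be sent to a converter and DOCX files straight to the existing XML parsers.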

Do you know of a way to convert a DOC file to the DOCX format using Java? I know it's possible with C#, but that's not an option.

I googled it, but nothing came up...

Thanks

Vandyke answered 12/7, 2011 at 13:14 Comment(1)
Apache POI might help you with this task. – Wherewithal

You may try Aspose.Words for Java. It allows you to load a DOC file and save it in DOCX format. The code is very simple, as shown below:

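import com.aspose.words.Document;

// Open the binary .doc file (com.aspose.words.Document, not another Document class).
Document doc = new Document("input.doc");
// Save it; the .docx extension selects the DOCX output format.
doc.save("output.docx");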

Please see if this helps in your scenario.

Disclosure: I work as developer evangelist at Aspose.

Culinary answered 4/10, 2011 at 9:52 Comment(3)
When I use Document doc = new Document("MyDir" + "in.doc"); in my Java page, it shows "The constructor Document(String) is undefined", and for doc.save("output.docx"); it shows "The method save(String) is undefined for the type Document". Do I have to import a JAR file? If so, please list it. – Ericaericaceous
How can I try this? Which dependency do I need to use? – Pendant
@Rajesh, please download the Aspose.Words for Java library and add it to your project to convert DOC to DOCX (and many other formats) using Java. You can also use Aspose.Words for Java directly from a Maven-based project. For more details, please refer to the documentation. Disclosure: I work with Aspose as a Developer Evangelist. – Harve

Check out JODConverter to see if it fits the bill. I haven't personally used it.

Misha answered 12/7, 2011 at 13:43 Comment(1)
It apparently can convert MS Office files to OpenOffice format, but not DOC to DOCX. Thank you anyway. – Vandyke

Use the newer JARs, jodconverter-core-4.2.2.jar and jodconverter-local-4.2.2.jar:

import java.io.File;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.jodconverter.LocalConverter;
import org.jodconverter.document.DefaultDocumentFormatRegistry;
import org.jodconverter.office.LocalOfficeManager;
import org.jodconverter.office.OfficeException;
import org.jodconverter.office.OfficeUtils;

// Placeholder paths: wildcards such as "*.doc" are not expanded, so name real files.
String inputFile = "input.doc";
String outputFile = "output.docx";

LocalOfficeManager localOfficeManager = LocalOfficeManager.builder()
        .install()
        .officeHome("/path/to/openoffice") // your OpenOffice/LibreOffice installation directory
        .build();

try {
    localOfficeManager.start();
    // Source and target formats are stated explicitly instead of being guessed from the extensions.
    LocalConverter
            .make()
            .convert(new File(inputFile))
            .as(DefaultDocumentFormatRegistry.DOC)
            .to(new File(outputFile))
            .as(DefaultDocumentFormatRegistry.DOCX)
            .execute();
} catch (OfficeException ex) {
    Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} finally {
    // Stop the office process even if the conversion failed.
    OfficeUtils.stopQuietly(localOfficeManager);
}
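Wildcards such as "*.doc" are not expanded by Java's File, so converting a whole folder means listing it yourself. Below is a JDK-only sketch, assuming a flat (non-recursive) input directory; DocBatchPlanner and plan(...) are hypothetical names. It pairs each .doc file (extension matched in any case) with a .docx target, and each pair could then be passed to a LocalConverter chain like the one above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: plans a batch .doc -> .docx conversion for one directory (non-recursive). */
final class DocBatchPlanner {

    private DocBatchPlanner() {}

    /** Maps each *.doc file in inputDir (case-insensitive extension) to a *.docx path in outputDir. */
    static Map<Path, Path> plan(Path inputDir, Path outputDir) throws IOException {
        Map<Path, Path> jobs = new TreeMap<>(); // sorted, so the order is stable across runs
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(inputDir)) {
            for (Path source : entries) {
                String name = source.getFileName().toString();
                boolean isDoc = name.toLowerCase(Locale.ROOT).endsWith(".doc");
                if (isDoc && Files.isRegularFile(source)) {
                    // Keep the base name and swap the extension: Report.DOC -> Report.docx
                    String base = name.substring(0, name.length() - ".doc".length());
                    jobs.put(source, outputDir.resolve(base + ".docx"));
                }
            }
        }
        return jobs;
    }
}
```

Files ending in .docx are skipped because ".docx" does not end with ".doc", and directories are skipped by the isRegularFile check.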
Gyniatrics answered 12/11, 2019 at 12:42 Comment(0)

JODConverter drives OpenOffice/LibreOffice over a network protocol. It can therefore "do anything you can do in OpenOffice", including converting formats. But it only does as good a job as whatever version of OpenOffice you are running: one of my documents contains artwork, and it is not converted as I had hoped.
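Because JODConverter talks to OpenOffice/LibreOffice over a socket, a cheap way to check that an office instance is reachable before converting is to try a plain TCP connection. Below is a minimal JDK-only sketch; OfficeListenerProbe is a hypothetical name. A successful connect only proves that something is listening on the port, not that it is LibreOffice.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP listener (e.g. LibreOffice on port 8100) is reachable. */
final class OfficeListenerProbe {

    private OfficeListenerProbe() {}

    /** Returns true if host:port accepts a TCP connection within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true; // connection accepted; the socket is closed again by try-with-resources
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }
}
```

For example, calling isListening("127.0.0.1", 8100, 1000) before officeManager.start() could give a clearer error message than a failed connection inside JODConverter.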

According to the Google Code website for v3, JODConverter is no longer supported.

To get JODConverter to do the job, you need to do something like this:

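import java.io.File;
import java.io.IOException;
import java.util.Collections;

import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.document.DocumentFamily;
import org.artofsolving.jodconverter.document.DocumentFormat;
import org.artofsolving.jodconverter.office.ExternalOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeManager;
import org.junit.AfterClass;
import org.junit.BeforeClass;

// Members of a JUnit 4 test class.

private static OfficeManager officeManager;

// .doc -> .docx: the registry's "docx" format is given the "MS Word 2007 XML"
// export filter so it can be used as an output format.
private static void transformBinaryWordDocToDocX(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
    docx.setStoreProperties(DocumentFamily.TEXT,
            Collections.singletonMap("FilterName", "MS Word 2007 XML"));

    converter.convert(in, out, docx);
}

// .doc -> Word 2003 XML, using a custom DocumentFormat with the "MS Word 2003 XML" filter.
private static void transformBinaryWordDocToW2003Xml(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat w2003xml = new DocumentFormat("Microsoft Word 2003 XML", "xml", "text/xml");
    w2003xml.setInputFamily(DocumentFamily.TEXT);
    w2003xml.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MS Word 2003 XML"));
    converter.convert(in, out, w2003xml);
}

@BeforeClass
public static void setupStatic() throws IOException {
    // Alternative: let JODConverter launch LibreOffice itself.
    /*officeManager = new DefaultOfficeManagerConfiguration()
            .setOfficeHome("C:/Program Files/LibreOffice 3.6")
            .buildOfficeManager();
    */

    // Connect to an already running LibreOffice instance listening on port 8100.
    officeManager = new ExternalOfficeManagerConfiguration()
            .setConnectOnStart(true)
            .setPortNumber(8100)
            .buildOfficeManager();
    officeManager.start();
}

@AfterClass
public static void shutdownStatic() throws IOException {
    officeManager.stop();
}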

For this to work, LibreOffice must already be running as a networked server (I could not get JODConverter's "run on demand" mode to work well under Windows with LibreOffice 3.6).
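The ExternalOfficeManagerConfiguration code above needs a LibreOffice instance that is already listening. Below is a sketch of launching one from Java in headless mode (no GUI). It assumes you supply the soffice executable path and that your LibreOffice release accepts double-dash options (4.x and later do). HeadlessOfficeLauncher is a hypothetical name, and port 8100 matches setPortNumber(8100) in that code.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical helper: starts LibreOffice headless, accepting UNO connections on a local socket port. */
final class HeadlessOfficeLauncher {

    private HeadlessOfficeLauncher() {}

    /** Command line for soffice; no shell is involved, so the --accept value needs no quoting. */
    static List<String> command(String sofficeExecutable, int port) {
        return List.of(
                sofficeExecutable,
                "--headless",   // run without a GUI
                "--norestore",  // do not try to restore documents after a crash
                "--accept=socket,host=127.0.0.1,port=" + port + ";urp;");
    }

    /** Launches the process; the caller should destroy() it when done. */
    static Process start(String sofficeExecutable, int port) throws IOException {
        return new ProcessBuilder(command(sofficeExecutable, port))
                .inheritIO()
                .start();
    }
}
```

On Windows, the executable path would be something like the soffice.exe in the program folder of the LibreOffice installation.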

Heddle answered 16/8, 2012 at 14:2 Comment(1)
Does this code convert DOC to DOCX using OpenOffice in headless mode? – Ripple

I needed the same conversion. After a lot of research I found that JODConverter can do it; you can download the JAR from https://code.google.com/p/jodconverter/downloads/list

Add jodconverter-core-3.0-beta-4.jar (the compiled JAR, not the -sources JAR, which only contains Java source files) and its dependency JARs to your project's lib folder.

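import java.io.File;
import java.util.Collections;
import org.artofsolving.jodconverter.OfficeDocumentConverter;
import org.artofsolving.jodconverter.document.DocumentFamily;
import org.artofsolving.jodconverter.document.DocumentFormat;
import org.artofsolving.jodconverter.office.DefaultOfficeManagerConfiguration;
import org.artofsolving.jodconverter.office.OfficeManager;

// 1) Create the OfficeManager (it launches LibreOffice from this installation directory)
OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome(new File("/opt/libreoffice4.4"))
        .buildOfficeManager();
officeManager.start();
try {
    // 2) Create the JODConverter converter
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    // 3) Configure the DOCX format with the "MS Word 2007 XML" export filter
    DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
    docx.setStoreProperties(DocumentFamily.TEXT,
            Collections.singletonMap("FilterName", "MS Word 2007 XML"));
    // 4) Call convert on the converter object
    converter.convert(new File("doc/AdvancedTable.doc"),
            new File("docx/AdvancedTable.docx"), docx);
} finally {
    officeManager.stop(); // shut down LibreOffice even if the conversion fails
}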
Truncation answered 29/4, 2015 at 12:42 Comment(0)

To convert a DOC file to HTML, look at "Convert Word doc to HTML programmatically in Java".

Use Apache POI: http://poi.apache.org/

Or use this:

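import java.io.FileInputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
// Read an existing .docx and print its plain text (formatting is dropped).
try (FileInputStream in = new FileInputStream("hello.docx");
     XWPFWordExtractor wx = new XWPFWordExtractor(new XWPFDocument(in))) {
    System.out.println("text = " + wx.getText());
}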
Feticide answered 12/7, 2011 at 13:23 Comment(4)
if (DOCX != HTML && hasNoOtherAnswer()) doDownvote() - OK, you mentioned POI, no -1 from me. – Depreciatory
It will give you the functionality to read data from a DOC file, so you can work on it directly without converting it to DOCX. – Feticide
I need to use DOCX files. That is a strong requirement, most likely because the software can already process DOCX but some input is still in the old DOC format. So converting to HTML and adding an HTML parser doesn't look like an option. – Depreciatory
Indeed... All my parsers are already set up for the DOCX format (with the Open XML behind it). I do not have time to write parsers for HTML or text files, hence my question. – Vandyke
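Once a file is DOCX, its Open XML can be reached with nothing more than java.util.zip, since a .docx is a ZIP package. Below is a minimal sketch; DocxXmlReader is a hypothetical name. It assumes the main part sits at the conventional word/document.xml (strictly, its name comes from the package relationships in _rels/.rels) and that the part is UTF-8 encoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: extracts the main document XML from a .docx package. */
final class DocxXmlReader {

    private DocxXmlReader() {}

    /** Returns word/document.xml as a string; assumes the conventional part name and UTF-8. */
    static String readMainDocumentXml(Path docx) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

The returned string can be handed to whatever XML parser the existing DOCX code already uses.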
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

/**
 * Copies the plain text of a binary .doc (read with POI HWPF) into a new .docx
 * (written with POI XWPF), one paragraph at a time. Formatting, tables and
 * images are NOT carried over: this is a text-only conversion.
 */
public class TestCon {

    public static void main(String[] args) throws Exception {
        System.out.println("Starting the test");
        try (InputStream in = new FileInputStream("C:/Users/312845/Desktop/a.doc");
             HWPFDocument doc = new HWPFDocument(in);
             XWPFDocument docx = new XWPFDocument();
             OutputStream out = new FileOutputStream("C:/Users/312845/Desktop/test.docx")) {

            WordExtractor we = new WordExtractor(doc);
            // One string per paragraph, usually ending with a line break that is stripped here.
            for (String paragraph : we.getParagraphText()) {
                String text = paragraph.replaceAll("[\\r\\n]+$", "");
                docx.createParagraph().createRun().setText(text);
            }
            docx.write(out);
        }
        System.out.println("Document testing completed");
    }
}
Drug answered 8/5, 2019 at 5:28 Comment(1)
Code-only answers are not well received here. Please explain what you did and how it answers the question. – Kayleigh

© 2022 - 2024 — McMap. All rights reserved.