Comparing Docx files using OOXML

How can I read a docx file word by word, including the style of each word? I want to compare two docx files word by word and, based on the differences, write the result into another docx file (using C# and OOXML). I have tried DocumentFormat.OpenXml.Extensions.dll, OpenXMLdiff.dll and ICSharpCode.SharpZipLib.dll, but none of them gives me a way to read word by word (ICSharpCode.SharpZipLib does give me the words, but not the style associated with each word).

Any help on this will be very useful.

Myer answered 16/2, 2010 at 9:40

This MSDN article shows how to reliably retrieve the exact text of a document, paragraph by paragraph.

http://msdn.microsoft.com/en-us/library/ff686712.aspx
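Since the question asks for words together with their formatting, here is a minimal C# sketch that goes one level below paragraphs, to runs (w:r). It is not the article's code: it uses only the .NET ZipArchive and LINQ to XML classes rather than the Open XML SDK, it assumes the main part is at word/document.xml, and StyledWord and WordReader are hypothetical names. It reads only direct run formatting (w:b, w:i), not formatting inherited from styles, skips runs nested inside hyperlinks or tracked insertions, and reports a word that Word happened to split across two runs as two pieces.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// One word plus the formatting of the paragraph and run it came from (hypothetical type).
public record StyledWord(string Word, string ParagraphStyle, bool Bold, bool Italic);

public static class WordReader
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A toggle such as <w:b/> is on when present, unless w:val switches it off.
    static bool IsOn(XElement rPr, string name)
    {
        var el = rPr?.Element(W + name);
        if (el == null) return false;
        var val = (string)el.Attribute(W + "val");
        return val == null || (val != "0" && val != "false" && val != "off");
    }

    public static List<StyledWord> Read(Stream docx)
    {
        // Assumption: the main part is word/document.xml (Word's default location).
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        var entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");
        XDocument doc;
        using (var s = entry.Open()) doc = XDocument.Load(s);

        var words = new List<StyledWord>();
        foreach (var p in doc.Descendants(W + "p"))
        {
            // Paragraph style ID; "Normal" is assumed when w:pStyle is absent.
            var pStyle = (string)p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val") ?? "Normal";
            // Only direct child runs: runs inside w:hyperlink, w:ins, etc. are skipped.
            foreach (var r in p.Elements(W + "r"))
            {
                var rPr = r.Element(W + "rPr");
                var text = string.Concat(r.Elements(W + "t").Select(t => t.Value));
                // A null separator array splits on any whitespace.
                foreach (var word in text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
                    words.Add(new StyledWord(word, pStyle, IsOn(rPr, "b"), IsOn(rPr, "i")));
            }
        }
        return words;
    }
}
```

Because StyledWord is a record, two lists read from two files can be compared element by element (or fed into a diff) with ordinary equality, so a word whose text or formatting changed shows up as a difference.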

While you retrieve the text, you can also determine the style of each paragraph, which is straightforward. The following blog post shows how to retrieve the style and text of each paragraph:

http://blogs.msdn.com/b/ericwhite/archive/2009/02/16/finding-paragraphs-by-style-name-or-content-in-an-open-xml-word-processing-document.aspx
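One detail worth knowing when matching paragraphs by style: w:pPr/w:pStyle/@w:val holds a style ID (for example Heading1), while the name a user sees is stored in word/styles.xml. The following is a minimal sketch, not the blog post's code, that uses only ZipArchive and LINQ to XML to resolve each paragraph's style ID to its name, falling back to the paragraph style marked w:default when a paragraph has no w:pStyle. ParagraphStyles is a hypothetical name, and the paths word/document.xml and word/styles.xml are assumed rather than resolved through the package relationships.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphStyles
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Loads one part of the package as XML, or returns null if the part is missing.
    static XDocument LoadPart(ZipArchive zip, string path)
    {
        var entry = zip.GetEntry(path);
        if (entry == null) return null;
        using var stream = entry.Open();
        return XDocument.Load(stream);
    }

    // Returns (style name, paragraph text) for every w:p in word/document.xml.
    public static List<(string StyleName, string Text)> Read(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        var doc = LoadPart(zip, "word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");
        var styles = LoadPart(zip, "word/styles.xml");

        // styleId -> display name for paragraph styles, plus the default paragraph style.
        var names = new Dictionary<string, string>();
        string defaultId = null;
        foreach (var style in styles?.Root?.Elements(W + "style") ?? Enumerable.Empty<XElement>())
        {
            if ((string)style.Attribute(W + "type") != "paragraph") continue;
            var id = (string)style.Attribute(W + "styleId");
            if (id == null) continue;
            names[id] = (string)style.Element(W + "name")?.Attribute(W + "val") ?? id;
            var isDefault = (string)style.Attribute(W + "default");
            if (isDefault == "1" || isDefault == "true") defaultId = id;
        }

        return doc.Descendants(W + "p").Select(p =>
        {
            // No w:pStyle means the paragraph uses the default paragraph style.
            var styleId = (string)p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val") ?? defaultId;
            var name = styleId != null && names.TryGetValue(styleId, out var n) ? n : styleId;
            // Concatenate every w:t (tabs, breaks and fields are ignored).
            var text = string.Concat(p.Descendants(W + "t").Select(t => t.Value));
            return (StyleName: name, Text: text);
        }).ToList();
    }
}
```

Built-in styles are stored under their internal names, so a Heading1 paragraph typically comes back as "heading 1" rather than the localized label shown in Word's UI.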

Comparing the two? That depends on the exact semantics you want. One approach is to create an XML document that contains the paragraphs and their styles, and then compare those XML documents. The XML document might look something like this:

<Root>
  <Para>
    <Style>Normal</Style>
    <Text>This is the text of the paragraph.</Text>
  </Para>
  <Para>
    <Style>Heading1</Style>
    <Text>Overview of the Process</Text>
  </Para>
</Root>
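One way to take this comparison down to the word level the question asks for: build the Root/Para shape from each file, flatten it into Style|word tokens, and run a longest-common-subsequence (LCS) diff over the tokens. This is a minimal sketch under assumptions: StyleTextDiff and the "Style|word" token format are hypothetical, the main part is assumed to be word/document.xml, and "Normal" is used when a paragraph has no w:pStyle.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class StyleTextDiff
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds <Root><Para><Style/><Text/></Para>...</Root> from a .docx stream.
    public static XElement Summarize(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        var entry = zip.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found");
        XDocument doc;
        using (var s = entry.Open()) doc = XDocument.Load(s);

        return new XElement("Root",
            doc.Descendants(W + "p").Select(p => new XElement("Para",
                new XElement("Style",
                    (string)p.Element(W + "pPr")?.Element(W + "pStyle")?.Attribute(W + "val") ?? "Normal"),
                new XElement("Text",
                    string.Concat(p.Descendants(W + "t").Select(t => t.Value))))));
    }

    // Flattens the summary into "Style|word" tokens, so the same word in a
    // different paragraph style counts as a change.
    public static List<string> Tokens(XElement root) =>
        root.Elements("Para")
            .SelectMany(p => ((string)p.Element("Text") ?? "")
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Select(word => (string)p.Element("Style") + "|" + word))
            .ToList();

    // LCS diff: "  " = unchanged, "- " = only in old, "+ " = only in new.
    public static List<string> Diff(IList<string> a, IList<string> b)
    {
        // lcs[i, j] = length of the LCS of a[i..] and b[j..].
        var lcs = new int[a.Count + 1, b.Count + 1];
        for (int i = a.Count - 1; i >= 0; i--)
            for (int j = b.Count - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j] ? lcs[i + 1, j + 1] + 1
                                         : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var result = new List<string>();
        int x = 0, y = 0;
        while (x < a.Count && y < b.Count)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) result.Add("- " + a[x++]);
            else result.Add("+ " + b[y++]);
        }
        while (x < a.Count) result.Add("- " + a[x++]);
        while (y < b.Count) result.Add("+ " + b[y++]);
        return result;
    }
}
```

Usage would be along the lines of Diff(Tokens(Summarize(oldStream)), Tokens(Summarize(newStream))). The LCS table needs n × m cells, so for long documents it may pay to diff whole Para elements first and tokenize only the paragraphs that changed. Writing the differences into a new docx is not covered by this sketch.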
Sandbag answered 15/3, 2011 at 5:53

The easiest way is to unzip the DOCX file with your favorite ZIP library and then compare the extracted files (for example word/document.xml) with a file I/O library.
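A minimal sketch of this approach, using System.IO.Compression.ZipArchive as the ZIP library and comparing each part byte for byte; PackageCompare is a hypothetical name. Keep in mind that it only tells you which parts differ: Word records revision-save IDs (attributes such as w:rsidR) and may split the same text into different runs, so word/document.xml can differ even when the visible text is identical, and a changed part still has to be parsed and diffed to find which words changed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class PackageCompare
{
    // Reads every entry of a .docx (a ZIP package) into a name -> bytes map.
    static Dictionary<string, byte[]> ReadParts(Stream docx)
    {
        using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
        var parts = new Dictionary<string, byte[]>(StringComparer.Ordinal);
        foreach (var entry in zip.Entries)
        {
            using var s = entry.Open();
            using var buffer = new MemoryStream();
            s.CopyTo(buffer);
            parts[entry.FullName] = buffer.ToArray();
        }
        return parts;
    }

    // Lists parts that were added, removed, or whose bytes differ, sorted by name.
    public static List<string> ChangedParts(Stream oldDocx, Stream newDocx)
    {
        var a = ReadParts(oldDocx);
        var b = ReadParts(newDocx);
        var changes = new List<string>();
        foreach (var name in a.Keys.Union(b.Keys).OrderBy(n => n, StringComparer.Ordinal))
        {
            if (!b.ContainsKey(name)) changes.Add("removed: " + name);
            else if (!a.ContainsKey(name)) changes.Add("added: " + name);
            else if (!a[name].SequenceEqual(b[name])) changes.Add("changed: " + name);
        }
        return changes;
    }
}
```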

Testimonial answered 2/1, 2011 at 21:0

© 2022 - 2024 — McMap. All rights reserved.