Programmatically comparing Word documents
R

7

9

I need to compare two Office documents, in this case two Word documents, and produce a diff similar to what SVN shows. It does not have to go that far, but it should at least highlight the differences.

I tried using the office COM dll and got this far..

object fileToOpen = (object)@"D:\doc1.docx";
string fileToCompare = @"D:\doc2.docx";

WRD.Application WA = new WRD.Application();

Document wordDoc = null;

wordDoc = WA.Documents.Open(ref fileToOpen, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
wordDoc.Compare(fileToCompare, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);

Any tips on how to proceed further? This will be a web application having a lot of hits. Is using the office com object the right way to go, or are there any other things I can look at?

Rung answered 23/11, 2011 at 15:37 Comment(5)
Just out of interest, how does SVN show the difference between two binary files? (AFAIK docx is a zip archive format)Supercilious
Select the two files in question, usually in the same folder on the client side. With TortoiseSVN installed, right-click, go to the TortoiseSVN menu and select Diff...Rung
Yep, I know how to do it, but what difference will you see? Does it make any sense?Supercilious
I'm open to a better way of comparing the two documents in a more sensible manner. Can you suggest one?Rung
Also see #12321990Whereby
F
1

I agree w/ Joseph about diff'ing the string. I would also recommend a purpose-built diffing engine (several found here: Any decent text diff/merge engine for .NET?) which can help you avoid some of the normal pitfalls in diffing.

Finish answered 23/11, 2011 at 15:45 Comment(0)
O
4

Use the Document class to compare the files. Word opens the result in a new document.

using OfficeWord = Microsoft.Office.Interop.Word;

object fileToOpen = @"D:\doc1.docx";
string fileToCompare = @"D:\doc2.docx";

// The original used an internal variable (Global.OfficeFile.WordApp) here; any Word Application instance works.
var app = new OfficeWord.Application();

object missing = Type.Missing;
object readOnly = false;
object addToRecent = false;
object visible = false;

OfficeWord.Document docZero = app.Documents.Open(ref fileToOpen, ref missing, ref readOnly, ref addToRecent, Visible: ref visible);

docZero.Final = false;
docZero.TrackRevisions = true;
docZero.ShowRevisions = true;
docZero.PrintRevisions = true;

// wdCompareTargetNew puts the comparison in a new document; other WdCompareTarget values change where Word puts the result
docZero.Compare(fileToCompare, missing, OfficeWord.WdCompareTarget.wdCompareTargetNew, true, false, false, false, false);
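
Because Document.Compare returns nothing, one way to get hold of the result is Application.CompareDocuments, which returns the comparison as a Document that can then be saved. What follows is only a sketch, assuming Word 2010 or later is installed on the machine; the class name and the output path D:\diff.docx are made up for illustration.

```csharp
using Word = Microsoft.Office.Interop.Word;

public static class CompareAndSave
{
    public static void Run()
    {
        var app = new Word.Application();
        app.Visible = false;
        try
        {
            Word.Document original = app.Documents.Open(@"D:\doc1.docx", ReadOnly: true, AddToRecentFiles: false);
            Word.Document revised = app.Documents.Open(@"D:\doc2.docx", ReadOnly: true, AddToRecentFiles: false);

            // CompareDocuments returns the new document that holds the tracked differences.
            Word.Document result = app.CompareDocuments(
                original, revised, Word.WdCompareDestination.wdCompareDestinationNew);

            // Hypothetical output path, not taken from the answer.
            result.SaveAs2(@"D:\diff.docx");

            result.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            original.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
            revised.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
        }
        finally
        {
            // Always shut Word down, otherwise WINWORD.EXE processes pile up.
            app.Quit(Word.WdSaveOptions.wdDoNotSaveChanges);
        }
    }
}
```

The compiler may warn that Close and Quit are ambiguous between a method and an event of the same name; the method is the one that gets called.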
Overcompensation answered 10/11, 2014 at 12:29 Comment(4)
Hi @anderson-rissardi! What does the Compare method actually do? Does it open some file somewhere? Because I'm not seeing anything when I run this in my unit test. How am I supposed to get the result since the method returns void?Kynthia
Hi @ditoslav. It opens a new file. It is the same functionality as the 'Compare' button inside Word (open MS Word -> 'Review' tab -> 'Compare' button): a new document is generated, and you must save that new document yourself.Overcompensation
Where did Global.OfficeFile.WordApp go? Using VS 2019 it is apparently no longer part of Office.Interop.Word.Maxentia
@Maxentia Global.OfficeFile.WordApp is an internal variable. You should use the Microsoft.Office.Interop.Word.Application instance of your own app.Overcompensation
K
3

My requirements were that I had to use a .NET library, and that I wanted to work with streams rather than actual files.

ZipArchive is in the System.IO.Compression namespace.

What worked out quite nicely for me was opening both documents with ZipArchive and comparing their contents, while skipping the .rels files because they seem to be generated differently each time a file is created. Here's my snippet:

    // Requires: using System.IO; using System.IO.Compression;
    private static bool AreWordFilesSame(byte[] wordA, byte[] wordB)
    {
        using (var streamA = new MemoryStream(wordA))
        using (var streamB = new MemoryStream(wordB))
        using (var zipA = new ZipArchive(streamA, ZipArchiveMode.Read))
        using (var zipB = new ZipArchive(streamB, ZipArchiveMode.Read))
        {
            // A different number of parts means the packages cannot match
            if (zipA.Entries.Count != zipB.Entries.Count)
            {
                return false;
            }

            for (int i = 0; i < zipA.Entries.Count; ++i)
            {
                // Entries are paired by position, so both packages must list their parts in the same order
                if (zipA.Entries[i].FullName != zipB.Entries[i].FullName)
                {
                    return false;
                }

                if (zipA.Entries[i].Name.EndsWith(".rels")) // Relationship parts, whose contents seem to change on every file creation
                {
                    continue;
                }

                using (var readerA = new StreamReader(zipA.Entries[i].Open()))
                using (var readerB = new StreamReader(zipB.Entries[i].Open()))
                {
                    // Parts are read as text, which suits the XML parts but is only approximate for binary ones such as images
                    var textA = readerA.ReadToEnd();
                    var textB = readerB.ReadToEnd();
                    if (textA != textB)
                    {
                        return false;
                    }
                }
            }

            return true;
        }
    }
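
The snippet above pairs entries by position and reads them as text. A variation that pairs entries by their full name and compares raw bytes could look like the sketch below. The class and method names are illustrative and not from the answer; .rels parts are skipped in the same way.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxPackageComparer
{
    public static bool HaveSameParts(byte[] docxA, byte[] docxB)
    {
        using (var zipA = new ZipArchive(new MemoryStream(docxA), ZipArchiveMode.Read))
        using (var zipB = new ZipArchive(new MemoryStream(docxB), ZipArchiveMode.Read))
        {
            // Index the parts by full path so entry order does not matter.
            // ToDictionary throws if an archive contains the same path twice.
            var partsA = zipA.Entries.Where(e => !IsRels(e)).ToDictionary(e => e.FullName);
            var partsB = zipB.Entries.Where(e => !IsRels(e)).ToDictionary(e => e.FullName);

            if (partsA.Count != partsB.Count)
            {
                return false;
            }

            foreach (var pair in partsA)
            {
                if (!partsB.TryGetValue(pair.Key, out ZipArchiveEntry other))
                {
                    return false;
                }

                // Byte-wise comparison also works for images and other binary parts.
                if (!ReadAll(pair.Value).SequenceEqual(ReadAll(other)))
                {
                    return false;
                }
            }

            return true;
        }
    }

    private static bool IsRels(ZipArchiveEntry entry) =>
        entry.Name.EndsWith(".rels", StringComparison.OrdinalIgnoreCase);

    private static byte[] ReadAll(ZipArchiveEntry entry)
    {
        using (var source = entry.Open())
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }
}
```

Like the original, this only answers "same or not"; it does not say what changed.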
Kynthia answered 20/10, 2017 at 15:3 Comment(0)
C
1

For a solution that runs on a server, or anywhere without an installation of Word and its COM tools, you could use the WmlComparer component of XmlPowerTools.

The documentation is a bit limited, but here's an example usage:

// Requires: using System.IO; using OpenXmlPowerTools;
var expected = File.ReadAllBytes(@"c:\expected.docx");
var actual = File.ReadAllBytes(@"c:\result.docx");
var expectedDocument = new WmlDocument("expected.docx", expected);
var actualDocument = new WmlDocument("result.docx", actual);
var comparisonSettings = new WmlComparerSettings();

var comparisonResults = WmlComparer.Compare(expectedDocument, actualDocument, comparisonSettings);
var revisions = WmlComparer.GetRevisions(comparisonResults, comparisonSettings);

which will show you the differences between the two documents.
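
To make the result more concrete, here is a hedged sketch that prints each revision and writes the comparison to disk. It assumes the Open-Xml-PowerTools package is referenced. The RevisionType and Text members and the SaveAs method are named as I remember them from that library's source, so treat them as assumptions and check them against the version you install. The class name and output file name are made up.

```csharp
using System;
using System.IO;
using OpenXmlPowerTools;

public static class WmlCompareDemo
{
    public static void Run()
    {
        var original = new WmlDocument("expected.docx", File.ReadAllBytes(@"c:\expected.docx"));
        var revised = new WmlDocument("result.docx", File.ReadAllBytes(@"c:\result.docx"));
        var settings = new WmlComparerSettings();

        WmlDocument compared = WmlComparer.Compare(original, revised, settings);

        // Assumed members: RevisionType (inserted or deleted) and Text.
        foreach (var revision in WmlComparer.GetRevisions(compared, settings))
        {
            Console.WriteLine($"{revision.RevisionType}: {revision.Text}");
        }

        // Assumed method: SaveAs writes the compared document to a file (hypothetical path).
        compared.SaveAs(@"c:\compared.docx");
    }
}
```

As far as I can tell, the document returned by Compare carries the differences as insert and delete revisions, so opening the saved file in Word should show them as tracked changes.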

Consignment answered 23/11, 2017 at 0:9 Comment(1)
do you know whether XmlPowerTools can generate a resulting document with the differences as "tracked changes"?Tuning
K
1

This function lets you compare two documents as well as two versions of a document in C#.

// Requires: using System; using System.Threading.Tasks; using Word = Microsoft.Office.Interop.Word;
public Task<object> compare()
{
    Word.Application wordApp = new Word.Application();
    wordApp.Visible = false;
    object wordTrue = true;
    object wordFalse = false;
    object missing = Type.Missing;

    // Original document
    object fileToOpen = @"Give your file path here";
    Word.Document doc1 = wordApp.Documents.Open(ref fileToOpen,
        ref missing, ref wordFalse, ref wordFalse, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref wordTrue, ref missing,
        ref missing, ref missing, ref missing);

    // Revised document
    object fileToOpen1 = @"Give your file path here";
    Word.Document doc2 = wordApp.Documents.Open(ref fileToOpen1,
        ref missing, ref wordFalse, ref wordFalse, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing);

    // The result is a new document containing the differences as revisions
    Word.Document doc = wordApp.CompareDocuments(doc1, doc2, Word.WdCompareDestination.wdCompareDestinationNew,
        Word.WdGranularity.wdGranularityWordLevel,
        true, true, true, true, true, true, true, true, true, true, "", true);

    doc1.Close(ref missing, ref missing, ref missing);
    doc2.Close(ref missing, ref missing, ref missing);

    // This hides both the original and the revised document; change it according to your use case.
    wordApp.ActiveWindow.ShowSourceDocuments = Word.WdShowSourceDocuments.wdShowSourceDocumentsNone;

    wordApp.Visible = true;
    doc.Activate();

    // Nothing is awaited here, so the result is wrapped in a completed task
    return Task.FromResult<object>(Ok("Compared Successfully"));
}
Kalakalaazar answered 18/1, 2023 at 6:41 Comment(0)
S
0

You should really be extracting the doc into a string and diff'ing that.

You only care about the textual changes and not the formatting right?
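
If text-only comparison is enough, the extraction step can be sketched without Word at all, since a .docx is a zip package. The sketch below assumes the main part lives at word/document.xml (the conventional location) and uses the transitional WordprocessingML namespace; the class name is made up. It ignores headers, footers, footnotes and formatting, and text inside text boxes may appear twice.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the body text with one line per paragraph.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumption: the main document part is stored at the conventional path.
            ZipArchiveEntry part = zip.GetEntry("word/document.xml");
            if (part == null)
            {
                throw new InvalidDataException("word/document.xml not found");
            }

            using (Stream xml = part.Open())
            {
                XDocument doc = XDocument.Load(xml, LoadOptions.PreserveWhitespace);

                // Each w:p is a paragraph; its visible text is the concatenation of its w:t runs.
                var paragraphs = doc.Descendants(W + "p")
                    .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

                return string.Join("\n", paragraphs);
            }
        }
    }
}
```

The two resulting strings can then be handed to whatever text diff you prefer.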

Snooty answered 23/11, 2011 at 15:43 Comment(1)
Everything, even if only an image is different. But I am going to try to relax that requirement.Rung
B
-1

To do a comparison between Word documents, you need

  1. A library to read Word documents, e.g. to get paragraphs, text, tables etc. out of a Word file. You can try Office Interop, OpenXML or Aspose.Words for .NET.
  2. An algorithm or library to do the actual comparison on the text retrieved from both Word documents. You can write your own or use a library like DiffMatchPatch or similar.
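
To illustrate what the comparison step in item 2 does at its core, here is a minimal word-level diff based on the longest common subsequence. It is a teaching sketch with made-up names, not a replacement for a tuned library: it needs memory proportional to the product of the two word counts, which dedicated diff engines avoid.

```csharp
using System;
using System.Collections.Generic;

public static class WordDiff
{
    // Returns each word prefixed with "  " (unchanged), "- " (only in oldText) or "+ " (only in newText).
    public static List<string> Compute(string oldText, string newText)
    {
        // A null separator array splits on any whitespace.
        string[] a = oldText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        string[] b = newText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..]
        var lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
        {
            for (int j = b.Length - 1; j >= 0; j--)
            {
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        // Walk the table from the start, emitting deletions before insertions on ties.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < a.Length && y < b.Length)
        {
            if (a[x] == b[y]) { result.Add("  " + a[x]); x++; y++; }
            else if (lcs[x + 1, y] >= lcs[x, y + 1]) { result.Add("- " + a[x]); x++; }
            else { result.Add("+ " + b[y]); y++; }
        }
        while (x < a.Length) { result.Add("- " + a[x++]); }
        while (y < b.Length) { result.Add("+ " + b[y++]); }
        return result;
    }
}
```

For example, WordDiff.Compute("the quick brown fox", "the slow brown fox") yields "  the", "- quick", "+ slow", "  brown", "  fox".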

This question is old; there are now more solutions available, such as GroupDocs Compare.

Document Comparison by Aspose.Words for .NET is an open source showcase project that uses Aspose.Words and DiffMatchPatch for comparison.

I work at Aspose as a Developer Evangelist.

Brinna answered 9/3, 2015 at 10:55 Comment(0)
