How would you compare two XML Documents?
F

13

73

As part of the base class for some extensive unit testing, I am writing a helper function which recursively compares the nodes of one XmlDocument object to another in C# (.NET). Some requirements of this:

  • The first document is the source, i.e. what I want the XML document to look like. The second is the one I want to find differences in, and it must not contain extra nodes that are not in the first document.
  • Must throw an exception when too many significant differences are found, and it should be easily understood by a human glancing at the description.
  • Child element order is important, attributes can be in any order.
  • Some attributes are ignorable; specifically xsi:schemaLocation and xmlns:xsi, though I would like to be able to pass in which ones are.
  • Prefixes for namespaces must match in both attributes and elements.
  • Whitespace between elements is irrelevant.
  • Elements will either have child elements or InnerText, but not both.

While I'm scraping something together: has anyone written such code, and would it be possible to share it here?

As an aside, what would you call the first and second documents? I've been referring to them as "source" and "target", but that feels wrong, since the source is what I want the target to look like; otherwise I throw an exception.

Fairweather answered 3/10, 2008 at 17:19 Comment(7)
Can the nodes be the same but be declared in a different order? - Kilohertz
No, the nodes have to be in the same order. Besides being a requirement of the documents themselves, it makes differencing a bit simpler (just enumerate children and check one-to-one). - Fairweather
"attributes can be in any order": good thing, because attributes are unordered by definition. - Deplume
I call the documents "baseline" and "test". - Hafner
Possible duplicate of What is the best way to compare XML files for equality? - Dumdum
#168446, #14341990, #14341990, #300053, #2924850 - Aglaia
Call them "actual" and "expected" (yes, I know I'm 13 years too late). - Parian
S
65

Microsoft has an XML diff API that you can use.

Unofficial NuGet: https://www.nuget.org/packages/XMLDiffPatch.

Sainfoin answered 3/10, 2008 at 17:22 Comment(5)
This is very cool! Unfortunately the ONE thing it doesn't do is allow me to ignore certain attributes. - Fairweather
I forgot to mention in my post, one of the other things I did in the XSLT was to filter out certain attributes. - Undermine
Another link for the tool is here: msdn.microsoft.com/en-gb/library/aa302294.aspx - Pub
Worked well for me, but I did encounter bugs relating to comments (it ignored comments even though XmlDiffOptions.IgnoreComments wasn't specified). - Aglaia
Note that XML Notepad (github.com/microsoft/XmlNotepad) has XMLDiff built in, which visually displays the differences encoded in the XMLDiff diffgram. Simply open an XML file and then go to View -> Compare XML files. You can even control the XmlDiffOptions in the options. - Aglaia
D
9

I googled a more complete list of solutions to this problem today, and I am going to try one of them soon:

Dumdum answered 30/8, 2014 at 20:27 Comment(3)
If you want to use an external API call, then I suggest you look at XML comparator (jsoftwarelabs.com/jslutils/xml-comparison). It is very handy: you can use either the API or their website UI, where you paste the content and get all the differences. You can also ignore the order of elements or nodes by setting a flag. Hope this helps! - Caret
simplethread.com/checking-xml-for-semantic-equivalence-in-c does not work well. E.g. it does not work when you change the order of the attributes. - Hackworth
I finally chose netbike in my Net6 tests project. It looks good for my scenario. - Hackworth
W
8

This code doesn't satisfy all your requirements, but it's simple and I'm using it for my unit tests. Attribute order doesn't matter, but element order does. Element inner text is not compared. I also ignore case when comparing attribute values, but you can easily remove that.

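// Requires: using System; using System.Linq; using System.Xml.Linq;
public bool XMLCompare(XElement primary, XElement secondary)
{
    // Elements at the same position must have the same name.
    if (primary.Name != secondary.Name)
        return false;
    // Attribute order doesn't matter: compare counts, then look each one up by its full name.
    if (primary.Attributes().Count() != secondary.Attributes().Count())
        return false;
    foreach (XAttribute attr in primary.Attributes()) {
        XAttribute other = secondary.Attribute(attr.Name);
        if (other == null)
            return false;
        // Case-insensitive value comparison; switch to StringComparison.Ordinal to make it strict.
        if (!string.Equals(attr.Value, other.Value, StringComparison.OrdinalIgnoreCase))
            return false;
    }
    // Element order matters: compare the children pairwise by position.
    var primaryChildren = primary.Elements().ToList();
    var secondaryChildren = secondary.Elements().ToList();
    if (primaryChildren.Count != secondaryChildren.Count)
        return false;
    for (var i = 0; i < primaryChildren.Count; i++) {
        if (!XMLCompare(primaryChildren[i], secondaryChildren[i]))
            return false;
    }
    return true;
}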
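
For the requirements in the question (a human-readable report and ignorable attributes), the same recursive idea can be extended to collect differences instead of returning a bool. The following is only a sketch under stated assumptions: the class name XmlExpectation, the method FindDifferences and its ignored parameter are hypothetical names introduced here, and names are compared by namespace and local name, so prefix matching is not checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlExpectation
{
    // Collects human-readable differences; an empty list means "actual" matches "expected".
    // "ignored" is a hypothetical parameter: attribute names to skip on both sides.
    public static List<string> FindDifferences(
        XElement expected, XElement actual, ISet<XName> ignored, string parentPath = "")
    {
        var diffs = new List<string>();
        string path = parentPath + "/" + expected.Name.LocalName;

        if (expected.Name != actual.Name)
        {
            diffs.Add($"{path}: found element '{actual.Name}' instead");
            return diffs;
        }

        // Attributes: order-insensitive, skipping the ignored names.
        foreach (var attr in expected.Attributes().Where(a => !ignored.Contains(a.Name)))
        {
            XAttribute other = actual.Attribute(attr.Name);
            if (other == null)
                diffs.Add($"{path}: missing attribute '{attr.Name}'");
            else if (other.Value != attr.Value)
                diffs.Add($"{path}/@{attr.Name}: expected '{attr.Value}' but was '{other.Value}'");
        }
        foreach (var attr in actual.Attributes().Where(a => !ignored.Contains(a.Name)))
        {
            if (expected.Attribute(attr.Name) == null)
                diffs.Add($"{path}: unexpected attribute '{attr.Name}'");
        }

        // Children: order matters, so compare by position.
        var expectedKids = expected.Elements().ToList();
        var actualKids = actual.Elements().ToList();
        if (expectedKids.Count != actualKids.Count)
        {
            diffs.Add($"{path}: expected {expectedKids.Count} child elements but found {actualKids.Count}");
        }
        else if (expectedKids.Count == 0)
        {
            // Leaf element: compare the text content.
            if (expected.Value != actual.Value)
                diffs.Add($"{path}: expected text '{expected.Value}' but was '{actual.Value}'");
        }
        else
        {
            for (int i = 0; i < expectedKids.Count; i++)
                diffs.AddRange(FindDifferences(expectedKids[i], actualKids[i], ignored, path));
        }
        return diffs;
    }
}
```

A test helper could then throw when the list is non-empty (or longer than some threshold), using string.Join(Environment.NewLine, diffs) as the exception message. To skip xmlns:xsi and xsi:schemaLocation, pass XNamespace.Xmlns + "xsi" and XNamespace.Get("http://www.w3.org/2001/XMLSchema-instance") + "schemaLocation" in the set.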
Washy answered 5/5, 2015 at 14:57 Comment(4)
I can't get my Attributes to Count() - Millwright
@Prof.Falkencontractbreached are you using the right sort of XElement? You should be using the ones from System.Xml.Linq. - Eris
@GilesRoberts that could have been it. It was a while ago, but I remember getting confused by types having the same name. - Millwright
For my purposes, I'm also interested in the values of the XML elements themselves: if (primary.Value != secondary.Value) { return false; } - Eris
R
8

For comparing two XML outputs in automated testing I found XNode.DeepEquals.

Compares the values of two nodes, including the values of all descendant nodes.

Usage:

var xDoc1 = XDocument.Parse(xmlString1);
var xDoc2 = XDocument.Parse(xmlString2);

bool isSame = XNode.DeepEquals(xDoc1.Document, xDoc2.Document);
//Assert.IsTrue(isSame);

Reference: https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xnode.deepequals?view=netcore-2.2
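
Because DeepEquals is sensitive to attribute order, one possible workaround is to sort the attributes of every element before comparing. This is a minimal sketch, not part of the original answer; the class XmlEquality and its method names are hypothetical, and it relies on XDocument.Parse dropping insignificant whitespace by default.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlEquality
{
    // Reorders the attributes of every element by expanded name so that
    // attribute order no longer influences XNode.DeepEquals.
    public static XDocument SortAttributes(XDocument doc)
    {
        foreach (XElement element in doc.Descendants().ToList())
        {
            var sorted = element.Attributes()
                .OrderBy(a => a.Name.ToString(), StringComparer.Ordinal)
                .ToList();
            element.ReplaceAttributes(sorted);
        }
        return doc;
    }

    public static bool AreEquivalent(string xml1, string xml2)
    {
        XDocument doc1 = SortAttributes(XDocument.Parse(xml1));
        XDocument doc2 = SortAttributes(XDocument.Parse(xml2));
        return XNode.DeepEquals(doc1, doc2);
    }
}
```

Namespace declarations are attributes too, so two documents that declare different prefixes still compare as different with this approach.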

Ronnironnica answered 20/12, 2018 at 14:15 Comment(1)
Please note that this comparison has a bug on attribute order: github.com/dotnet/dotnet-api-docs/issues/830 - Foran
T
7

Try XMLUnit. This library is available for both Java and .NET.

Tiresias answered 23/11, 2011 at 13:45 Comment(1)
XMLUnit looks good but targets only .NET Framework. - Hackworth
U
5

Comparing XML documents is complicated. Google for xmldiff (there's even a Microsoft solution) to find some tools. I've solved this a couple of ways. I used XSLT to sort elements and attributes (because sometimes they would appear in a different order, and I didn't care about that) and to filter out attributes I didn't want to compare. Then I either used the XML::Diff or XML::SemanticDiff Perl module, or pretty-printed each document with every element and attribute on a separate line and ran the Unix command-line diff on the results.
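
In C#, the "filter, sort, pretty-print one item per line" step could be sketched as below. Note that this swaps in LINQ to XML for the XSLT described above; the class XmlLinePrinter, the method ToDiffableText and the ignored parameter are hypothetical names. Elements are deliberately not sorted, since the question treats child order as significant.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlLinePrinter
{
    // Produces text with every element and attribute on its own line,
    // suitable for a line-based diff tool.
    public static string ToDiffableText(string xml, ISet<XName> ignored)
    {
        XDocument doc = XDocument.Parse(xml);
        foreach (XElement element in doc.Descendants().ToList())
        {
            // Drop ignored attributes and put the rest in a stable order.
            var kept = element.Attributes()
                .Where(a => !ignored.Contains(a.Name))
                .OrderBy(a => a.Name.ToString(), StringComparer.Ordinal)
                .ToList();
            element.ReplaceAttributes(kept);
        }

        var settings = new XmlWriterSettings
        {
            Indent = true,
            NewLineOnAttributes = true,
            OmitXmlDeclaration = true
        };
        var text = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(text, settings))
        {
            doc.Save(writer);
        }
        return text.ToString();
    }
}
```

The two resulting strings can be written to files and passed to any line-oriented diff, or compared directly in a test.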

Undermine answered 3/10, 2008 at 17:28 Comment(0)
F
5

https://github.com/CameronWills/FatAntelope is another alternative library to the Microsoft XML Diff API. It has an XML diffing algorithm that does an unordered comparison of two XML documents and produces an optimal matching.

It is a C# port of the X-Diff algorithm described here: http://pages.cs.wisc.edu/~yuanwang/xdiff.html

Disclaimer: I wrote it :)

Farahfarand answered 7/11, 2015 at 7:8 Comment(0)
A
4

Another way to do this would be:

  1. Get the contents of both files into two different strings.
  2. Transform the strings using an XSLT (which will just copy everything over to two new strings). This will ensure that all spaces outside the elements are removed. This will result in two new strings.
  3. Now, just compare the two strings with each other.

This won't give you the exact location of the difference, but if you just want to know if there is a difference, this is easy to do without any third party libraries.
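
The steps above can be sketched with XslCompiledTransform and an identity stylesheet that strips whitespace-only text nodes. This is an illustration under assumptions, not the answerer's code: the class XsltNormalizer, its method names and the exact stylesheet are made up for this example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltNormalizer
{
    // Identity transform: copies every node and attribute, and strips
    // whitespace-only text between elements.
    private const string IdentityXslt = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' indent='no' omit-xml-declaration='yes'/>
  <xsl:strip-space elements='*'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string Normalize(string xml)
    {
        var transform = new XslCompiledTransform();
        using (XmlReader stylesheet = XmlReader.Create(new StringReader(IdentityXslt)))
        {
            transform.Load(stylesheet);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(input, writer);
        }
        return output.ToString();
    }

    // Step 3: plain string comparison of the two normalised documents.
    public static bool SameContent(string xml1, string xml2)
    {
        return Normalize(xml1) == Normalize(xml2);
    }
}
```

This only removes whitespace differences; attribute order is left as it appears in each input, so it would still need the sorting step mentioned in other answers.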

Android answered 30/8, 2010 at 19:20 Comment(1)
This doesn't answer the particular question, but the concept is relevant to the problem raised in the question. My +1. - Derbent
J
3

I am using ExamXML for comparing XML files. You can try it. The authors, A7Soft, also provide an API for comparing XML files.

Jourdan answered 16/11, 2009 at 14:20 Comment(0)
S
2

Not relevant for the OP since it currently ignores child order, but if you want a code-only solution you can try XmlSpecificationCompare, which I somewhat misguidedly developed.

Slipover answered 1/9, 2013 at 12:40 Comment(0)
D
1

All the above answers are helpful, but I tried XMLUnit, which looks like an easy-to-use NuGet package for checking the differences between two XML files. Here is sample C# code:

// Requires the XMLUnit.Core NuGet package:
// using System.Linq; using Org.XmlUnit.Builder; using Org.XmlUnit.Diff;
public static bool CheckXMLDifference(string xmlInput, string xmlOutput)
{
    Diff myDiff = DiffBuilder.Compare(Input.FromString(xmlInput))
        .WithTest(Input.FromString(xmlOutput))
        .CheckForSimilar().CheckForIdentical()
        .IgnoreComments()
        .IgnoreWhitespace().NormalizeWhitespace().Build();

    // True when there is no difference (the files are identical),
    // false when there are one or more differences.
    return !myDiff.Differences.Any();
}

If anyone wants to test it, I have also created an online tool using it. You can take a look here:

https://www.minify-beautify.com/online-xml-difference

Dubrovnik answered 9/3, 2021 at 15:8 Comment(0)
T
0

Based on @Two Cents' answer and using this link, XMLSorting, I have created my own XmlComparer.

Compare XML program

private static bool compareXML(XmlNode node, XmlNode comparenode)
    {
        // Names and values must match (Value is null for elements and holds the text for text nodes).
        if (node.Name != comparenode.Name || node.Value != comparenode.Value)
            return false;

        // Attributes is null for non-element nodes such as text nodes.
        if (node.Attributes != null)
        {
            foreach (XmlAttribute parentnodeattribute in node.Attributes)
            {
                XmlAttribute compareattribute = comparenode.Attributes?[parentnodeattribute.Name];
                // A missing attribute counts as a difference instead of throwing.
                if (compareattribute == null || compareattribute.Value != parentnodeattribute.Value)
                    return false;
            }
        }

        if (node.HasChildNodes)
        {
            // Sort both sides so that children are compared in the same order.
            sortXML(node);
            sortXML(comparenode);
            if (node.ChildNodes.Count != comparenode.ChildNodes.Count)
                return false;
            for (int i = 0; i < node.ChildNodes.Count; i++)
            {
                if (!compareXML(node.ChildNodes[i], comparenode.ChildNodes[i]))
                    return false;
            }
        }

        return true;
    }

Sort XML program

 private static void sortXML(XmlNode documentElement)
    {
        SortAttributes(documentElement.Attributes);
        SortElements(documentElement);
        foreach (XmlNode childNode in documentElement.ChildNodes)
        {
            sortXML(childNode);
        }
    }

  private static void SortElements(XmlNode rootNode)
    {
        // Bubble sort of the child nodes by name: each pass moves a node
        // in front of its left neighbour when the two are out of order.
        for (int j = 0; j < rootNode.ChildNodes.Count; j++)
        {
            for (int i = 1; i < rootNode.ChildNodes.Count; i++)
            {
                if (String.Compare(rootNode.ChildNodes[i].Name, rootNode.ChildNodes[i - 1].Name) < 0)
                {
                    rootNode.InsertBefore(rootNode.ChildNodes[i], rootNode.ChildNodes[i - 1]);
                }
            }
        }
    }
 private static void SortAttributes(XmlAttributeCollection attribCol)
    {
        if (attribCol == null)
            return;
        bool changed = true;
        while (changed)
        {
            changed = false;
            for (int i = 1; i < attribCol.Count; i++)
            {
                if (String.Compare(attribCol[i].Name, attribCol[i - 1].Name) < 0)
                {
                    // Move the attribute one position earlier.
                    attribCol.InsertBefore(attribCol[i], attribCol[i - 1]);
                    changed = true;
                }
            }
        }
    }
Transistorize answered 8/9, 2017 at 12:55 Comment(3)
XSLT would be a faster way to sort XML. Also, why not sort both docs instead of sorting inside the loop? - Overwind
@PankajJaju I know XSLT is faster, but I don't have any knowledge of XSLT programming. Also, I am sorting both docs: I call compareXML with the root element of each document, `compareXML(document1.rootnode, document2.rootnode);`, and sort each node of both docs. - Transistorize
See my answer for an XSLT 1.0 solution to this problem. - Hatband
H
0

I solved this XML comparison problem using XSLT 1.0. It uses an unordered tree comparison algorithm and can be used for comparing large XML files. https://github.com/sflynn1812/xslt-diff-turbo

Hatband answered 23/4, 2019 at 11:59 Comment(0)
