Comparing XmlDocument for equality (content wise)
T

6

21

If I want to compare the contents of an XmlDocument, is it just like this?

XmlDocument doc1 = GetDoc1();
XmlDocument doc2 = GetDoc2();

if(doc1 == doc2)
{

}

I am not checking if they are both the same object reference, but if the CONTENTS of the xml are the same.

Tridentum answered 27/5, 2010 at 19:46 Comment(0)
P
13

No. XmlDocument does not override the Equals() method (or overload the == operator), so your comparison is in fact just a reference-equality check. It will fail in your example unless the documents are actually the same object instance.
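
To make that concrete, here is a minimal sketch (the class name ReferenceEqualityDemo is only for illustration) that loads the same XML text into two separate XmlDocument instances and compares them:

```csharp
using System;
using System.Xml;

class ReferenceEqualityDemo
{
    static void Main()
    {
        // Two separate instances loaded from identical XML text.
        var doc1 = new XmlDocument();
        var doc2 = new XmlDocument();
        doc1.LoadXml("<root foo='bar'/>");
        doc2.LoadXml("<root foo='bar'/>");

        // Both checks compare object identity, not content.
        Console.WriteLine(doc1 == doc2);       // False
        Console.WriteLine(doc1.Equals(doc2));  // False

        // Only a second reference to the same instance compares equal.
        XmlDocument alias = doc1;
        Console.WriteLine(doc1 == alias);      // True
    }
}
```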

If you want to compare the contents (attributes, elements, comments, PIs, etc.) of a document, you will have to implement that logic yourself. Be warned: it's not trivial.

Depending on your exact scenario, you may be able to remove all non-essential whitespace from the document (which itself can be tricky) and then compare the resulting XML text. This is not perfect: it fails for documents that are semantically identical but differ in things like how namespaces are used and declared, whether certain values are escaped or not, the order of elements, and so on. As I said before, XML comparison is not trivial.

You also need to clearly define what it means for two XML documents to be "identical". Does element or attribute ordering matter? Does case (in text nodes) matter? Should you ignore superfluous CDATA sections? Do processing instructions count? What about fully qualified vs. partially qualified namespaces?

In any general purpose implementation, you're likely going to want to transform both documents into some canonical form (be it XML or some other representation) and then compare the canonicalized content.
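
One possible canonical form is W3C Canonical XML (C14N), which .NET exposes through XmlDsigC14NTransform in System.Security.Cryptography.Xml. The following is only a sketch: it assumes your project references that assembly (a separate NuGet package on .NET Core and later), and the CanonicalXml, Canonicalize and CanonicalEquals names are made up for this example. C14N normalizes things like attribute order, quoting and empty-element syntax, but element order and text content still have to match.

```csharp
using System.IO;
using System.Security.Cryptography.Xml;
using System.Text;
using System.Xml;

public static class CanonicalXml
{
    // Serializes the document in W3C Canonical XML form (comments excluded).
    public static string Canonicalize(XmlDocument doc)
    {
        var transform = new XmlDsigC14NTransform();
        transform.LoadInput(doc);

        using (var stream = (Stream)transform.GetOutput(typeof(Stream)))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Two documents are treated as equal when their canonical text matches.
    public static bool CanonicalEquals(XmlDocument a, XmlDocument b)
    {
        return Canonicalize(a) == Canonicalize(b);
    }
}
```

Whether this notion of "identical" is the right one still depends on the questions above; for instance, it will not treat reordered elements as equal.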

Tools already exist that perform XML differencing, like Microsoft XML Diff/Patch; you may be able to leverage one of them to identify differences between two documents. To my knowledge that tool is not distributed in source form, so to use it in an embedded application you would need to script the process. If you plan to use it, you should first verify that the licensing terms allow its use and redistribution.

EDIT: Check out @Max Toro's answer if you're using .NET 3.5 SP1, as apparently there's an option in XLinq that may be helpful. Nice to know it exists.

Pockmark answered 27/5, 2010 at 19:48 Comment(0)
W
42

Try the XNode.DeepEquals method in the XLinq (LINQ to XML) API. Note that it works on XDocument rather than XmlDocument.

XDocument doc1 = GetDoc1(); 
XDocument doc2 = GetDoc2(); 
 
if(XNode.DeepEquals(doc1, doc2)) 
{ 
 
} 

See also Equality Semantics of LINQ to XML Trees
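
As a rough illustration of what DeepEquals does and does not treat as equal, here is a small sketch (the DeepEqualsDemo class name is just for this example). Differences in quoting and in spacing inside a tag disappear when the text is parsed, but attribute order is significant:

```csharp
using System;
using System.Xml.Linq;

class DeepEqualsDemo
{
    static void Main()
    {
        // Same content, different quoting and spacing inside the tag.
        var a = XDocument.Parse("<root  foo='bar'  />");
        var b = XDocument.Parse("<root foo=\"bar\"/>");
        Console.WriteLine(XNode.DeepEquals(a, b));  // True

        // Same attributes, declared in a different order.
        var c = XDocument.Parse("<root x='1' y='2'/>");
        var d = XDocument.Parse("<root y='2' x='1'/>");
        Console.WriteLine(XNode.DeepEquals(c, d));  // False
    }
}
```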

Wimbush answered 27/5, 2010 at 19:58 Comment(2)
Very nice. I did not know this existed. It looks like it handles many of the cases that I describe. - Pockmark
Nice - this means I didn't need to import a third party library to do this for me! - Gunther
S
11

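A simple way could be to compare OuterXml. This works when the documents differ only in serialization details such as quoting or spacing inside tags; differences in attribute order or element order will still make the strings unequal.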

var a = new XmlDocument();
var b = new XmlDocument();

a.LoadXml("<root  foo='bar'  />");
b.LoadXml("<root foo='bar'/>");

Debug.Assert(a.OuterXml == b.OuterXml);
Saline answered 7/8, 2012 at 15:0 Comment(0)
V
2

I know how old this question is, but I had to go through multiple sources to find the answer I was looking for. The following uses XNode.DeepEquals but also ignores attribute order. Given the amount of work it took to come up with this 13 years after the question was first asked, I figure someone else may find it helpful.

Using NUnit as my testing suite, you can pass in either two XmlDocuments or an XmlDocument and a string. This converts the XmlDocuments into XDocuments, sorts the attributes of every element, and then calls XNode.DeepEquals().

Assert.That(XmlHelperService.XMLCompare(xmlDoc, expectedStr), Is.EqualTo(true));
Assert.That(XmlHelperService.XMLCompare(xmlDoc, expectedXmlDoc), Is.EqualTo(true));

Code

using System.Linq;
using System.Xml;
using System.Xml.Linq;

public class XmlHelperService
{
    public static bool XMLCompare(XmlDocument primary, string secondaryStr)
    {
        XmlDocument secondary = new XmlDocument();
        secondary.LoadXml(secondaryStr);

        return XMLCompare(primary, secondary);
    }

    public static bool XMLCompare(XmlDocument primary, XmlDocument secondary)
    {
        return XNode.DeepEquals(NormalizeXElement(primary.ToXDocument().Root), NormalizeXElement(secondary.ToXDocument().Root));
    }

    private static XElement NormalizeXElement(XElement element)
    {
        return new XElement(element.Name,
            element.Attributes().OrderBy(x => x.Name.ToString()),
            element.Nodes().Select(n =>
            {
                XElement e = n as XElement;
                if (e != null)
                    return NormalizeXElement(e);
                return n;
            })
        );
    }
}

public static class DocumentExtensions
{
    public static XDocument ToXDocument(this XmlDocument xmlDocument)
    {
        using (var nodeReader = new XmlNodeReader(xmlDocument))
        {
            nodeReader.MoveToContent();
            return XDocument.Load(nodeReader);
        }
    }
}
Vela answered 14/4, 2023 at 20:3 Comment(0)
A
0

LBushkin is right, this is not trivial. Since XML is string data you could technically perform a hash of the contents and compare them, but that will be affected by things like whitespace.

You could perform a structured diff (also called 'XML diffgram') between the two documents and compare the results. This is how .NET datasets keep track of changes, for example.

Other than that you'd have to iterate through the DOM and compare elements, attributes and values to each other. If there's a schema involved then you would also have to take into account positions and so on.
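
A bare-bones sketch of such a DOM walk might look like the following. The DomComparer class and its rules are assumptions chosen for illustration: child nodes must appear in the same order, attribute order is ignored, and namespace declarations are compared like any other attribute, so differing prefixes count as a difference. Adjust these rules to whatever "equal" means for your documents.

```csharp
using System.Xml;

public static class DomComparer
{
    // Recursively compares two nodes: type, qualified name, value,
    // attributes (order-insensitive) and children (order-sensitive).
    public static bool NodesEqual(XmlNode a, XmlNode b)
    {
        if (a.NodeType != b.NodeType ||
            a.LocalName != b.LocalName ||
            a.NamespaceURI != b.NamespaceURI ||
            a.Value != b.Value)
        {
            return false;
        }

        if (!AttributesEqual(a.Attributes, b.Attributes))
        {
            return false;
        }

        // Walk both child lists in lockstep.
        XmlNode childA = a.FirstChild;
        XmlNode childB = b.FirstChild;
        while (childA != null && childB != null)
        {
            if (!NodesEqual(childA, childB))
            {
                return false;
            }
            childA = childA.NextSibling;
            childB = childB.NextSibling;
        }

        // Equal only if both lists ended at the same time.
        return childA == null && childB == null;
    }

    private static bool AttributesEqual(XmlAttributeCollection x, XmlAttributeCollection y)
    {
        // Attributes is null for non-element nodes.
        if (x == null || y == null)
        {
            return x == null && y == null;
        }
        if (x.Count != y.Count)
        {
            return false;
        }

        foreach (XmlAttribute attr in x)
        {
            XmlAttribute other = y[attr.LocalName, attr.NamespaceURI];
            if (other == null || other.Value != attr.Value)
            {
                return false;
            }
        }
        return true;
    }
}
```

Called as DomComparer.NodesEqual(doc1, doc2) on two XmlDocument instances, it starts at the document nodes and works downward. Documents loaded with the default PreserveWhitespace = false have already dropped whitespace-only text between elements, so indentation differences do not show up in the walk.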

Andri answered 27/5, 2010 at 19:54 Comment(0)
A
0

Often you want to compare XML strings whose elements are ordered differently. This can be done easily with the following code, which sorts child elements by name before comparing:

using System;
using System.Linq;
using System.Xml.Linq;
using NUnit.Framework;

class Testing
{
    [Test]
    public void Test()
    {
        Assert.AreEqual(
            "<root><a></a><b></b></root>".SortXml()
            , "<root><b></b><a></a></root>".SortXml());
    }
}

public static class XmlCompareExtension
{
    public static string SortXml(this string @this)
    {
        var xdoc = XDocument.Parse(@this);

        SortXml(xdoc);

        return xdoc.ToString();
    }

    private static void SortXml(XContainer parent)
    {
        var elements = parent.Elements()
            .OrderBy(e => e.Name.LocalName)
            .ToArray();

        Array.ForEach(elements, e => e.Remove());

        foreach (var element in elements)
        {
            parent.Add(element);
            SortXml(element);
        }
    }
}
Aftereffect answered 10/9, 2018 at 17:7 Comment(0)
