Compare XML ignoring order of child elements
G

14

30

Does anybody know of a tool that will compare two XML documents? Belay that mocking… there’s more. I need something that will make sure each node in file 1 is also in file 2, regardless of order. I thought XML Spy would do it with the Ignore Order of Child Nodes option, but it didn’t. The following would be considered the same:

<Node>
    <Child name="Alpha"/>
    <Child name="Beta"/>
    <Child name="Charlie"/>
</Node>

<Node>
    <Child name="Beta"/>
    <Child name="Charlie"/>
    <Child name="Alpha"/>
</Node>
Gussiegussman answered 19/11, 2009 at 22:31 Comment(0)
M
3

You might want to google for "XML diff tool", which will give you more than adequate results. One of them is OxygenXml, a tool I frequently use. You can also try Microsoft's XML Diff and Patch Tool.

Good Luck.

Mama answered 19/11, 2009 at 22:42 Comment(4)
the google search did not yield any free promising executable downloads as I'd hoped. However, +1 on XML Diff and Patch Tool! It requires that you have Visual Studio to build it to get the .exe. For a nice visually formatted xml diff, build the /Samples/XmlDiffView project, then run XmlDiffView [-flags] 1.xml 2.xml visual-output.htmlGalipot
Does OxygenXML support features OP requests? How is it configured?Hendrika
OxygenXml ignores attribute order, but it does not seem to ignore child element order...Demy
At this moment this page is the top result for googling diff xml ignore order of child nodes.Conventual
P
15

I wrote a simple Python tool for this called xmldiffs:

Compare two XML files, ignoring element and attribute order.

Usage: xmldiffs [OPTION] FILE1 FILE2

Any extra options are passed to the diff command.

Get it at https://github.com/joh/xmldiffs

Parrott answered 22/3, 2017 at 23:54 Comment(2)
Doesn't work for cases where the text nodes of elements are in different orders, e.g., <root> <element>A value</element> <element>Another value</element> </root>Realistic
@Realistic Oh, sounds like a bug... Could you please file the issue on GitHub?Parrott
G
14

With Beyond Compare you can enable the XML Sort conversion in the File Formats settings. With this option the XML children are sorted before the diff.

A trial / portable version of Beyond Compare is available.

Gambrinus answered 15/12, 2016 at 12:41 Comment(1)
In BC 4.4.2 you can go to Tools -> File Formats..., click on XML in the bar on the left, then click the Conversion tab. In the dropdown at the top, select XML Sort.Recency
C
2

I'd use XMLUnit for this as it can cater for elements being in a different order.

Chiasmus answered 11/4, 2013 at 13:23 Comment(3)
I've had varying degrees of success with this approach (both XmlUnit 1 and 2). It frequently works, but it sometimes fails (for pairs of XML that are clearly identical except for sort order, to the eye).Candlestand
If they really are identical except for sort order that sounds like it could be a bug with it. Worth checking if there is an existing issue and reporting it if you can easily reproduce it.Chiasmus
Looks like ignoring element order isn't supported :( github.com/xmlunit/xmlunit/issues/44Hibernate
O
1

I had a similar need this evening, and couldn't find something that fit my requirements.

My workaround was to sort the two XML files I wanted to diff, sorting alphabetically by the element name. Once they were both in a consistent order, I could diff the two sorted files using a regular visual diff tool.

If this approach sounds useful to anyone else, I've shared the Python script I wrote to do the sorting at http://dalelane.co.uk/blog/?p=3225
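
For anyone who wants to try the sort-then-diff idea without downloading anything, here is a minimal Python sketch using only the standard library. It is not the script linked above, just an illustration: the function names (sort_key, sort_tree, normalized_lines, xml_diff) are made up for this example, and it assumes siblings should be ordered by tag name, then attributes, then text, so that same-named siblings such as the Child elements in the question also end up in a consistent order.

```python
import difflib
import xml.etree.ElementTree as ET


def sort_key(elem):
    """Hypothetical ordering: tag, then sorted attributes, then stripped text."""
    return (elem.tag, sorted(elem.attrib.items()), (elem.text or "").strip())


def sort_tree(elem):
    """Recursively sort the children of elem in place."""
    for child in elem:
        sort_tree(child)
    elem[:] = sorted(elem, key=sort_key)


def normalized_lines(xml_text):
    """Parse, sort, and emit one line per element so a line diff is meaningful."""
    root = ET.fromstring(xml_text)
    sort_tree(root)
    lines = []

    def walk(elem, depth):
        attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrib.items()))
        text = (elem.text or "").strip()
        lines.append(f"{'  ' * depth}<{elem.tag}{attrs}>{text}")
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return lines


def xml_diff(xml_a, xml_b):
    """Unified diff of the two normalized documents; empty list means no diff."""
    return list(
        difflib.unified_diff(normalized_lines(xml_a), normalized_lines(xml_b), lineterm="")
    )
```

With the two Node documents from the question, xml_diff returns an empty list. Siblings that tie on tag, attributes, and text keep their original order, so differences deeper inside otherwise identical siblings can still show up as ordering noise.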

Oiler answered 6/10, 2014 at 2:12 Comment(0)
M
1

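With C# you could do this and afterwards compare the sorted files with any diff tool. Note that this sorts siblings by element name only, so siblings with the same name keep their original relative order.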

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public void Run()
{
    LoadSortAndSave(@".. first file ..");
    LoadSortAndSave(@".. second file ..");
}

public void LoadSortAndSave(String path)
{
    var xdoc = XDocument.Load(path);
    SortXml(xdoc.Root);
    File.WriteAllText(path + ".sorted", xdoc.ToString());
}

private void SortXml(XContainer parent)
{
    var elements = parent.Elements()
        .OrderBy(e => e.Name.LocalName)
        .ToArray();

    Array.ForEach(elements, e => e.Remove());

    foreach (var element in elements)
    {
        parent.Add(element);
        SortXml(element);
    }
}
Moffatt answered 10/9, 2018 at 13:53 Comment(0)
K
0

i recently gave a similar answer here (Open source command line tool for Linux to diff XML files ignoring element order), but i'll provide more detail...

if you write a program to walk the two trees together, you can customize the logic for identifying "matches" between the trees, and also for handling nodes that don't match. here is an example in xslt 2.0 (sorry it's so long):

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"

                xmlns:set="http://exslt.org/sets"

                xmlns:primary="primary"
                xmlns:control="control"

                xmlns:util="util"

                exclude-result-prefixes="xsl xs set primary control">

    <!-- xml diff tool

         import this stylesheet from another and call the "compare" template with two args:

             primary: the root of the primary tree to submit to comparison
             control: the root of the control tree to compare against

         the two trees will be walked together. the primary tree will be walked in document order, matching elements
         and attributes from the control tree along the way, building a tree of common content, with appendages
         containing primary and control only content. that tree will then be used to generate the diff.

         the process of matching involves finding, for an element or attribute in the primary tree, the
         equivalent element or attribute in the control tree, *at the same level*, and *regardless of ordering*.

             matching logic is encoded as templates with mode="find-match", providing a hook to wire in specific
             matching logic for particular elements or attributes. for example, an element may "match" based on an
             @id attribute value, irrespective of element ordering; encode this in a mode="find-match" template.

             the treatment of diffs is encoded as templates with mode="primary-only" and "control-only", providing
             hooks for alternate behavior upon encountering differences.

          -->

    <xsl:output method="text"/>

    <xsl:strip-space elements="*"/>

    <xsl:param name="full" select="false()"/><!-- whether to render the full doc, as opposed to just diffs -->

    <xsl:template match="/">
        <xsl:call-template name="compare">
            <xsl:with-param name="primary" select="*/*[1]"/><!-- first child of root element, for example -->
            <xsl:with-param name="control" select="*/*[2]"/><!-- second child of root element, for example -->
        </xsl:call-template>
    </xsl:template>

    <!-- OVERRIDES: templates that can be overridden to provide targeted matching logic and diff treatment -->

    <!-- default find-match template for elements
         (by default, for "complex" elements, name has to match, for "simple" elements, name and value do)
         for context node (from "primary"), choose from among $candidates (from "control") which one matches
         (override with more specific match patterns to effect alternate behavior for targeted elements)
         -->
    <xsl:template match="*" mode="find-match" as="element()?">
        <xsl:param name="candidates" as="element()*"/>
        <xsl:choose>
            <xsl:when test="text() and count(node()) = 1"><!-- simple content -->
                <xsl:sequence select="$candidates[node-name(.) = node-name(current())][text() and count(node()) = 1][. = current()][1]"/>
            </xsl:when>
            <xsl:when test="not(node())"><!-- empty -->
                <xsl:sequence select="$candidates[node-name(.) = node-name(current())][not(node())][1]"/>
            </xsl:when>
            <xsl:otherwise><!-- presumably complex content -->
                <xsl:sequence select="$candidates[node-name(.) = node-name(current())][1]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- default find-match template for attributes
         (by default, name and value have to match)
         for context attr (from "primary"), choose from among $candidates (from "control") which one matches
         (override with more specific match patterns to effect alternate behavior for targeted attributes)
         -->
    <xsl:template match="@*" mode="find-match" as="attribute()?">
        <xsl:param name="candidates" as="attribute()*"/>
        <xsl:sequence select="$candidates[. = current()][node-name(.) = node-name(current())][1]"/>
    </xsl:template>

    <!-- default primary-only template (override with more specific match patterns to effect alternate behavior) -->
    <xsl:template match="@* | *" mode="primary-only">
        <xsl:apply-templates select="." mode="illegal-primary-only"/>
    </xsl:template>

    <!-- write out a primary-only diff -->
    <xsl:template match="@* | *" mode="illegal-primary-only">
        <primary:only>
            <xsl:copy-of select="."/>
        </primary:only>
    </xsl:template>

    <!-- default control-only template (override with more specific match patterns to effect alternate behavior) -->
    <xsl:template match="@* | *" mode="control-only">
        <xsl:apply-templates select="." mode="illegal-control-only"/>
    </xsl:template>

    <!-- write out a control-only diff -->
    <xsl:template match="@* | *" mode="illegal-control-only">
        <control:only>
            <xsl:copy-of select="."/>
        </control:only>
    </xsl:template>

    <!-- end OVERRIDES -->

    <!-- MACHINERY: for walking the primary and control trees together, finding matches and recursing -->

    <!-- compare "primary" and "control" trees (this is the root of comparison, so CALL THIS ONE !) -->
    <xsl:template name="compare">
        <xsl:param name="primary"/>
        <xsl:param name="control"/>

        <!-- write the xml diff into a variable -->
        <xsl:variable name="diff">
            <xsl:call-template name="match-children">
                <xsl:with-param name="primary" select="$primary"/>
                <xsl:with-param name="control" select="$control"/>
            </xsl:call-template>
        </xsl:variable>

        <!-- "print" the xml diff as textual output -->
        <xsl:apply-templates select="$diff" mode="print">
            <xsl:with-param name="render" select="$full"/>
        </xsl:apply-templates>

    </xsl:template>

    <!-- assume primary (context) element and control element match, so render the "common" element and recurse -->
    <xsl:template match="*" mode="common">
        <xsl:param name="control"/>

        <xsl:copy>
            <xsl:call-template name="match-attributes">
                <xsl:with-param name="primary" select="@*"/>
                <xsl:with-param name="control" select="$control/@*"/>
            </xsl:call-template>

            <xsl:choose>
                <xsl:when test="text() and count(node()) = 1">
                    <xsl:value-of select="."/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:call-template name="match-children">
                        <xsl:with-param name="primary" select="*"/>
                        <xsl:with-param name="control" select="$control/*"/>
                    </xsl:call-template>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:copy>

    </xsl:template>

    <!-- find matches between collections of attributes in primary vs control -->
    <xsl:template name="match-attributes">
        <xsl:param name="primary" as="attribute()*"/>
        <xsl:param name="control" as="attribute()*"/>
        <xsl:param name="primaryCollecting" as="attribute()*"/>

        <xsl:choose>
            <xsl:when test="$primary and $control">
                <xsl:variable name="this" select="$primary[1]"/>
                <xsl:variable name="match" as="attribute()?">
                    <xsl:apply-templates select="$this" mode="find-match">
                        <xsl:with-param name="candidates" select="$control"/>
                    </xsl:apply-templates>
                </xsl:variable>

                <xsl:choose>
                    <xsl:when test="$match">
                        <xsl:copy-of select="$this"/>
                        <xsl:call-template name="match-attributes">
                            <xsl:with-param name="primary" select="subsequence($primary, 2)"/>
                            <xsl:with-param name="control" select="remove($control, 1 + count(set:leading($control, $match)))"/>
                            <xsl:with-param name="primaryCollecting" select="$primaryCollecting"/>
                        </xsl:call-template>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:call-template name="match-attributes">
                            <xsl:with-param name="primary" select="subsequence($primary, 2)"/>
                            <xsl:with-param name="control" select="$control"/>
                            <xsl:with-param name="primaryCollecting" select="$primaryCollecting | $this"/>
                        </xsl:call-template>
                    </xsl:otherwise>
                </xsl:choose>

            </xsl:when>
            <xsl:otherwise>
                <xsl:if test="$primaryCollecting | $primary">
                    <xsl:apply-templates select="$primaryCollecting | $primary" mode="primary-only"/>
                </xsl:if>
                <xsl:if test="$control">
                    <xsl:apply-templates select="$control" mode="control-only"/>
                </xsl:if>
            </xsl:otherwise>
        </xsl:choose>

    </xsl:template>

    <!-- find matches between collections of elements in primary vs control -->
    <xsl:template name="match-children">
        <xsl:param name="primary" as="node()*"/>
        <xsl:param name="control" as="element()*"/>

        <xsl:variable name="this" select="$primary[1]" as="node()?"/>

        <xsl:choose>
            <xsl:when test="$primary and $control">
                <xsl:variable name="match" as="element()?">
                    <xsl:apply-templates select="$this" mode="find-match">
                        <xsl:with-param name="candidates" select="$control"/>
                    </xsl:apply-templates>
                </xsl:variable>

                <xsl:choose>
                    <xsl:when test="$match">
                        <xsl:apply-templates select="$this" mode="common">
                            <xsl:with-param name="control" select="$match"/>
                        </xsl:apply-templates>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:apply-templates select="$this" mode="primary-only"/>
                    </xsl:otherwise>
                </xsl:choose>
                <xsl:call-template name="match-children">
                    <xsl:with-param name="primary" select="subsequence($primary, 2)"/>
                    <xsl:with-param name="control" select="if (not($match)) then $control else remove($control, 1 + count(set:leading($control, $match)))"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:when test="$primary">
                <xsl:apply-templates select="$primary" mode="primary-only"/>
            </xsl:when>
            <xsl:when test="$control">
                <xsl:apply-templates select="$control" mode="control-only"/>
            </xsl:when>
        </xsl:choose>

    </xsl:template>

    <!-- end MACHINERY -->

    <!-- PRINTERS: print templates for writing out the diff -->

    <xsl:template match="*" mode="print">
        <xsl:param name="depth" select="-1"/>
        <xsl:param name="render" select="false()"/>
        <xsl:param name="lineLeader" select="' '"/>
        <xsl:param name="rest" as="element()*"/>

        <xsl:if test="$render or descendant::primary:* or descendant::control:*">

            <xsl:call-template name="whitespace">
                <xsl:with-param name="indent" select="$depth"/>
                <xsl:with-param name="leadChar" select="$lineLeader"/>
            </xsl:call-template>
            <xsl:text>&lt;</xsl:text>
            <xsl:value-of select="name(.)"/>

            <xsl:apply-templates select="@* | primary:*[@*] | control:*[@*]" mode="print">
                <xsl:with-param name="depth" select="$depth"/>
                <xsl:with-param name="render" select="$render"/>
                <xsl:with-param name="lineLeader" select="$lineLeader"/>
            </xsl:apply-templates>

            <xsl:choose>
                <xsl:when test="text() and count(node()) = 1"><!-- field element (just textual content) -->
                    <xsl:text>&gt;</xsl:text>
                    <xsl:value-of select="."/>
                    <xsl:text>&lt;/</xsl:text>
                    <xsl:value-of select="name(.)"/>
                    <xsl:text>&gt;</xsl:text>
                </xsl:when>
                <xsl:when test="count(node()) = 0"><!-- empty (self-closing) element -->
                    <xsl:text>/&gt;</xsl:text>
                </xsl:when>
                <xsl:otherwise><!-- complex content -->
                    <xsl:text>&gt;&#10;</xsl:text>
                    <xsl:apply-templates select="*[not(self::primary:* and @*) and not(self::control:* and @*)]" mode="print">
                        <xsl:with-param name="depth" select="$depth + 1"/>
                        <xsl:with-param name="render" select="$render"/>
                        <xsl:with-param name="lineLeader" select="$lineLeader"/>
                    </xsl:apply-templates>
                    <xsl:call-template name="whitespace">
                        <xsl:with-param name="indent" select="$depth"/>
                        <xsl:with-param name="leadChar" select="$lineLeader"/>
                    </xsl:call-template>
                    <xsl:text>&lt;/</xsl:text>
                    <xsl:value-of select="name(.)"/>
                    <xsl:text>&gt;</xsl:text>
                </xsl:otherwise>
            </xsl:choose>

            <xsl:text>&#10;</xsl:text>

        </xsl:if>

        <!-- write the rest of the elements, if any -->
        <xsl:apply-templates select="$rest" mode="print">
            <xsl:with-param name="depth" select="$depth"/>
            <xsl:with-param name="render" select="$render"/>
            <xsl:with-param name="lineLeader" select="$lineLeader"/>
            <xsl:with-param name="rest" select="()"/><!-- avoid implicit param pass to recursive call! -->
        </xsl:apply-templates>

    </xsl:template>

    <xsl:template match="@*" mode="print">
        <xsl:param name="depth" select="0"/>
        <xsl:param name="render" select="false()"/>
        <xsl:param name="lineLeader" select="' '"/>
        <xsl:param name="rest" as="attribute()*"/>

        <xsl:if test="$render">

            <xsl:text>&#10;</xsl:text>
            <xsl:call-template name="whitespace">
                <xsl:with-param name="indent" select="$depth + 3"/>
                <xsl:with-param name="leadChar" select="$lineLeader"/>
            </xsl:call-template>
            <xsl:value-of select="name(.)"/>
            <xsl:text>="</xsl:text>
            <xsl:value-of select="."/>
            <xsl:text>"</xsl:text>
        </xsl:if>

        <xsl:apply-templates select="$rest" mode="print">
            <xsl:with-param name="depth" select="$depth"/>
            <xsl:with-param name="render" select="$render"/>
            <xsl:with-param name="lineLeader" select="$lineLeader"/>
            <xsl:with-param name="rest" select="()"/><!-- avoid implicit param pass to recursive call! -->
        </xsl:apply-templates>

    </xsl:template>

    <xsl:template match="primary:* | control:*" mode="print">
        <xsl:param name="depth"/>

        <xsl:variable name="diffType" select="util:diff-type(.)"/>
        <xsl:variable name="primary" select="self::primary:*"/>
        <xsl:variable name="lineLeader" select="if ($primary) then '+' else '-'"/>

        <!-- only if this is the first in a sequence of control::* elements, since the rest are handled along with the first... -->
        <xsl:if test="util:diff-type(preceding-sibling::*[1]) != $diffType">
            <xsl:if test="@*">
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
            <xsl:call-template name="diffspace">
                <xsl:with-param name="indent" select="if (@*) then $depth + 3 else $depth"/>
                <xsl:with-param name="primary" select="$primary"/>
            </xsl:call-template>
            <b><i>&lt;!-- ... --&gt;</i></b><!-- something to identify diff sections in output -->
            <xsl:if test="node()">
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
            <xsl:variable name="rest" select="set:leading(following-sibling::*, following-sibling::*[util:diff-type(.) != $diffType])"/>
            <xsl:apply-templates select="@* | node()" mode="print">
                <xsl:with-param name="depth" select="$depth"/>
                <xsl:with-param name="render" select="true()"/>
                <xsl:with-param name="lineLeader" select="$lineLeader"/>
                <xsl:with-param name="rest" select="$rest/@* | $rest/*"/>
            </xsl:apply-templates>
        </xsl:if>
    </xsl:template>

    <xsl:template name="whitespace">
        <xsl:param name="indent" select="0" as="xs:integer"/>
        <xsl:param name="leadChar" select="' '"/>
        <xsl:choose>
            <xsl:when test="$indent > 0">
                <xsl:value-of select="$leadChar"/>
                <xsl:text> </xsl:text>
                <xsl:for-each select="0 to $indent - 1">
                    <xsl:text>  </xsl:text>
                </xsl:for-each>
            </xsl:when>
            <xsl:otherwise>
                <xsl:for-each select="0 to $indent">
                    <xsl:text>  </xsl:text>
                </xsl:for-each>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <xsl:template name="diffspace">
        <xsl:param name="indent" select="0" as="xs:integer"/>
        <xsl:param name="primary" select="false()"/>
        <xsl:for-each select="0 to $indent">
            <xsl:choose>
                <xsl:when test="$primary">
                    <xsl:text>++</xsl:text>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:text>--</xsl:text>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </xsl:template>

    <!-- just an "enum" for deciding whether to group adjacent diffs -->
    <xsl:function name="util:diff-type" as="xs:integer">
        <xsl:param name="construct"/>
        <xsl:sequence select="if ($construct/self::primary:*[@*]) then 1 else
                              if ($construct/self::control:*[@*]) then 2 else
                              if ($construct/self::primary:*) then 3 else
                              if ($construct/self::control:*) then 4 else
                              if ($construct) then 5 else 0"/>
    </xsl:function>

    <!-- end PRINTERS -->

</xsl:stylesheet>

consider this example input, based on yours:

<test>
    <Node>
        <Child name="Alpha"/>
        <Child name="Beta"/>
        <Child name="Charlie"/>
    </Node>
    <Node>
        <Child name="Beta"/>
        <Child name="Charlie"/>
        <Child name="Alpha"/>
    </Node>
</test>

with the stylesheet as is, the following is the output when applied to the example:

<Node>
  <Child
++++++++<!-- ... -->
+       name="Alpha"
--------<!-- ... -->
-       name="Beta">
  </Child>
  <Child
++++++++<!-- ... -->
+       name="Beta"
--------<!-- ... -->
-       name="Charlie">
  </Child>
  <Child
++++++++<!-- ... -->
+       name="Charlie"
--------<!-- ... -->
-       name="Alpha">
  </Child>
</Node>

but, if you add this custom template:

<xsl:template match="Child" mode="find-match" as="element()?">
    <xsl:param name="candidates" as="element()*"/>
    <xsl:sequence select="$candidates[@name = current()/@name][1]"/>
</xsl:template>

which says to match a Child element based on its @name attribute, then you get no output (meaning there is no diff).
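
For readers who don't have an XSLT 2.0 processor at hand, the same idea (walk the two trees together, with a pluggable rule for what counts as a "match") can be sketched in a few lines of Python. This is not a port of the stylesheet, only an illustration under simplifying assumptions: the names default_key, by_name_attr and compare are invented for this example, matching is greedy first-match among the remaining siblings, and attributes are compared as a whole rather than one by one.

```python
import xml.etree.ElementTree as ET


def default_key(elem):
    """Default rule: same tag, and same text for leaf elements."""
    return (elem.tag, (elem.text or "").strip() if len(elem) == 0 else None)


def by_name_attr(elem):
    """Custom rule, analogous to the Child template: match on tag and @name."""
    return (elem.tag, elem.get("name"))


def compare(primary, control, key=default_key, path=""):
    """Return a list of (kind, path) differences, ignoring sibling order."""
    here = f"{path}/{primary.tag}"
    diffs = []
    if primary.attrib != control.attrib:
        diffs.append(("attributes-differ", here))
    remaining = list(control)  # control children not yet matched
    for child in primary:
        match = next((c for c in remaining if key(c) == key(child)), None)
        if match is None:
            diffs.append(("primary-only", f"{here}/{child.tag}"))
        else:
            remaining.remove(match)
            diffs.extend(compare(child, match, key, here))
    diffs.extend(("control-only", f"{here}/{c.tag}") for c in remaining)
    return diffs
```

Called on the two Node elements with the default key, this reports three attribute differences (each Child is paired with whichever Child comes first in the control tree); with key=by_name_attr it returns an empty list, which mirrors the behaviour described for the custom template.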

Kerouac answered 23/1, 2017 at 17:17 Comment(0)
E
0

Here is a diff solution using SWI-Prolog

:- use_module(library(xpath)).
load_trees(XmlRoot1, XmlRoot2) :-
    load_xml('./xml_source_1.xml', XmlRoot1, _),
    load_xml('./xml_source_2.xml', XmlRoot2, _).

find_differences(Reference, Root1, Root2) :-
    xpath(Root1, //'Child'(@name=Name), Node),
    not(xpath(Root2, //'Child'(@name=Name), Node)),
    writeln([Reference, Name, Node]).

diff :-
    load_trees(Root1, Root2),
    (find_differences('1', Root1, Root2) ; find_differences('2', Root2, Root1)).

Prolog will unify the Name variable to match nodes from file 1 and file 2. The unification on the Node variable does the "diff" detection.

Here's some sample output below:

% file 1 and file 2 have no differences 
?- diff.
false.

% "Alpha" was updated  in file 2
?- diff.
[1,Alpha,element(Child,[name=Alpha],[])]
[2,Alpha,element(Child,[name=Alpha,age=7],[])]
false.
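
A rough Python equivalent of the same check, for comparison. This is only a sketch under stated assumptions: the helper names (signature, find_differences, diff) are invented here, it takes XML strings rather than file paths, it compares attributes as an unordered set (the Prolog version unifies the whole Node term), and it ignores nested children of Child.

```python
import xml.etree.ElementTree as ET


def signature(elem):
    """Tag, attributes, and text; a stand-in for unifying on the whole node."""
    return (elem.tag, frozenset(elem.attrib.items()), (elem.text or "").strip())


def find_differences(reference, root1, root2, tag="Child"):
    """Report each `tag` element of root1 that has no identical twin in root2."""
    seen = {signature(e) for e in root2.iter(tag)}
    return [
        [reference, e.get("name"), dict(e.attrib)]
        for e in root1.iter(tag)
        if signature(e) not in seen
    ]


def diff(xml1, xml2):
    """Check both directions, like the two find_differences goals in diff/0."""
    root1, root2 = ET.fromstring(xml1), ET.fromstring(xml2)
    return find_differences("1", root1, root2) + find_differences("2", root2, root1)
```

An empty list corresponds to the "no differences" case, and an element that changed in file 2 shows up once for each file.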
Erhart answered 26/9, 2017 at 0:51 Comment(0)
S
0

I wrote a simple Java program to do this. It stores each of the two XML documents being compared in a HashMap, with the key being the XPath of an element (including the element's text value) and the value being the number of occurrences of that element. It then compares the two HashMaps on both key set and values.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Creates a map of elements with text values and no nested nodes.
 * The key is the XPath of the element concatenated with its text value;
 * the value is the number of occurrences of that element.
 *
 * @param xmlContent
 * @return
 * @throws ParserConfigurationException
 * @throws SAXException
 * @throws IOException
 */
private static Map<String, Long> getMapOfElementsOfXML(String xmlContent)
        throws ParserConfigurationException, SAXException, IOException {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setValidating(false);
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc1 = db.parse(new ByteArrayInputStream(xmlContent.getBytes()));
    NodeList entries = doc1.getElementsByTagName("*");
    Map<String, Long> mapElements = new HashMap<>();
    for (int i = 0; i < entries.getLength(); i++) {
        Element element = (Element) entries.item(i);
        // Keep elements with exactly one child node (normally their text).
        if (element.getChildNodes().getLength() == 1 && element.getTextContent() != null) {
            final String elementWithXPathAndValue = getXPath(element.getParentNode())
                    + "/" + element.getParentNode().getNodeName()
                    + "/" + element.getTagName()
                    + "/" + element.getTextContent();
            // Count occurrences of each key, starting at 1 for the first one.
            mapElements.merge(elementWithXPathAndValue, 1L, Long::sum);
        }
    }
    return mapElements;
}

// Builds the path of ancestor names above the given node.
static String getXPath(Node node) {
    Node parent = node.getParentNode();
    if (parent == null) {
        return "";
    }
    return getXPath(parent) + "/" + parent.getNodeName();
}
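
The same "count each path plus text value" idea fits in a few lines of Python, shown here only as a sketch. The names leaf_counts and same_leaves are invented for this example, and like the Java version it only looks at elements that carry text and have no child elements, so attributes and empty elements are not compared.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def leaf_counts(xml_text):
    """Count each (path, text) pair for elements with text and no child elements."""
    counts = Counter()

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if len(elem) == 0 and text:
            counts[(here, text)] += 1
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return counts


def same_leaves(xml1, xml2):
    """True when both documents contain the same leaves the same number of times."""
    return leaf_counts(xml1) == leaf_counts(xml2)
```

Because the comparison is on a multiset, a value that appears twice in one document and once in the other is reported as a difference, while pure reordering is not.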

Complete program is here https://comparetwoxmlsignoringstanzaordering.blogspot.com/2018/12/java-program-to-compare-two-xmls.html

Shears answered 30/12, 2018 at 17:36 Comment(0)
C
0

You can use the 'pom sorter' plugin in IntelliJ IDEA and then use IntelliJ's own 'Compare Files' tool.

Marketplace link for the pom sorter plugin: https://plugins.jetbrains.com/plugin/7084-pom-sorter

Claiborn answered 8/4, 2021 at 22:14 Comment(0)
C
0

I recently ran into an issue comparing two XML documents using org.springframework.test.util.XmlExpectationsHelper#assertXmlEqual(String, String) in unit tests. According to the source code of org.springframework.test.util.XmlExpectationsHelper

private static class XmlUnitDiff {

    private final Diff diff;


    XmlUnitDiff(String expected, String actual) {
        this.diff = DiffBuilder.compare(expected).withTest(actual)
                .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
                .ignoreWhitespace().ignoreComments()
                .checkForSimilar()
                .build();
    }


    public boolean hasDifferences() {
        return this.diff.hasDifferences();
    }

    @Override
    public String toString() {
        return this.diff.toString();
    }

}

it uses org.xmlunit:xmlunit-core.

Here is an example of XML documents that lead to an error:

1st

<Test1 id = "1">
   <Test2 test="true" id="1"/>
   <Test2 test="false" id="2"/>
</Test1>

2nd

<Test1 id = "1">
   <Test2 id="2" test="false"/>
   <Test2 id="1" test="true"/>
</Test1>

Error

Expected attribute value '1' but was '2' - comparing <Test2 id="1"...> at /Test1[1]/Test2[1]/@id to <Test2 id="2"...> at /Test1[1]/Test2[1]/@id

org.springframework.test.util.XmlExpectationsHelper uses ElementSelectors.byNameAndText.

XMLUnit looks for matching nodes using ElementSelector#canBeCompared(Element, Element). If it returns true, XMLUnit compares the two nodes.

Since there is no text content in the provided XML documents, XMLUnit considers that the first item <Test2 test="true" id="1"/> can be compared with <Test2 id="2" test="false"/>.

The comparison then fails on the differing attribute values.

So, attributes should also be taken into account. xmlunit has another selector ElementSelectors.byNameAndAllAttributes for this case.

However, changing the code to

    DiffBuilder.compare(xml1).withTest(xml2)
        .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndAllAttributes))
        .checkForSimilar().ignoreComments()
        .ignoreWhitespace()
        .build();

will lead to an error for XML documents such as these:

1st

<Test1>
    <Test3>
        <Test4>data 1</Test4>
        <Test4>data 2</Test4>
    </Test3>
    <Test2>test data 2</Test2>
</Test1>

2nd

<Test1>
    <Test2>test data 2</Test2>
    <Test3>
        <Test4>data 2</Test4>
        <Test4>data 1</Test4>
    </Test3>
</Test1>

Error

Expected text value 'data 1' but was 'data 2' - comparing <Test4 ...>data 1</Test4> at /Test1[1]/Test3[1]/Test4[1]/text()[1] to <Test4 ...>data 2</Test4> at /Test1[1]/Test3[1]/Test4[1]/text()[1]

This time there are no attributes, and XMLUnit tries to compare <Test4>data 1</Test4> with <Test4>data 2</Test4>.
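
To make the two failure modes easier to see outside of XMLUnit, here is a small Python toy model. It is not XMLUnit's actual matching algorithm, only a greedy first-match simulation written for this answer: by_name_and_text, by_name_and_all_attributes, both and similar are hypothetical helpers that imitate "pick the first sibling the selector says can be compared, then compare the pair".

```python
import xml.etree.ElementTree as ET


def _text(elem):
    return (elem.text or "").strip()


def by_name_and_text(a, b):
    return a.tag == b.tag and _text(a) == _text(b)


def by_name_and_all_attributes(a, b):
    return a.tag == b.tag and a.attrib == b.attrib


def both(a, b):
    """Pair two elements only when both selectors agree."""
    return by_name_and_text(a, b) and by_name_and_all_attributes(a, b)


def similar(expected, actual, can_be_compared):
    """Pair children with can_be_compared, then require every pair to be similar."""
    if (expected.tag != actual.tag or expected.attrib != actual.attrib
            or _text(expected) != _text(actual) or len(expected) != len(actual)):
        return False
    unmatched = list(actual)
    for child in expected:
        # Greedy: take the first remaining sibling the selector accepts.
        partner = next((c for c in unmatched if can_be_compared(child, c)), None)
        if partner is None or not similar(child, partner, can_be_compared):
            return False
        unmatched.remove(partner)
    return True
```

In this model the attribute-only example fails with by_name_and_text, the text-only example fails with by_name_and_all_attributes, and both examples pass when a pair has to satisfy both selectors.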

Finally, combining the two selectors:

import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.ElementSelector;
import org.xmlunit.diff.ElementSelectors;

import java.util.StringJoiner;

public class Main {
    public static void compareXml(final String xml1, final String xml2) {
        final var diff = DiffBuilder
                .compare(xml1)
                .withTest(xml2)
                .withNodeMatcher(
                        new DefaultNodeMatcher(
                                (ElementSelector) (expected, actual) ->
                                        ElementSelectors.byNameAndText.canBeCompared(expected, actual) &&
                                                ElementSelectors.byNameAndAllAttributes.canBeCompared(expected, actual)
                        )
                )
                .checkForSimilar()
                .ignoreComments()
                .ignoreWhitespace()
                .build();

        if (diff.hasDifferences()) {
            throw new AssertionError(
                    new StringJoiner(System.lineSeparator())
                            .add(diff.toString())
                            .add(xml1)
                            .add(xml2)
                            .toString()
            );
        }
    }

    public static void main(String[] args) {
        //org.xmlunit.diff.ElementSelectors.byNameAndText gives error for identical xmls
        compareXml(
                """
                        <Test1 id = "1">
                            <Test2 test="true" id="1"/>
                            <Test2 test="false" id="2"/>
                        </Test1>""",
                """
                        <Test1 id = "1">
                            <Test2 id="2" test="false"/>
                            <Test2 id="1" test="true"/>
                        </Test1>"""
        );
        //org.xmlunit.diff.ElementSelectors.byNameAndAllAttributes gives error for identical xmls
        compareXml(
                """
                        <Test1>
                            <Test3>
                                <Test4>data 1</Test4>
                                <Test4>data 2</Test4>
                            </Test3>
                            <Test2>test data 2</Test2>
                        </Test1>""",
                """
                        <Test1>
                            <Test2>test data 2</Test2>
                            <Test3>
                                <Test4>data 2</Test4>
                                <Test4>data 1</Test4>
                            </Test3>
                        </Test1>"""
        );
    }
}

This code worked for every case I had. Add null checks as well if null inputs are possible.

By the way, if you try

        .withNodeMatcher(
                new DefaultNodeMatcher(
                        ElementSelectors.byNameAndText, ElementSelectors.byNameAndAllAttributes
                )
        )

it will not work either. Look into the DefaultNodeMatcher source to see why.

I also ran all of these cases against version 2.9.1 of org.xmlunit:xmlunit-core, which is the latest at the time of writing.

Courtmartial answered 31/1 at 16:35 Comment(0)
W
-1
import java.io.IOException;
import java.util.List;

import org.custommonkey.xmlunit.DetailedDiff;
import org.custommonkey.xmlunit.XMLUnit;
import org.xml.sax.SAXException;

/**
 * Compares the content of two XML documents.
 * <ul>
 * <li>Ignores whitespace</li>
 * <li>Ignores attribute order</li>
 * <li>Ignores comments</li>
 * <li>Ignores differences in the sequence of child nodes</li>
 * </ul>
 *
 * @author sdiallo
 * @since 2017-01-16
 * @param xmlExpected  the first XML content to be compared
 * @param xmlGenerated the second XML content to be compared
 * @return the differences computed between the two documents:
 *         <ul>
 *         <li>null means the documents are equal
 *         (null is also returned if parsing fails)</li>
 *         <li>otherwise the documents are different</li>
 *         </ul>
 */
public static List buildDiffXMLs(String xmlExpected, String xmlGenerated) {
    List<?> differencesList = null;

    // Global XMLUnit 1.x settings; they affect every later comparison too.
    XMLUnit.setIgnoreAttributeOrder(true);
    XMLUnit.setIgnoreComments(true);
    XMLUnit.setIgnoreWhitespace(true);

    try {
        DetailedDiff diff = new DetailedDiff(XMLUnit.compareXML(
                xmlExpected, xmlGenerated));

        // Two documents are considered to be "similar" if they contain the
        // same elements and attributes regardless of order.
        if (!diff.identical() && !diff.similar()) {
            differencesList = diff.getAllDifferences();
        }
    } catch (SAXException | IOException e) {
        e.printStackTrace();
    }

    return differencesList;
}// buildDiffXMLs
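
For comparison, the criteria listed in the Javadoc above (ignore whitespace, attribute order, comments and child order) can also be sketched with the JDK alone. This is a different technique from XMLUnit's diffing: it builds an order-independent canonical string for each document and compares the strings. The class and method names are hypothetical, and the sketch makes simplifying assumptions: no namespace handling, text inside an element is concatenated after trimming, and the canonical string does not escape special characters.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class UnorderedXmlCompare {
    // Illustration helper: parse a string and return its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    // Canonical form of an element: tag name, attributes sorted by name,
    // trimmed direct text, then the children sorted by their own canonical form.
    static String canonical(Element e) {
        TreeMap<String, String> attrs = new TreeMap<>();
        NamedNodeMap map = e.getAttributes();
        for (int i = 0; i < map.getLength(); i++) {
            attrs.put(map.item(i).getNodeName(), map.item(i).getNodeValue());
        }

        StringBuilder text = new StringBuilder();
        List<String> children = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                children.add(canonical((Element) n));
            } else if (n.getNodeType() == Node.TEXT_NODE
                    || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                text.append(n.getNodeValue().trim());
            }
            // Comments and processing instructions are skipped.
        }
        Collections.sort(children); // this is what removes the order dependence

        return "<" + e.getTagName() + attrs + ">" + text + children
                + "</" + e.getTagName() + ">";
    }

    static boolean sameIgnoringOrder(String xml1, String xml2) {
        return canonical(parse(xml1)).equals(canonical(parse(xml2)));
    }

    public static void main(String[] args) {
        String first = "<Node><Child name=\"Alpha\"/><Child name=\"Beta\"/>"
                + "<Child name=\"Charlie\"/></Node>";
        String second = "<Node>\n  <Child name=\"Beta\"/>\n  <Child name=\"Charlie\"/>\n"
                + "  <Child name=\"Alpha\"/>\n</Node>";
        String third = "<Node><Child name=\"Beta\"/><Child name=\"Charlie\"/></Node>";

        System.out.println(sameIgnoringOrder(first, second)); // true
        System.out.println(sameIgnoringOrder(first, third));  // false
    }
}
```

Unlike a diff tool, this only gives a yes/no answer. It does not report where two documents differ.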
Wisent answered 16/1, 2017 at 12:12 Comment(1)
Hint: please spend the 1 minute it takes to properly format all of your code; and maybe put an initial explanation there.Impropriety
B
-2

As a (very) quick and dirty approach, I've done this in a pinch:

  1. Open Excel
  2. Paste file 1 into column A, one line per row. Name the range "FILE1"
  3. Paste file 2 into column B, one line per row. Name the range "FILE2"
  4. In C1, enter the formula:

    =IF(ISERROR(VLOOKUP(B1,FILE1,1,FALSE)),"DIFF","")
    
  5. In D1, enter the formula:

    =IF(ISERROR(VLOOKUP(A1,FILE2,1,FALSE)),"DIFF","")
    
  6. Fill down columns C and D to the bottom of the files.

That will flag with "DIFF" any line that appears in one file but not in the other. It's not tidy by any stretch, but sometimes you just have to work with what you've got.
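
The same lookup can be sketched in a few lines of Java if a spreadsheet is not to hand. The class and method names are hypothetical, and stripping leading and trailing whitespace from each line is my addition, not something the formulas above do. Like the spreadsheet version, it is purely line-based, so it only helps when each element sits on its own line and is written identically in both files.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class LineLookupDiff {
    // Lines of 'from' that have no exact match anywhere in 'in',
    // which is the role VLOOKUP plus ISERROR plays in the spreadsheet.
    static List<String> missing(String from, String in) {
        Set<String> known = in.lines()
                .map(String::strip)
                .collect(Collectors.toSet());
        return from.lines()
                .map(String::strip)
                .filter(line -> !known.contains(line))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String file1 = String.join("\n",
                "<Node>",
                "    <Child name=\"Alpha\"/>",
                "    <Child name=\"Beta\"/>",
                "    <Child name=\"Charlie\"/>",
                "</Node>");
        // "Delta" is made-up data so that there is something to report.
        String file2 = String.join("\n",
                "<Node>",
                "    <Child name=\"Beta\"/>",
                "    <Child name=\"Charlie\"/>",
                "    <Child name=\"Delta\"/>",
                "</Node>");

        System.out.println(missing(file2, file1)); // [<Child name="Delta"/>]
        System.out.println(missing(file1, file2)); // [<Child name="Alpha"/>]
    }
}
```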

Beilul answered 21/1, 2015 at 8:40 Comment(0)
M
-2

A simple way to do this is to use a version control tool such as TortoiseGit.

  1. Create a GitHub account
  2. Create a Git repository in that account
  3. Check out (clone) that repository
  4. Add the first of the two files to be compared
  5. Push the content to the server
  6. Replace the file's contents with the second file
  7. Compare the change as you would for any source file
Militarist answered 6/1, 2021 at 15:52 Comment(1)
This may answer how to compare files in general, but not with the additional requirement of ignoring the order of the elements and other things like attribute order and closing tag style. disclaimer: I'm not a downvoter, if that wasn't obvious. I'm just doing what the downvoters should have done and left constructive feedback.Zigzagger
