xml merge two files using xsl?

I need to merge two similar XML files, but only the records that match on a common tag, e.g. <type> in the following example:

file1.xml is

<node>
    <type>a</type>
    <name>joe</name>
</node>
<node>
    <type>b</type>
    <name>sam</name>
</node>

file2.xml is

<node>
    <type>a</type>
    <name>jill</name>
</node>

so that I have an output of

<node>
    <type>a</type>
    <name>jill</name>
    <name>joe</name>
</node>
<node>
    <type>b</type>
    <name>sam</name>
</node>

What are the basics of doing this, in xsl? Many thanks.

Paramatta answered 9/12, 2010 at 19:12 Comment(0)

This stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kElementByType" match="*[not(self::type)]" use="../type"/>
    <xsl:param name="pSource2" select="'file2.xml'"/>
    <xsl:variable name="vSource2" select="document($pSource2,/)"/>
    <xsl:template match="node()|@*" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="type">
        <xsl:variable name="vCurrent" select="."/>
        <xsl:call-template name="identity"/>
        <xsl:for-each select="$vSource2">
            <xsl:apply-templates select="key('kElementByType',$vCurrent)"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

With this input (well-formed):

<root>
    <node>
        <type>a</type>
        <name>joe</name>
    </node>
    <node>
        <type>b</type>
        <name>sam</name>
    </node>
</root>

Output:

<root>
    <node>
        <type>a</type>
        <name>jill</name>
        <name>joe</name>
    </node>
    <node>
        <type>b</type>
        <name>sam</name>
    </node>
</root>
Ence answered 9/12, 2010 at 19:26 Comment(6)
Wow. I knew about using both name and match on a matching template, but didn't realize there was a real-life pattern exploiting that fact. Many thanks for this in particular, and +1 for the answer in general. - Sapro
Many thanks, that works great as-is on a test system. Now I need to try to apply it to my actual system, which is more complex, as I simplified it for the question. I have something tangible to work through now, though, and can work out what the functions etc. are doing from the docs. One thing I'd love clarification on, as I haven't found it yet (and no doubt it's fundamental, as I've seen it used a lot!): what is the "@" char doing, as in "node()|@*"? - Paramatta
@debs: You're welcome. About your question: @* is the abbreviation of attribute::*, meaning any attribute. - Ence
Having played around with the code a bit more, I'm coming round to the idea that I don't know what the [match="node()|@*"] statement is doing. I thought it was testing the node named "node", but I suspect it is some sort of reserved word: if I change the actual node names to, say, tnode, the xsl works without changing, yet it breaks if I substitute tnode for node in the match. I could do with a basic primer which explains all this, so if anyone has any links (or a book) they could recommend, it would be much appreciated. - Paramatta
This explained some of the basics of identity templates etc. very clearly for me: xmlplease.com/xsltidentity - Paramatta
@debs: node() expands to child::node(), a node kind test meaning any node type on the child axis (element, comment, PI, text node). Only the document root doesn't get matched, because it's not a child. - Ence
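
To make the comments above a bit more concrete, here is a minimal annotated sketch of the identity template, plus one override. The element names tnode and renamed are made up purely for illustration; they do not come from the question or the answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- match="node()|@*" matches any child node (element, text, comment, PI)
         or any attribute. node() is a node kind test, not an element called
         "node", so it keeps working when the elements are renamed. -->
    <xsl:template match="node()|@*" name="identity">
        <!-- xsl:copy makes a shallow copy of the current node -->
        <xsl:copy>
            <!-- then recurse into its attributes and children -->
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- A name test such as match="tnode" (hypothetical element) has a higher
         default priority than node(), so it overrides the identity template
         for those elements only. -->
    <xsl:template match="tnode">
        <renamed>
            <xsl:apply-templates select="node()|@*"/>
        </renamed>
    </xsl:template>

</xsl:stylesheet>
```

This would also explain the behaviour described above: renaming the elements in the data is harmless, but replacing node() with tnode() in the match turns a built-in node test into something XPath does not know.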

I thought it worth adding some extra info I've learned while doing this, in case it's of use to any other beginners. I've changed my test code names so that they aren't potentially confused with some of the terms used in the xsl. I've no idea if it's the best or most efficient way of doing things, but it works (with a few caveats!).

I wanted to keep the "info" node, and the original code lost it. Coding a separate match template keeps it in the output. Also, the way I coded it, this node is only kept if it is in the input file (x1). If it's in the (x2) file, it doesn't get kept. This has to do with the way I've written the iterations. Ideally, I'd like to keep it from either input file, but I haven't worked out how to do that yet. I'd also like the option of passing the filename x2 as a parameter, via msxsl, rather than having it hard-coded. There surely must be a way of doing this, but I haven't managed to track it down yet.

xsl file:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kElementByType" match="*[not(self::keynode)]" use="../keynode"/>
    <xsl:param name="pSource2" select="'x2.xml'"/>
    <xsl:variable name="vSource2" select="document($pSource2,/)"/>
    <xsl:template match="node()|@*" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="info">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="keynode">
        <xsl:variable name="vCurrent" select="."/>
        <xsl:call-template name="identity"/>
        <xsl:for-each select="$vSource2">
            <xsl:apply-templates select="key('kElementByType',$vCurrent)"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

So, using the msxsl command:

msxsl.exe x1.xml test.xsl -o out.xml

Gives the following results with the data below:

file x1.xml:

<root>
    <info>
        <id>147</id>
    </info>
    <nodetype>
        <keynode>annajon</keynode>
        <note>
        <source>source1</source>
        <name>Anna Jones</name>
        </note>
    </nodetype>
    <nodetype>
        <keynode>brucejon</keynode>
        <note>
        <source>source1</source>
        <name>Bruce Jones</name>
        </note>
    </nodetype>
</root>

file x2.xml:

<root>
    <nodetype>
        <keynode>annajon</keynode>
        <note>
        <source>source2</source>
        <name>Anna Jones</name>
        </note>
    </nodetype>
    <nodetype>
        <keynode>iangore</keynode>
        <note>
        <source>source2</source>
        <name>Ian Gore</name>
        </note>
    </nodetype>
</root>

out.xml:

<?xml version="1.0" encoding="UTF-16"?><root>
    <info>
        <id>147</id>
    </info>
    <nodetype>
        <keynode>annajon</keynode><note>
        <source>source2</source>
        <name>Anna Jones</name>
        </note>
        <note>
        <source>source1</source>
        <name>Anna Jones</name>
        </note>
    </nodetype>
    <nodetype>
        <keynode>brucejon</keynode>
        <note>
        <source>source1</source>
        <name>Bruce Jones</name>
        </note>
    </nodetype>
</root>
Paramatta answered 12/12, 2010 at 11:42 Comment(1)
Parameter passing is easy when you know how: [msxsl.exe x2.xml test.xsl -o out.xml pSource2="x1.xml"] substitutes x1.xml for x2.xml in the xsl file. The "select" attribute sets the default used when no parameter is passed. - Paramatta
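
On the open point of keeping "info" from either input file, one possible approach (a sketch only, not tested against the real data) is to add a template for the root element that falls back to the second file when the main input has no info child. It assumes both files use root as the document element and info as a direct child of it, as in the samples above. Here info is simply copied by the identity template.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:key name="kElementByType" match="*[not(self::keynode)]" use="../keynode"/>
    <xsl:param name="pSource2" select="'x2.xml'"/>
    <xsl:variable name="vSource2" select="document($pSource2,/)"/>

    <!-- Identity template: copies everything not handled elsewhere, including info -->
    <xsl:template match="node()|@*" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Assumption: info sits directly under root in whichever file has it -->
    <xsl:template match="/root">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- Only when the main input lacks info, take it from the second file -->
            <xsl:if test="not(info)">
                <xsl:apply-templates select="$vSource2/root/info"/>
            </xsl:if>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Unchanged: after each keynode, pull in the matching siblings from file 2 -->
    <xsl:template match="keynode">
        <xsl:variable name="vCurrent" select="."/>
        <xsl:call-template name="identity"/>
        <xsl:for-each select="$vSource2">
            <xsl:apply-templates select="key('kElementByType',$vCurrent)"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

If both files carry an info node, this keeps the one from the main input and ignores the other; merging the two would need its own rule.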

One way is to pass the second XML file as a parameter.

A second, easier way is to concatenate both XML files under one root element:

<root>
    <node>
        <type>a</type>
        <name>joe</name>
    </node>
    <node>
        <type>b</type>
        <name>sam</name>
    </node>
    <node>
        <type>a</type>
        <name>jill</name>
    </node>
</root>

and then merge it using two nested for-each loops:

<xsl:template match="/root">
    <xsl:for-each select="node">
        <xsl:variable name="type" select="type"/>
        <node> 
           <type><xsl:value-of select="$type"/></type>
           <xsl:for-each select="../node[type=$type]">
              <name><xsl:value-of select="name"/></name>
           </xsl:for-each>
       </node>
    </xsl:for-each>
</xsl:template>
Mahout answered 9/12, 2010 at 19:32 Comment(2)
@Chernyshow: You still need to iterate over the unique types. - Ence
Thanks for the suggestion, Max. I had considered just cutting and pasting the two files together and then doing a sort/merge, but couldn't figure that out either: I'm very new to xsl, but do like it! Sorry, I can't vote up either of the answers as I don't have the reputation atm. - Paramatta
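
For the point raised in the comments about iterating over the unique types, a common XSLT 1.0 technique is Muenchian grouping. The following is a sketch under the assumption that the two files have already been concatenated under a single root element as shown in this answer; the key name kNodeByType is made up for the example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <!-- Index every node element by the string value of its type child -->
    <xsl:key name="kNodeByType" match="node" use="type"/>

    <xsl:template match="/root">
        <root>
            <!-- Muenchian grouping: visit only the first node of each type -->
            <xsl:for-each select="node[generate-id() = generate-id(key('kNodeByType', type)[1])]">
                <node>
                    <type><xsl:value-of select="type"/></type>
                    <!-- Emit one name per node sharing this type, in document order -->
                    <xsl:for-each select="key('kNodeByType', type)">
                        <name><xsl:value-of select="name"/></name>
                    </xsl:for-each>
                </node>
            </xsl:for-each>
        </root>
    </xsl:template>
</xsl:stylesheet>
```

Each type then appears once in the output, with the names from every matching node underneath it. Because the names are emitted in document order, their order depends on which file was pasted first.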
