Merge xml files with nested elements without external libraries
I am trying to merge multiple XML files together using Python and no external libraries. The XML files have nested elements.

Sample File 1:

<root>
  <element1>textA</element1>
  <elements>
    <nested1>text now</nested1>
  </elements>
</root>

Sample File 2:

<root>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>

What I Want:

<root>
  <element1>textA</element1>    
  <element2>textB</element2>  
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>  
</root>  

What I have tried:

From this answer.

from xml.etree import ElementTree as et
def combine_xml(files):
    first = None
    for filename in files:
        data = et.parse(filename).getroot()
        if first is None:
            first = data
        else:
            first.extend(data)
    if first is not None:
        return et.tostring(first)

What I Get:

<root>
  <element1>textA</element1>
  <elements>
    <nested1>text now</nested1>
  </elements>
  <element2>textB</element2>
  <elements>
    <nested1>text after</nested1>
    <nested2>new text</nested2>
  </elements>
</root>

I hope the problem is clear. I am looking for a proper solution; any guidance would be wonderful.

To clarify the problem, using the current solution that I have, nested elements are not merged.

Passover answered 14/2, 2013 at 15:51 Comment(0)
E
31

The code you posted appends every element from the later files, regardless of whether an element with the same tag already exists. You need to iterate over the elements and check and combine them manually in whatever way suits you, because there is no standard way of merging XML files. I can't explain it better than code, so here it is, with comments:

from xml.etree import ElementTree as et

class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the string representation
        return et.tostring(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)

if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    # print() with a single argument works on both Python 2 and Python 3
    print('-' * 20)
    print(r)
Eviscerate answered 14/2, 2013 at 16:23 Comment(4)
Works perfectly, thanks, I had just started writing my own code. :)Passover
Nice, thanks. We also needed to merge attributes. It can be done by adding one.attrib.update(other.attrib) at the beginning of combine_element and mapping[el.tag].attrib.update(el.attrib) after replacing the element text.Cancellation
Any suggestions as to why I am getting an invalid syntax error? mapping = {el.tag: el **for** el in one}. The error is pointing to the 'for' syntax. I'm running Python 2.6.6.Bassesalpes
@Adrian That error is because the dict comprehension syntax ({key: value for ...}) is only supported on Python 2.7+. You can use dict((el.tag, el) for el in one), which is equivalent.Eviscerate
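The attribute suggestion from the comments can be sketched as a standalone function. This is a minimal sketch for Python 3, not the answer author's code: it folds the two `attrib.update` calls into a function version of `combine_element`, and the sample XML strings are made up for illustration.

```python
from xml.etree import ElementTree as et


def combine_element(one, other):
    """Merge `other` into `one` in place, matching children by tag."""
    # Attributes from `other` override those already present on `one`
    one.attrib.update(other.attrib)
    mapping = {el.tag: el for el in one}
    for el in other:
        # Compare with None explicitly: an Element without children is falsy
        target = mapping.get(el.tag)
        if target is None:
            # No child with this tag yet, so add it as-is
            mapping[el.tag] = el
            one.append(el)
        elif len(el) == 0:
            # Leaf element: overwrite the text and merge the attributes
            target.text = el.text
            target.attrib.update(el.attrib)
        else:
            # Nested element: recurse (attributes are merged at the top of the call)
            combine_element(target, el)


# Illustrative input only; not taken from the question
first = et.fromstring('<root><a x="1">old</a><b><c>1</c></b></root>')
second = et.fromstring('<root><a y="2">new</a><b z="3"><c>2</c><d>3</d></b></root>')
combine_element(first, second)
print(et.tostring(first, encoding='unicode'))
```

As in the answer, children are matched by tag only, so if `one` contains several siblings with the same tag, only the last of them is updated.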
R
4

Thank you, but I needed the merge to take the attributes into account as well. Here is the code after my patch:

import sys
from xml.etree import ElementTree as et


class hashabledict(dict):
    # A dict that can be used as (part of) a dictionary key
    def __hash__(self):
        return hash(tuple(sorted(self.items())))


class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the merged tree as an ElementTree
        return et.ElementTree(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from (tag, attributes) to element, as that's what we are filtering with
        mapping = {(el.tag, hashabledict(el.attrib)): el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[(el.tag, hashabledict(el.attrib))].text = el.text
                except KeyError:
                    # An element with this tag and these attributes is not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[(el.tag, hashabledict(el.attrib))], el)
                except KeyError:
                    # Not in the mapping
                    mapping[(el.tag, hashabledict(el.attrib))] = el
                    # Just add it
                    one.append(el)


if __name__ == '__main__':
    # All arguments but the last are input files; the last one is the output file
    r = XMLCombiner(sys.argv[1:-1]).combine()
    print('-' * 20)
    print(et.tostring(r.getroot()))
    r.write(sys.argv[-1], encoding="iso-8859-1", xml_declaration=True)
Roseannroseanna answered 27/4, 2015 at 13:13 Comment(0)
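To illustrate what keying on both tag and attributes changes, here is a small Python 3 sketch. It swaps the `hashabledict` subclass for a `frozenset` of the attribute items, which is an alternative chosen here for brevity rather than what the answer uses. The names `child_key` and `combine_by_tag_and_attrib` and the sample XML are hypothetical.

```python
from xml.etree import ElementTree as et


def child_key(el):
    # Hypothetical helper: identify an element by its tag plus its exact attribute set
    return (el.tag, frozenset(el.attrib.items()))


def combine_by_tag_and_attrib(one, other):
    """Merge `other` into `one` in place, matching children by tag and attributes."""
    mapping = {child_key(el): el for el in one}
    for el in other:
        target = mapping.get(child_key(el))
        if target is None:
            # No child with the same tag and attributes: append it
            mapping[child_key(el)] = el
            one.append(el)
        elif len(el) == 0:
            # Matching leaf: update the text
            target.text = el.text
        else:
            # Matching nested element: merge its children recursively
            combine_by_tag_and_attrib(target, el)


# Illustrative input only: three <item> elements told apart by their id attribute
first = et.fromstring('<root><item id="1">a</item><item id="2">b</item></root>')
second = et.fromstring('<root><item id="2">B</item><item id="3">c</item></root>')
combine_by_tag_and_attrib(first, second)
merged = [(el.get('id'), el.text) for el in first]
print(merged)
```

With tag-only matching, all three `item` elements would collapse onto one entry of the mapping. With the composite key, only the element whose attributes match exactly is updated and the others are kept as separate siblings.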
A
1

Extending @jadkik94's answer to create a utility function which doesn't change its arguments and also updates the attributes:

Note: the code works only in Python 2, as the copy() method of the Element class is not supported in Python 3.

from xml.etree.ElementTree import Element


def combine_xmltree_element(element_1, element_2):
    """
    Recursively combines the given two xmltree elements. Common properties will be overridden by values of those
    properties in element_2.
    
    :param element_1: A xml Element
    :type element_1: L{Element}
    
    :param element_2: A xml Element
    :type element_2: L{Element}
    
    :return: A xml element with properties combined.
    """

    if element_1 is None:
        return element_2.copy()

    if element_2 is None:
        return element_1.copy()

    if element_1.tag != element_2.tag:
        raise TypeError(
            "The two XMLtree elements of type {t1} and {t2} cannot be combined".format(
                t1=element_1.tag,
                t2=element_2.tag
            )
        )

    # Pass the tag positionally; the Element constructor copies the attrib dict
    combined_element = Element(element_1.tag, attrib=element_1.attrib)
    combined_element.attrib.update(element_2.attrib)

    # Create a mapping from tag name to child element
    element_1_child_mapping = {child.tag: child for child in element_1}
    element_2_child_mapping = {child.tag: child for child in element_2}

    for child in element_1:
        if child.tag not in element_2_child_mapping:
            combined_element.append(child.copy())

    for child in element_2:
        if child.tag not in element_1_child_mapping:
            combined_element.append(child.copy())

        else:
            if len(child) == 0:  # Leaf element
                combined_child = element_1_child_mapping[child.tag].copy()
                combined_child.text = child.text
                combined_child.attrib.update(child.attrib)

            else:
                # Recursively process the element, and update it in the same way
                combined_child = combine_xmltree_element(element_1_child_mapping[child.tag], child)

            combined_element.append(combined_child)

    return combined_element
 
Andromeda answered 28/8, 2020 at 14:11 Comment(0)
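For Python 3, a non-mutating merge can be sketched with `copy.deepcopy` from the standard library instead of `Element.copy()`. This is a different approach from the answer above, not a port of it: it deep-copies the first element and merges the second into the copy, so the child order can differ from the answer's. The names `combined_copy` and `_merge_into` are hypothetical; the sample input is the pair of files from the question.

```python
import copy
from xml.etree import ElementTree as et


def _merge_into(target, source):
    """Merge `source` into `target` in place; `source` itself is left untouched."""
    target.attrib.update(source.attrib)
    mapping = {child.tag: child for child in target}
    for child in source:
        match = mapping.get(child.tag)
        if match is None:
            # Copy the child so the result shares no nodes with `source`
            new_child = copy.deepcopy(child)
            mapping[child.tag] = new_child
            target.append(new_child)
        elif len(child) == 0:
            # Leaf element: take the text and attributes from `source`
            match.text = child.text
            match.attrib.update(child.attrib)
        else:
            _merge_into(match, child)


def combined_copy(element_1, element_2):
    """Return a new merged element; neither argument is modified."""
    if element_1.tag != element_2.tag:
        raise ValueError("cannot combine <%s> with <%s>" % (element_1.tag, element_2.tag))
    result = copy.deepcopy(element_1)
    _merge_into(result, element_2)
    return result


a = et.fromstring('<root><element1>textA</element1>'
                  '<elements><nested1>text now</nested1></elements></root>')
b = et.fromstring('<root><element2>textB</element2>'
                  '<elements><nested1>text after</nested1>'
                  '<nested2>new text</nested2></elements></root>')
merged = combined_copy(a, b)
print(et.tostring(merged, encoding='unicode'))
```

Because the result is built from deep copies, later changes to `merged` do not affect `a` or `b`, at the cost of copying the whole first tree up front.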
