Removing child elements in XML using python
Python 3.2.5 x64 ElementTree

I have data that I need to format using Python. Essentially, I have a file with elements and subelements, and I need to delete the child elements of some of these elements. I have checked previous questions but couldn't arrive at a working solution. The best attempt I had so far removes only every second child element.
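The "every second child" symptom is typical of removing children while iterating over the parent element itself, although the question does not show that attempt, so this is an assumption. The sketch below uses a made-up four-child `Group` element (the `make_group` helper and the `n` attribute are hypothetical) to contrast that pattern with iterating over a copy of the child list.

```python
import xml.etree.ElementTree as ET

def make_group():
    # Hypothetical stand-in for one Group element with four Dog children.
    return ET.fromstring(
        '<Group><Dog n="1"/><Dog n="2"/><Dog n="3"/><Dog n="4"/></Group>'
    )

# Problematic pattern: the child list shrinks while it is being iterated,
# so the loop steps over some of the children.
buggy = make_group()
for dog in buggy:
    buggy.remove(dog)
left_after_buggy = [d.get("n") for d in buggy]

# Safer pattern: iterate over a snapshot of the children.
fixed = make_group()
for dog in list(fixed):
    fixed.remove(dog)
left_after_fixed = [d.get("n") for d in fixed]

print("left after in-place loop:", left_after_buggy)
print("left after snapshot loop:", left_after_fixed)
```

The same caveat applies to plain Python lists: mutate a snapshot, not the sequence being iterated.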

Sample data:

<Leg1:MOR oCount="7" xmlns:Leg1="http://what.not">
    <Leg1:Order>
        <Leg1:CTemp id="FO">
            <Leg1:Group bNum="001" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
            <Leg1:Group bNum="002" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
        </Leg1:CTemp>
        <Leg1:CTemp id="GO">
            <Leg1:Group bNum="001" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
            <Leg1:Group bNum="002" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
        </Leg1:CTemp>
    </Leg1:Order>
</Leg1:MOR>

What I need the output to look like:

<Leg1:MOR oCount="7" xmlns:Leg1="http://what.not">
    <Leg1:Order>
        <Leg1:CTemp id="FO">
            <Leg1:Group bNum="001" cCount="10"/>
            <Leg1:Group bNum="002" cCount="10"/>
        </Leg1:CTemp>
        <Leg1:CTemp id="GO">
            <Leg1:Group bNum="001" cCount="10"/>
            <Leg1:Group bNum="002" cCount="10"/>
        </Leg1:CTemp>
    </Leg1:Order>
</Leg1:MOR>

I haven't written anything in a while and my code is useless. I can parse the file and write it back out, but I cannot get the processing right.

import xml.etree.cElementTree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
for x in root.findall('./Order/CTemp/Group'):
    root.remove(x)
tree.write("output.xml")

How do I get it to remove the Dog elements nested under the CTemp elements?

Actuary answered 13/5, 2015 at 9:18 Comment(1)
Try to use namespaces. - Hennery
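To illustrate the comment above with the standard-library ElementTree: the path './Order/CTemp/Group' matches nothing because the elements live in the http://what.not namespace, and `root.remove(x)` can only remove direct children of `root`. A minimal sketch, assuming a shortened copy of the sample data; the names `NS`, `SAMPLE`, and `strip_dogs` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map handed to findall(); the prefix here is only a lookup key.
NS = {"Leg1": "http://what.not"}

# Shortened version of the sample data from the question.
SAMPLE = """<Leg1:MOR oCount="7" xmlns:Leg1="http://what.not">
    <Leg1:Order>
        <Leg1:CTemp id="FO">
            <Leg1:Group bNum="001" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
            </Leg1:Group>
        </Leg1:CTemp>
    </Leg1:Order>
</Leg1:MOR>"""

def strip_dogs(root):
    """Remove every Dog child from every Group under Order/CTemp."""
    for group in root.findall("./Leg1:Order/Leg1:CTemp/Leg1:Group", NS):
        # findall() returns a list, so removing inside the loop is safe.
        for dog in group.findall("Leg1:Dog", NS):
            group.remove(dog)  # remove from the direct parent, not from root
        group.text = None      # drop leftover whitespace so the tag can self-close
    return root

# Keep the Leg1 prefix on output instead of an auto-generated ns0.
ET.register_namespace("Leg1", "http://what.not")
root = strip_dogs(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

For files, the same function could be applied to `ET.parse("input.xml").getroot()` before calling `tree.write("output.xml")`.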

If you can use lxml, try this:

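import lxml.etree

tree = lxml.etree.parse("leg.xml")
# Select every Dog element in the Leg1 namespace, wherever it occurs.
for dog in tree.xpath("//Leg1:Dog",
                      namespaces={"Leg1": "http://what.not"}):
    parent = dog.getparent()
    parent.remove(dog)
    # Clear leftover whitespace so the empty Group is written as <Leg1:Group .../>.
    parent.text = None
tree.write("leg.out.xml")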

Now leg.out.xml looks like this:

<?xml version="1.0"?>
<Leg1:MOR xmlns:Leg1="http://what.not" oCount="7">
  <Leg1:Order>
    <Leg1:CTemp id="FO">
      <Leg1:Group bNum="001" cCount="4"/>
      <Leg1:Group bNum="002" cCount="4"/>
    </Leg1:CTemp>
    <Leg1:CTemp id="GO">
      <Leg1:Group bNum="001" cCount="4"/>
      <Leg1:Group bNum="002" cCount="4"/>
    </Leg1:CTemp>
  </Leg1:Order>
</Leg1:MOR>
Hennery answered 13/5, 2015 at 9:45 Comment(8)
Great, thank you! One step closer. Now can you think of any way to collapse the Group element from <Leg1:Group bNum="001" cCount="4"></Leg1:Group> to <Leg1:Group bNum="001" cCount="4"/>? - Actuary
@Actuary I've improved my answer. - Hennery
Awesome! Thank you so much. I hate to admit it, but I was stuck on this for a full day yesterday. - Actuary
A small side note: your parsing of the XML file produces this error: lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1. This is a problem with larger files. I changed tree = lxml.etree.parse(open("leg.xml")) to tree = lxml.etree.parse("leg.xml"). - Actuary
So if I want to remove the Leg1: prefix from all the elements, how would I go about doing that? - Actuary
Please ask a new question. - Hennery
#30232531 @Tichodroma - Actuary
I will take a look later. - Hennery
