Search and remove element with ElementTree in Python
M

9

35

I have an XML document in which I want to search for some elements and, if they match some criteria, delete them.

However, I cannot seem to access the parent of the element so that I can delete it.

file = open('test.xml', "r")
elem = ElementTree.parse(file)

namespace = "{http://somens}"

props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
    type = prop.attrib.get('type', None)
    if type == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            #here I need to access the parent of prop
            # in order to delete the prop

Is there a way I can do this?

Thanks

Maximinamaximize answered 27/7, 2011 at 15:45 Comment(0)
P
43

You can remove child elements with the remove method. To remove an element you have to call its parent's remove method. Unfortunately, Element does not provide a reference to its parent, so it is up to you to keep track of parent/child relations (which speaks against your use of elem.findall()).

A proposed solution could look like this:

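root = elem.getroot()
# Iterate over a copy so that removing children does not skip siblings
for child in list(root):
    # Element has no .name attribute; the namespace-qualified name is in .tag
    if child.tag != "{http://somens}prop":
        continue
    if True:  # TODO: do your check here!
        root.remove(child)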

PS: don't use prop.attrib.get(), use prop.get(), as explained here.

Pinery answered 27/7, 2011 at 16:7 Comment(4)
I see. I am also taking a look at lxml, which from what I have read provides access to the element's parent. Thanks anyway. - Maximinamaximize
Yes, that is correct. lxml provides an ElementTree implementation with more features than the interface normally states. The Element class in lxml provides the getparent() method to get a reference to a parent Element. - Pinery
What if the child element is more than one element deep from the root? What if it's at variable depths? - Pattern
"As explained here" - here links to attrib [#] (Attribute) Element attribute dictionary. Where possible, use get, set, keys, and items to access element attributes., but that's hardly an explanation. What is the reason for using get over attrib? - Tarragona
S
9

I know this is an old thread but this kept popping up while I was trying to figure out a similar task. I did not like the accepted answer for two reasons:

1) It doesn't handle multiple nested levels of tags.

2) It will break if several sibling elements at the same level are deleted one after another. Children are looked up by index in the parent's internal list (Element._children), so you shouldn't delete while iterating forward.

I think a better, more versatile solution is this:

import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)
root = tree.getroot()

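def iterator(parents, nested=False):
    # Walk the children in reverse so removals do not shift unvisited siblings
    for child in reversed(parents):
        if nested and len(child) >= 1:
            # Pass nested along so the recursion reaches every level
            iterator(child, nested=True)
        if True:  # Add your entire condition here
            parents.remove(child)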

iterator(root, nested=True)

For the OP, this should work - but I don't have the data you're working with to test if it's perfect.

import json
import xml.etree.ElementTree as et
file = 'test.xml'
tree = et.parse(file)
root = tree.getroot()

namespace = "{http://somens}"
prop_tag = '{0}prop'.format(namespace)

def iterator(parents, nested=False):
    for child in reversed(parents):
        if nested and len(child) >= 1:
            iterator(child, nested=True)
        # Test the child being visited, and only if it is a prop element
        if child.tag == prop_tag and child.get('type') == 'json':
            value = json.loads(child.get('value'))
            if value['name'] == 'Page1.Button1':
                parents.remove(child)

# Start from the root so removals happen in the tree, not in a findall() list
iterator(root, nested=True)
Saline answered 23/8, 2017 at 1:18 Comment(0)
R
6

You could use xpath to select an Element's parent.

file = open('test.xml', "r")
elem = ElementTree.parse(file)

namespace = "{http://somens}"

props = elem.findall('.//{0}prop'.format(namespace))
for prop in props:
    type = prop.get('type', None)
    if type == 'json':
        value = json.loads(prop.attrib['value'])
        if value['name'] == 'Page1.Button1':
            # Get parent and remove this prop
            parent = prop.find("..")
            parent.remove(prop)

http://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax

Except if you try that it doesn't work: http://elmpowered.skawaii.net/?p=74

So instead you have to:

import json
from xml.etree import ElementTree

elem = ElementTree.parse('test.xml')

namespace = "{http://somens}"
search = './/{0}prop'.format(namespace)
child_search = '{0}prop'.format(namespace)

# Use XPath to get all parents of props
prop_parents = elem.findall(search + '/..')
for parent in prop_parents:
    # Still have to find and iterate through the direct child props;
    # a descendant search could return props that parent.remove() rejects
    for prop in parent.findall(child_search):
        prop_type = prop.get('type')
        if prop_type == 'json':
            value = json.loads(prop.get('value'))
            if value['name'] == 'Page1.Button1':
                parent.remove(prop)

It is two searches and a nested loop. The inner search is only on Elements known to contain props as direct children, but that may not mean much depending on your schema.
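To make the difference between the two attempts concrete, here is a minimal sketch. The sample document and the group element are made up for illustration; only the {http://somens} namespace and the prop tag come from the question. It shows ".." returning nothing when evaluated on the element itself, while "/.." appended to a search from the root does find the parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not the asker's real data.
XML = """
<root xmlns="http://somens">
  <group>
    <prop type="json"/>
    <other/>
  </group>
</root>
"""

root = ET.fromstring(XML)
ns = "{http://somens}"

prop = root.find(".//{0}prop".format(ns))

# ".." evaluated on the element itself: the search only sees the subtree
# below prop, so its parent is out of reach.
parent_from_child = prop.find("..")

# "/.." appended to a search that starts at the root can see the parent.
parents_from_root = root.findall(".//{0}prop/..".format(ns))

print(parent_from_child)
print([p.tag for p in parents_from_root])
```

This is why the search has to start from an ancestor (here the root) rather than from the prop itself.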

Rustie answered 1/6, 2013 at 0:7 Comment(0)
K
3

A solution using the lxml module:

from lxml import etree

# xml_str is assumed to hold the XML document as a string
root = etree.fromstring(xml_str)
for e in root.findall('.//{http://some.name.space}node'):
    # lxml elements know their parent, so each match can be removed directly
    parent = e.getparent()
    if parent is not None:
        parent.remove(e)
Kinna answered 3/4, 2018 at 13:41 Comment(0)
M
2

Using the fact that every child must have a parent, I'm going to simplify @kitsu.eb's example. If you use findall to get both the children and the parents, their indices will be equivalent, provided each parent contains exactly one matching child (findall returns each parent only once).

    file = open('test.xml', "r")
    elem = ElementTree.parse(file)

    namespace = "{http://somens}"
    search = './/{0}prop'.format(namespace)

    # Use xpath to get all parents of props
    prop_parents = elem.findall(search + '/..')

    props = elem.findall(search)
    for prop in props:
        type = prop.attrib.get('type', None)
        if type == 'json':
            value = json.loads(prop.attrib['value'])
            if value['name'] == 'Page1.Button1':
                # use the index of the current child to find
                # its parent and remove the child
                prop_parents[props.index(prop)].remove(prop)
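One thing worth checking before relying on matching indices: as far as I can tell, the ".." step yields each parent only once, so the two lists line up only when no parent holds more than one matching child. A small sketch with a made-up document (the a and b elements are hypothetical) shows the lists diverging, with an explicit child-to-parent dictionary as a safer alternative:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one parent holds two props, another holds one.
root = ET.fromstring(
    "<root>"
    "<a><prop id='1'/><prop id='2'/></a>"
    "<b><prop id='3'/></b>"
    "</root>"
)

props = root.findall(".//prop")
prop_parents = root.findall(".//prop/..")

# The lists differ in length, so props[i] and prop_parents[i] need not match.
print(len(props), len(prop_parents))

# Pair each element with its real parent instead of relying on positions.
parent_of = {child: parent for parent in root.iter() for child in parent}
print(parent_of[props[1]].tag)
```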
Macswan answered 13/8, 2016 at 14:50 Comment(0)
U
2

I also used XPath for this issue, but in a different way:

root = elem.getroot()    
elementName = "YourElement"
#this will find all the parents of the elements with elementName
for elementParent in root.findall(".//{}/..".format(elementName)):
   #this will find all the elements under the parent, and remove them
   for element in elementParent.findall("{}".format(elementName)):
      elementParent.remove(element)
Umbrage answered 23/8, 2021 at 13:16 Comment(1)
Or use ".//*[{}]".format(elementName): then you're looking for the parent node, and removing the child is easy. - Anglesey
M
1

I like to use an XPath expression for this kind of filtering. Unless I know otherwise, such an expression must be applied at the root level, which means I can't just get a parent and apply the same expression on that parent. However, it seems to me that there is a nice and flexible solution that should work with any supported XPath, as long as none of the sought nodes is the root. It goes something like this:

root = elem.getroot()
# Find all nodes matching the filter string (flt)
nodes = root.findall(flt)
while len(nodes):
    # As long as there are nodes, there should be parents
    # Get the first of all parents to the found nodes
    parent = root.findall(flt+'/..')[0]
    # Use this parent to remove the first node
    parent.remove(nodes[0])
    # Find all remaining nodes
    nodes = root.findall(flt)
Martinamartindale answered 6/2, 2018 at 7:43 Comment(0)
B
1

I would only like to add a comment on the accepted answer, but my lack of reputation doesn't allow me to. I wanted to add that it is important to add .findall("*") to the iterator to avoid issues, as stated in the documentation:

Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.

Therefore, in the accepted answer the iteration should be for child in root.findall("*"): instead of for child in root:. Not doing so made my code skip some elements from the list.
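Here is a small sketch of the skipping behaviour, using a made-up flat document with four children. It removes every child twice: once while iterating the live element, and once while iterating the list returned by findall("*").

```python
import xml.etree.ElementTree as ET

def make_root():
    # Hypothetical flat document with four children.
    return ET.fromstring("<root><a/><b/><c/><d/></root>")

# Removing while iterating the live element: each removal shifts the
# remaining children left, so the loop steps over every other one.
live = make_root()
for child in live:
    live.remove(child)
left_after_live = [c.tag for c in live]

# findall("*") returns a separate list, so every child is visited.
snap = make_root()
for child in snap.findall("*"):
    snap.remove(child)
left_after_snapshot = [c.tag for c in snap]

print(left_after_live)
print(left_after_snapshot)
```

Iterating over list(root) should behave the same as findall("*") here, since both give a snapshot that removals do not touch.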

Breastpin answered 1/6, 2022 at 14:32 Comment(0)
B
0

If you stumble on this question because you want to search and remove elements with ElementTree

  • using built in xml module (not lxml)
  • being as flexible as ElementTree.findall (using xpath subset)
  • directly referencing the elements to be deleted, not the parents
  • work on any nesting level
  • work even if found elements are nested in other found elements

Then this function may help. It builds and uses a map from each element to its parent.


import itertools
from xml.etree import ElementTree

def deleteall(root: ElementTree.Element, match, namespaces=None):
    parent_by_child=dict(itertools.chain.from_iterable(
        ((child, element) for child in element) for element in root.iter()))

    for element in root.findall(match, namespaces):
        parent_by_child[element].remove(element)

Additional checks as required in the original post can be done by a Callable provided as additional argument:


import itertools
from typing import Callable
from xml.etree import ElementTree

def deleteall(
    root: ElementTree.Element,
    match,
    namespaces=None,
    deletion_criteria: Callable[[ElementTree.Element], bool]=lambda x: True
):
    parent_by_child=dict(itertools.chain.from_iterable(
        ((child, element) for child in element) for element in root.iter()))
    for element in root.findall(match, namespaces):
        if deletion_criteria(element):
            parent_by_child[element].remove(element)

Further extensions like providing both the element and its parent to the deletion criteria would be possible.
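A rough sketch of that extension, applied to the case from the original question. The names delete_matching and is_button1 and the sample document are my own assumptions for illustration; only the prop tag, the type and value attributes, the namespace, and the 'Page1.Button1' name come from the question.

```python
import json
from xml.etree import ElementTree

def delete_matching(root, match, criteria, namespaces=None):
    """Remove every element found by `match` for which
    criteria(element, parent) is true; return how many were removed."""
    # Map each child to its parent once, before anything is removed.
    parent_by_child = {child: parent
                       for parent in root.iter() for child in parent}
    removed = 0
    for element in root.findall(match, namespaces):
        parent = parent_by_child.get(element)  # None if element is the root
        if parent is not None and criteria(element, parent):
            parent.remove(element)
            removed += 1
    return removed

def is_button1(element, parent):
    # Hypothetical criterion mirroring the check in the question.
    if element.get('type') != 'json':
        return False
    value = json.loads(element.get('value', '{}'))
    return value.get('name') == 'Page1.Button1'

# Made-up sample document for demonstration only.
XML = """<root xmlns="http://somens">
  <page>
    <prop type="json" value='{"name": "Page1.Button1"}'/>
    <prop type="json" value='{"name": "Page1.Button2"}'/>
    <prop type="text" value="plain"/>
  </page>
</root>"""

root = ElementTree.fromstring(XML)
removed = delete_matching(root, './/ns:prop', is_button1,
                          {'ns': 'http://somens'})
remaining = [p.get('value') for p in root.iter('{http://somens}prop')]
print(removed, remaining)
```

Passing the parent to the criterion costs nothing extra because the map is built anyway, and it lets the check depend on context (for example, only deleting props under a particular page).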

Blazon answered 15/11, 2023 at 20:40 Comment(0)
