I am using the built-in Python ElementTree module. It is straightforward to access children, but what about parent or sibling nodes? Can this be done efficiently without traversing the entire tree?
There's no direct support in the form of a parent attribute, but you can perhaps use the patterns described here to achieve the desired effect. The following one-liner is suggested (updated from the linked-to post to Python 3.8) to create a child-to-parent mapping for a whole tree, using the method xml.etree.ElementTree.Element.iter:
parent_map = {c: p for p in tree.iter() for c in p}
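Since the question also asks about siblings, here is a minimal sketch of how the same mapping can answer both parts; the sample document and the names prev_sibling/next_sibling are made up for illustration. Building the map costs one pass over the tree, after which each parent lookup is a dict access, and siblings come from iterating the parent's children.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for this sketch.
root = ET.fromstring("<a><b/><c/><d/></a>")
tree = ET.ElementTree(root)

# One pass over the tree: map every child element to its parent.
parent_map = {c: p for p in tree.iter() for c in p}

c = root.find("c")
parent = parent_map[c]  # the <a> element; the root itself has no entry

# Siblings are the parent's other children, in document order.
siblings = [s for s in parent if s is not c]

# Immediate neighbours via the child's position in its parent.
i = list(parent).index(c)
prev_sibling = parent[i - 1] if i > 0 else None
next_sibling = parent[i + 1] if i + 1 < len(parent) else None

print(parent.tag, [s.tag for s in siblings], prev_sibling.tag, next_sibling.tag)
```

Note that the map is a snapshot: if you add or move elements afterwards, rebuild it.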
If you have the root element rather than the ElementTree, the same works with root.iter():
parent_map = {c: p for p in root.iter() for c in p}
Vinay's answer should still work, but for Python 2.7+ and 3.2+ the following is recommended:
parent_map = {c:p for p in tree.iter() for c in p}
getiterator() is deprecated in favor of iter() (it was removed entirely in Python 3.9), and it's nice to use the new dict comprehension syntax.
Secondly, while constructing an XML document, it is possible that a child will have multiple parents, although this gets removed once you serialize the document. If that matters, you might try this:
parent_map = {}
for p in tree.iter():
    for c in p:
        if c in parent_map:
            parent_map[c].append(p)
            # Or raise, if you don't want to allow this.
        else:
            parent_map[c] = [p]
            # Or parent_map[c] = p if you don't want to allow this
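A small sketch of the multiple-parents situation described above, using made-up element names. ElementTree's append() does not record or check parents, so the same Element object can be appended under two parents, and the list-valued map picks up both. Serializing and re-parsing produces two independent copies, which is why the sharing disappears at that point.

```python
import xml.etree.ElementTree as ET

# Made-up example: one Element object appended under two parents.
root = ET.Element("root")
p1 = ET.SubElement(root, "p1")
p2 = ET.SubElement(root, "p2")
shared = ET.Element("shared")
p1.append(shared)
p2.append(shared)  # allowed: append() does not track or check parents

# List-valued child -> parents map, built as in the answer above.
parent_map = {}
for p in root.iter():
    for c in p:
        parent_map.setdefault(c, []).append(p)

print([p.tag for p in parent_map[shared]])  # ['p1', 'p2']

# After a serialize/re-parse round trip the two occurrences are separate objects.
reparsed = ET.fromstring(ET.tostring(root))
copy1 = reparsed.find("p1/shared")
copy2 = reparsed.find("p2/shared")
print(copy1 is copy2)  # False
```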
You can use XPath '..' notation in ElementTree.
<parent>
    <child id="123">data1</child>
</parent>
xml.findall('.//child[@id="123"]..')
>> [<Element 'parent'>]
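A runnable version of this answer's example, assuming the document is parsed with ET.fromstring (the answer leaves that step out). It writes the step as '/..' with an explicit separator, the form used in the official XPath examples.

```python
import xml.etree.ElementTree as ET

# The document from this answer, parsed into an Element named `xml`.
xml = ET.fromstring('<parent><child id="123">data1</child></parent>')

# '..' selects the parent of each matched <child>; ElementTree resolves it
# with a parent map built from the element findall() was called on.
parents = xml.findall('.//child[@id="123"]/..')
print(parents)  # a one-element list holding the <parent> element
```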
root.find(".//*[@testname='generated_sql']..")
What does the '..' XPath syntax do? Are there docs on it?
The '..' expression comes from XPath 1.0. The Python standard library has limited support for XPath expressions; lxml has more.
… id attribute?
xml.findall('.//child/..')
Some other attribute: xml.findall('.//child[@other="123"]..')
'.' selects the current node and '..' gets the parent.
As mentioned in Get parent element after using find method (xml.etree.ElementTree), you would have to do an indirect search for the parent. Given this XML:
<a>
    <b>
        <c>data</c>
        <d>data</d>
    </b>
</a>
Assuming you have parsed it into an Element stored in the xml variable, you can use:
In [1]: parent = xml.find('.//c/..')
In [2]: child = parent.find('./c')
Resulting in:
Out[1]: <Element 'b' at 0x00XXXXXX>
Out[2]: <Element 'c' at 0x00XXXXXX>
A higher ancestor is found the same way: secondparent = xml.find('.//c/../..'), which gives <Element 'a' at 0x00XXXXXX>.
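The same session as a standalone script, with the document inlined as a string (whitespace removed). It also shows what happens one step too far: '..' cannot climb above the element find() was called on, so that lookup returns None.

```python
import xml.etree.ElementTree as ET

xml = ET.fromstring("<a><b><c>data</c><d>data</d></b></a>")

parent = xml.find(".//c/..")            # the <b> element
child = parent.find("./c")              # the <c> element
secondparent = xml.find(".//c/../..")   # the <a> element, i.e. xml itself

# <a> is the start element, so there is no parent of it to select.
beyond = xml.find(".//c/../../..")      # None
```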
Pasting my answer here from https://mcmap.net/q/245875/-get-parent-element-after-using-find-method-xml-etree-elementtree:
I had a similar problem and I got a bit creative. Turns out nothing prevents us from adding the parent info ourselves. We can later strip it once we no longer need it.
def addParentInfo(et):
    # Recursively store a reference to each child's parent in its attrib dict.
    for child in et:
        child.attrib['__my_parent__'] = et
        addParentInfo(child)

def stripParentInfo(et):
    # Remove the stored references again (e.g. before writing the tree out).
    for child in et:
        child.attrib.pop('__my_parent__', None)
        stripParentInfo(child)

def getParent(et):
    # Returns None for the root, which never gets a '__my_parent__' entry.
    return et.attrib.get('__my_parent__')

# Example usage
tree = ...
addParentInfo(tree.getroot())
el = tree.findall(...)[0]
parent = getParent(el)
while parent is not None:  # compare with None: an Element's truth value depends on its children
    doSomethingWith(parent)
    parent = getParent(parent)
stripParentInfo(tree.getroot())
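A variant sketch of the same idea that keeps the parent links in a separate dict instead of attrib, assuming you would rather not put Element objects into attribute values (they are normally strings) and then have to remember to strip them. iter_ancestors is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def iter_ancestors(elem, parent_map):
    """Yield elem's parent, grandparent, ... up to the root (hypothetical helper)."""
    parent = parent_map.get(elem)
    while parent is not None:
        yield parent
        parent = parent_map.get(parent)

root = ET.fromstring("<a><b><c><d/></c></b></a>")
# The links live outside the tree, so there is nothing to strip afterwards.
parent_map = {c: p for p in root.iter() for c in p}

el = root.find(".//d")
print([p.tag for p in iter_ancestors(el, parent_map)])  # ['c', 'b', 'a']
```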
The XPath '..' selector cannot be used to retrieve the parent node on 3.5.3 or 3.6.1 (at least on OSX), e.g. in interactive mode:
import xml.etree.ElementTree as ET
root = ET.fromstring('<parent><child></child></parent>')
child = root.find('child')
parent = child.find('..') # retrieve the parent
parent is None # unexpected answer True
The last answer breaks all hopes...
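This is documented behaviour rather than a platform-specific bug: ElementTree resolves '..' only within the element that find() is called on, so it never returns an ancestor of that element. A minimal sketch of the consequence and the usual workaround, which is to call find() on an ancestor (here the root) instead:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<parent><child></child></parent>")
child = root.find("child")

# Starting from `child`, '..' would have to leave the subtree: result is None.
print(child.find(".."))  # None

# Starting from an ancestor, the same step stays inside the subtree and works.
print(root.find("child/..") is root)  # True
```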
Got an answer from https://towardsdatascience.com/processing-xml-in-python-elementtree-c8992941efd2
Tip: use '..' inside of XPath to return the parent element of the current element.
for object_book in root.findall('.//*[@name="The Hunger Games"]..'):
    print(object_book)
Most solutions posted so far
- either use XPath… but Python does not support finding ancestors with XPath in general (see comment),
- or post-process the whole tree after it is built (e.g. this answer or that one)… but this requires parsing and building the whole tree, which might be undesirable with large XML data (e.g. Wikipedia dumps).
If you are parsing XML incrementally, say with xml.etree.ElementTree.iterparse or xml.etree.ElementTree.XMLPullParser, you can keep track of the current path (from the root node down to the current node) by tracking the opening and closing of tags (start and end events). Example:
import xml.etree.ElementTree as ET
current_path = []
for event, elem in ET.iterparse('test.xml', events=['start', 'end']):
    # opening tag:
    if event == 'start':
        current_path.append(elem)
    # closing tag:
    else:
        assert event == 'end'
        assert len(current_path) > 0 and current_path[-1] is elem
        current_path.pop()
        parent = current_path[-1] if len(current_path) > 0 else None
        # `elem` is the current element (fully built),
        # `parent` is its parent (some of its children after `elem`
        # might not have been parsed yet)
        #
        # ... do something ...
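To try the tracking without a test.xml file, here is the same loop run over an in-memory document; io.BytesIO stands in for the file, and the pairs list is a made-up addition that only records what parent holds at each closing tag.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for 'test.xml' (made-up content).
source = io.BytesIO(b"<a><b><c/></b><d/></a>")

current_path = []
pairs = []  # (element tag, parent tag or None), in closing-tag order
for event, elem in ET.iterparse(source, events=("start", "end")):
    if event == "start":
        current_path.append(elem)
    else:
        current_path.pop()
        parent = current_path[-1] if current_path else None
        pairs.append((elem.tag, parent.tag if parent is not None else None))

print(pairs)  # [('c', 'b'), ('b', 'a'), ('d', 'a'), ('a', None)]
```

Only the open path is kept on the stack, so the memory used for parent tracking grows with nesting depth, not with document size.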
If you are using lxml, I was able to get the parent element with the following:
parent_node = next(child_node.iterancestors())
This will raise a StopIteration exception if the element doesn't have ancestors, so be prepared to catch that if you may run into that scenario.
import xml.etree.ElementTree as ET
f1 = "yourFile"
xmlTree = ET.parse(f1)
# Iterating over the root element yields its direct children.
for child in xmlTree.getroot():
    print(child.tag)
Another way, if you just want a single subelement's parent and already know the subelement's XPath:
parentElement = root.find(xpath + "/..")  # call find() on the element the xpath is relative to (e.g. the root), not on subElement itself
… subElement.find('..').
… xpath already exists, so it's not helpful for most people.
Look at section 19.7.2.2, Supported XPath syntax ...
Find node's parent using the path:
parent_node = node.find('..')
"… None if the path attempts to reach the ancestors of the start element (the element find was called on)." (docs.python.org/3/library/…).
© 2022 - 2024 — McMap. All rights reserved.