I'd recommend using an XML parser that fully supports XPath expressions. The subset supported by xml.etree
is insufficient for tasks like this.
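To make the limitation concrete: as far as I know, the XPath subset in xml.etree has no count() function, so a "no children" test has to be done in Python rather than in the expression itself. A minimal sketch, assuming a sample document shaped like the one in the question (the wrapper element name a4 is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the <a4> wrapper name is an assumption.
SAMPLE = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""

root = ET.fromstring(SAMPLE)

# len(element) is the number of direct children, so this keeps
# only the elements that have no children of their own.
leaves = [el for el in root.iter() if len(el) == 0]

leaf_tags = [el.tag for el in leaves]
leaf_texts = [el.text for el in leaves]
print(leaf_tags)
print(leaf_texts)
```

That works, but the filtering logic lives outside the query, which gets awkward as the conditions grow.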
For example, in lxml I can do:
"give me all children of the children of the <item> node":
doc.xpath('//item/*/child::*')  # equivalent to '//item/*/*', if you're being terse
Out[18]: [<Element a11 at 0x7f60ec1c1348>, <Element a22 at 0x7f60ec1c1888>]
or,
"give me all of <item>'s children that have no children themselves":
doc.xpath('/item/*[count(child::*) = 0]')
Out[20]:
[<Element a1 at 0x7f60ec1c1588>,
<Element a2 at 0x7f60ec1c15c8>,
<Element a3 at 0x7f60ec1c1608>]
or,
"give me ALL of the elements that don't have any children":
doc.xpath('//*[count(child::*) = 0]')
Out[29]:
[<Element a1 at 0x7f60ec1c1588>,
<Element a2 at 0x7f60ec1c15c8>,
<Element a3 at 0x7f60ec1c1608>,
<Element a11 at 0x7f60ec1c1348>,
<Element a22 at 0x7f60ec1c1888>]
# and if I only care about the text from those nodes...
doc.xpath('//*[count(child::*) = 0]/text()')
Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
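For anyone who wants to reproduce the session, here is a self-contained sketch. The sample XML is my guess at a document that would give the results shown; the <a4> wrapper name in particular is an assumption, and it needs lxml installed (pip install lxml).

```python
from lxml import etree

# Hypothetical sample document; the <a4> wrapper name is an assumption.
SAMPLE = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""

doc = etree.fromstring(SAMPLE)

# All children of the children of <item>.
grandchildren = doc.xpath('//item/*/*')

# Children of <item> that have no children themselves.
childless_children = doc.xpath('/item/*[count(child::*) = 0]')

# Every element in the document that has no children.
all_leaves = doc.xpath('//*[count(child::*) = 0]')

# Only the text of those childless elements.
leaf_text = doc.xpath('//*[count(child::*) = 0]/text()')

print([el.tag for el in grandchildren])
print([el.tag for el in childless_children])
print([el.tag for el in all_leaves])
print(leaf_text)
```

Printing the tag names instead of the element objects keeps the output readable, since the memory addresses in the reprs change from run to run.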