ElementTree findall 'or' operator

If I have an xml file like this:

<root>
  <item>
    <prop>something</prop>
  </item>
  <test>
    <prop>something</prop>
  </test>
  <test2>
    <prop>something</prop>
  </test2>
</root>

I can use xmlTree.getroot().findall("item") to get all of the 'item' elements.

How would I get all of the 'item' OR 'test' elements? I want something like:

xmlTree.getroot().findall("item or test")

I didn't see anything like this in the examples in the documentation. Any ideas?

Brina answered 21/3, 2014 at 13:57 Comment(0)

The standard library's ElementTree supports only a limited subset of XPath, and that subset does not include the | union operator. If you use lxml instead, | works:

from lxml import etree as ET


data = """<?xml version="1.0"?>
<data>
<item>1</item>
<test>2</test>
</data>"""

tree = ET.fromstring(data)

for element in tree.xpath('//item|//test'):
    print(element.text)

prints:

1
2

With xml.etree.ElementTree, you can combine the results of two separate findall() calls:

for element in tree.findall('.//item') + tree.findall('.//test'):
    print(element.text)

Or check the tag name inside the loop:

for element in tree.iter():
    if element.tag in ('item', 'test'):
        print(element.text)
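
The tag check above can be wrapped in a small helper that uses only the standard library. This is a minimal sketch, not an ElementTree API: the name find_any and its signature are hypothetical, and the sample data adds a second item element purely for illustration. Unlike concatenating two findall() results, which groups the matches by tag, it returns them in document order.

```python
import xml.etree.ElementTree as ET


def find_any(root, tags):
    """Return root and its descendants whose tag is in tags, in document order.

    find_any is a hypothetical helper, not part of ElementTree.
    """
    wanted = set(tags)
    # iter() walks root and all of its descendants depth-first.
    return [el for el in root.iter() if el.tag in wanted]


data = """<data>
<item>1</item>
<test>2</test>
<item>3</item>
<other>4</other>
</data>"""

root = ET.fromstring(data)
matches = find_any(root, ("item", "test"))
print([el.text for el in matches])
```

Because the wanted tags are passed as an iterable, adding a third or fourth tag does not require another findall() call.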
Compunction answered 21/3, 2014 at 14:1 Comment(3)
I like the first version with the xpath. Your code works as indicated. However, I tried to change it to read the XML from a file:
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
for element in tree.xpath('//item|//test'):
    print element.text
but I get AttributeError: ElementTree instance has no attribute 'xpath'. I also tried to just use the xpath syntax like this:
root = tree.getroot()
for element in root.findall('item|test'):
    print element.text
But nothing is output? Brina
Ah, I see the difference between lxml and xml.etree.ElementTree now. It works like this. Are there other differences between these modules? That is, if I have other code that I've written with etree.ElementTree, will it continue to work if I change to lxml? Brina
@DavidDoria It should, but you may have to change the code a bit; see differences here. Compunction

A wildcard solution for large data sets

Here is a solution where you do not need to spell out "A | B | ...". Instead, use "*" as a wildcard to select every child element, then drop the unwanted ones by index, as in the code below. In this question, for example, the last tag, "test2", can be excluded with lst[:-1]. This only works when the unwanted elements sit at known positions.

import xml.etree.ElementTree as ET
data = '''
<root>
  <item>
    <prop>something1</prop>
  </item>
  <test>
    <prop>something2</prop>
  </test>
  <test2>
    <prop>something3</prop>
  </test2>
</root>'''
root = ET.fromstring(data)
lst = root.findall('*')  # all direct children of root
for x in lst[:-1]:  # skip the last child (test2)
    print(x.find('prop').text)

OUTPUT:

something1
something2
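
Slicing by index depends on where the unwanted elements happen to sit. As a hedged alternative, the same wildcard query can be filtered by tag name instead. The sketch below assumes the XML from the question; the names wanted and selected are illustrative only.

```python
import xml.etree.ElementTree as ET

data = """
<root>
  <item><prop>something1</prop></item>
  <test><prop>something2</prop></test>
  <test2><prop>something3</prop></test2>
</root>"""

root = ET.fromstring(data)
wanted = {"item", "test"}  # tag names to keep (illustrative)

# findall('*') returns only the direct children of root, in document order.
selected = [child for child in root.findall("*") if child.tag in wanted]

for child in selected:
    print(child.tag, child.find("prop").text)
```

This keeps the single wildcard query but no longer breaks if the elements are reordered.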

Landau answered 24/3, 2018 at 23:15 Comment(0)
