Can .findall() match multiple values in python etree?

Is there a way to match multiple elements in a tree using .findall()?

I would like to do this:

trees = log.findall('element1' or 'element2')

This is my workaround (it works in my case because I never have both element1 and element2 in the same XML):

trees = log.findall('element1')
if not trees:
    trees = log.findall('element2')

I am parsing XML files that have similar structures but different tag names. C# allows "element1 | element2" matching.
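A quick check in plain Python shows why the first attempt only ever searches one tag. The or is evaluated before findall() is called, and a non-empty string is truthy, so findall() receives just 'element1'. The XML string below is a made-up sample for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document that contains only element2 children.
log = ET.fromstring("<log><element2/><element2/></log>")

# Python evaluates the `or` first: 'element1' is a non-empty (truthy) string,
# so the whole expression is just 'element1'.
pattern = 'element1' or 'element2'
print(pattern)  # element1

# findall() therefore never looks for element2 and finds nothing here.
trees = log.findall('element1' or 'element2')
print(len(trees))  # 0
```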

Modify asked 21/7, 2014 at 17:38 Comment(3)
Given your workaround, what if a document contains both element1 and element2? – Gel
Good point. I edited the question to make it clearer. I'm writing boilerplate code that works with either element1 or element2, never both. – Modify
@user3769076: Can you require lxml and use lxml.etree in place of the stdlib xml.etree? It often works as a drop-in replacement, and it offers a better answer here. – Gar

No, you can't. C# appears to be using full XPath expressions, but ElementTree's XPath support is too limited and does not include the | union operator.

You can use Python's or operator to fall back to the second search when the first one finds nothing:

trees = log.findall('element1') or log.findall('element2')

This works because findall() returns a list, and an empty list is falsy.
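A small runnable sketch of that fallback, using made-up sample documents. The helper name find_either is hypothetical. The last case shows the limitation raised in the question's comments: if a document contains both tags, only the element1 matches are returned.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: same structure, different tag names.
doc_a = ET.fromstring("<log><element1 id='a1'/><element1 id='a2'/></log>")
doc_b = ET.fromstring("<log><element2 id='b1'/></log>")
doc_both = ET.fromstring("<log><element2 id='c1'/><element1 id='c2'/></log>")

def find_either(log):
    # findall() returns a list; an empty list is falsy, so the second
    # search only runs when the first one finds nothing.
    return log.findall('element1') or log.findall('element2')

print([e.get('id') for e in find_either(doc_a)])     # ['a1', 'a2']
print([e.get('id') for e in find_either(doc_b)])     # ['b1']
# Both tags present: element2 is silently ignored.
print([e.get('id') for e in find_either(doc_both)])  # ['c2']
```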

The alternative is to use lxml, an implementation of the ElementTree API on top of libxml2 that supports the full XPath 1.0 specification. Then you can do:

log.xpath('(.//element1|.//element2)')
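If adding lxml is not an option, the union can be approximated with the standard library by walking the subtree once and keeping every element whose tag is in a set. This is a sketch, not an ElementTree feature: the helper findall_tags is hypothetical, and it compares plain tag names only (namespaced tags would have to be given in {uri}local form). Like .//element1|.//element2, it returns each match once, in document order, and skips the context element itself.

```python
import xml.etree.ElementTree as ET

def findall_tags(root, *tags):
    """Hypothetical helper: a stdlib stand-in for
    root.xpath('.//element1|.//element2') in lxml.

    Returns descendants of `root` whose tag is in `tags`, in document
    order, excluding `root` itself (matching the .// prefix).
    """
    wanted = set(tags)
    # Element.iter() walks the tree depth-first in document order,
    # starting with `root` itself, so it is filtered out explicitly.
    return [el for el in root.iter() if el is not root and el.tag in wanted]

log = ET.fromstring(
    "<log>"
    "<element2 id='1'/>"
    "<group><element1 id='2'/></group>"
    "<other id='3'/>"
    "<element1 id='4'/>"
    "</log>"
)

print([el.get('id') for el in findall_tags(log, 'element1', 'element2')])
# ['1', '2', '4']
```

Walking the tree once keeps matches of both tags interleaved in document order, which concatenating two separate findall() results would not.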
Isomerize answered 21/7, 2014 at 17:43 Comment(7)
Does the lxml implementation of the etree API support | in XPath? If so, that may be an acceptable alternative for the OP. – Gar
@abarnert: lxml supports all of XPath 1.0. – Isomerize
Yeah, I'm just not sure whether its etree findall() supports full XPath (and for some reason lxml isn't building on the machine I'm sitting at, so I can't test…). – Gar
@abarnert: no, .xpath() is the method to use here; .findall() was kept bug-compatible with the original API implementation. – Isomerize
Thank you. I couldn't find anywhere, on the web page or in any other explanation, that you could use (a|b) with XPath. The documentation is horrible, although lxml is one of the best tools. – Surrealism
FWIW, I had to use the XPath syntax .xpath("//element1|//element2"). I couldn't get lxml to accept the //(element1|element2) pattern. – Variform
@TomJohnson: ah, yes, that's my mistake. In XPath 1.0 a step after // must be a single node test, so //(element1|element2) is not valid; the | union has to combine complete path expressions such as //element1|//element2. I'll correct my answer. – Isomerize
