How do I match contents of an element in XPath (lxml)?

Asked 14/4, 2010 at 13:35 Answered 31/1 at 19:14

I want to parse HTML with lxml using XPath expressions. My problem is matching for the contents of a tag:

For example given the

<a href="http://something">Example</a>

element I can match the href attribute using

.//a[@href='http://something']

but the given the expression

.//a[.='Example']

or even

.//a[contains(.,'Example')]

lxml throws the 'invalid node predicate' exception.

What am I doing wrong?

EDIT:

Example code:

from lxml import etree
from cStringIO import StringIO

html = '<a href="http://something">Example</a>'
parser = etree.HTMLParser()
tree   = etree.parse(StringIO(html), parser)

print tree.find(".//a[text()='Example']").tag

Expected output is 'a'. I get 'SyntaxError: invalid node predicate'

Stutsman answered 14/4, 2010 at 13:35 Comment(1)

Instead of using StringIO, you could have used etree.fromstring() to parse your html. – Scotland 4/8, 2011 at 7:9

I would try with:

.//a[text()='Example']

using xpath() method:

tree.xpath(".//a[text()='Example']")[0].tag

If case you would like to use iterfind(), findall(), find(), findtext(), keep in mind that advanced features like value comparison and functions are not available in ElementPath.

lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath). As an lxml specific extension, these classes also provide an xpath() method that supports expressions in the complete XPath syntax, as well as custom extension functions.

Distilled answered 14/4, 2010 at 13:54 Comment(5)

I don't want to find the link based on href, but based on the text it contains: "Example" in the above example :) .//a[@href='something'] works the way it is... – Stutsman 14/4, 2010 at 13:59

you need to remove an = .//a[text()='Example'] – Katz 14/4, 2010 at 14:20

Thanks for your suggestion, but this one raises "SyntaxError: invalid node predicate" too – Stutsman 14/4, 2010 at 14:20

Thank you: with XPath() it really works. Strangely enough @href works in both cases. – Stutsman 15/4, 2010 at 1:9

@Distilled Then is .//a[text()='Example'] invalid in this case? – Dhammapada 20/11, 2015 at 16:26

The currently accepted answer has performance drawbacks. Consider using:

xml_doc.finditer('.//element[.="text to match"]')

instead.

This is documented in the find syntax docs here for the original xml.ElementTree implementation: https://docs.python.org/3/library/xml.etree.elementtree.html#supported-xpath-syntax and lxml.etree says it uses the same restricted syntax for its find methods.

Singhal answered 31/1 at 19:14 Comment(0)

Recommended topics

Hot tags