Extracting text after tag in Python's ElementTree

Asked 12/3, 2012 at 19:58 Answered 12/3, 2012 at 20:11

Solved python text xml-parsing elementtree

Here is a fragment of my XML:

<item><img src="cat.jpg" /> Picture of a cat</item>

Extracting the tag is easy. Just do:

import xml.etree.ElementTree
et = xml.etree.ElementTree.fromstring(our_xml_string)
img = et.find('img')

But how do I get the text immediately after it (Picture of a cat)? Doing the following returns a blank string:

print(et.text)

Reachmedown answered 12/3, 2012 at 19:58 Comment(0)

Elements have a tail attribute, which holds the text that follows the element's closing tag, up to the next tag -- so instead of element.text, you're asking for element.tail.

>>> import lxml.etree
>>> root = lxml.etree.fromstring('''<root><foo>bar</foo>baz</root>''')
>>> root[0]
<Element foo at 0x145a3c0>
>>> root[0].tail
'baz'

Or, for your example:

>>> et = lxml.etree.fromstring('''<item><img src="cat.jpg" /> Picture of a cat</item>''')
>>> et.find('img').tail
' Picture of a cat'

This also works with plain ElementTree:

>>> import xml.etree.ElementTree
>>> xml.etree.ElementTree.fromstring(
...   '''<item><img src="cat.jpg" /> Picture of a cat</item>'''
... ).find('img').tail
' Picture of a cat'
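
To make the split between text and tail concrete, here is a minimal sketch using plain ElementTree. The leading "Intro " in the XML and the names item and caption are my own additions for illustration, not part of the question's data. It also guards against tail being None, which is what you get when nothing follows the element.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the question's XML, with "Intro " added before <img>.
item = ET.fromstring('<item>Intro <img src="cat.jpg" /> Picture of a cat</item>')
img = item.find('img')

# .text is the text between an element's start tag and its first child.
print(repr(item.text))   # 'Intro '

# .tail is the text between an element's end tag and the next tag.
print(repr(img.tail))    # ' Picture of a cat'

# An empty element such as <img /> has no text of its own.
print(repr(img.text))    # None

# tail can be None when nothing follows the element, so guard before stripping.
caption = (img.tail or '').strip()
print(caption)           # Picture of a cat

# itertext() yields every text and tail inside the element, in document order.
print(repr(''.join(item.itertext())))   # 'Intro  Picture of a cat'
```

If you only want the caption, the guarded strip() is probably enough; itertext() is the more convenient choice when you want all of the text inside item regardless of which child it follows.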

Defoliant answered 12/3, 2012 at 20:11 Comment(3)

Brilliant. I tried using .tail before, but I was using it on my el object. Did not realise I had to use it on img. Thank you for enlightening me! – Reachmedown 12/3, 2012 at 20:17

@Neuron, ElementTree is a library name, not a code snippet. Its original formatting was deliberate. See When should code formatting be used for non-code text? and Should code formatting be used for package names? on Meta Stack Overflow. – Defoliant 9/2, 2022 at 13:55

Thanks for the meta posts, though I am aware that package names should not be code-highlighted. I thought you were referencing the type ElementTree, not the package. – Yawata 9/2, 2022 at 15:56
