Python ElementTree - iterate through child nodes and text in order

I am using Python 3 and the ElementTree API. I have some XML of the form:

<root>
  <item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
  <item>To Grandmother's <ref id="house" /> we go.</item>
</root>

I want to be able to iterate through the text and child nodes for a given item in order. So, for the first item, the list I want printed line by line would be:

Over the 
<Element 'ref' at 0x######>
 and through the 
<Element 'ref' at 0x######>
.

But I can't figure out how to do this with ElementTree. I can get the text in order via itertext(), and the child elements in order in several ways, but not the two interleaved in document order. I was hoping I could use an XPath expression like ./@text|./ref, but ElementTree's subset of XPath doesn't seem to support attribute selection. If I could at least get the raw XML contents of each item node, I could parse them myself if necessary.
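To make the problem concrete, here is a minimal sketch (Python 3, standard library only) of what the two separate approaches return for the first item: itertext() yields only the strings, and findall() yields only the elements, so their relative positions are lost.

```python
from xml.etree import ElementTree as ET

xml = """<root>
  <item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
  <item>To Grandmother's <ref id="house" /> we go.</item>
</root>"""

root = ET.fromstring(xml)
first = root[0]

# itertext() yields all text in document order, but the <ref> elements vanish
texts = list(first.itertext())
print(texts)  # ['Over the ', ' and through the ', '.']

# findall() returns the child elements in order, but not the text between them
refs = first.findall("ref")
print([r.get("id") for r in refs])  # ['river', 'woods']
```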

Tetanize asked 11/2, 2017 at 9:9 (2 comments)
How should the final output look? (Comber)
The output is stated above. (Tetanize)

Try this:

from xml.etree import ElementTree as ET

xml = """<root>
  <item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
  <item>To Grandmother's <ref id="house" /> we go.</item>
</root>"""

root = ET.fromstring(xml)

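for item in root:
    # Text before the first child element (None if there is none)
    if item.text:
        print(item.text)
    # Each child element, followed by the text that comes after it
    for ref in item:
        print(ref)
        if ref.tail:
            print(ref.tail)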
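If you want the interleaved sequence as a list rather than printed directly, the same .text/.tail walk can be wrapped in a generator. This is a minimal sketch; iter_mixed is a hypothetical helper name, not part of ElementTree. It walks only the direct children of the element, so any text inside a child stays on that child element rather than being yielded separately.

```python
from xml.etree import ElementTree as ET


def iter_mixed(elem):
    """Yield elem's text runs and direct children interleaved in document order.

    Hypothetical helper: strings are the text runs, Elements are the children.
    Missing or empty text runs (None or '') are skipped.
    """
    if elem.text:
        yield elem.text          # text before the first child
    for child in elem:
        yield child              # the child element itself
        if child.tail:
            yield child.tail     # text after this child, before the next one


xml = """<root>
  <item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>
  <item>To Grandmother's <ref id="house" /> we go.</item>
</root>"""

root = ET.fromstring(xml)
for item in root:
    for part in iter_mixed(item):
        print(repr(part))
    print("---")  # separator between items
```

For the first item, list(iter_mixed(root[0])) gives the three strings and two ref elements in the order the question asks for.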

ElementTree's representation of "mixed content" is based on the .text and .tail attributes. An element's .text holds the text from its start tag up to its first child element. Each child's .tail then holds the parent's text that follows that child, up to the next sibling or the parent's end tag. See the API doc.
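A small sketch of how .text and .tail split the first item's mixed content, parsing just that one item on its own:

```python
from xml.etree import ElementTree as ET

item = ET.fromstring(
    '<item>Over the <ref id="river" /> and through the <ref id="woods" />.</item>'
)

print(repr(item.text))  # 'Over the '  (text up to the first child)
for ref in item:
    # .text of an empty <ref /> is None; .tail is the text that follows it
    print(ref.get("id"), repr(ref.text), repr(ref.tail))
# river None ' and through the '
# woods None '.'

# item was parsed as a document root here, so nothing follows it
print(repr(item.tail))  # None
```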

Kleper answered 11/2, 2017 at 9:25
