XML ElementTree - indexing tags
Asked Answered
V

1

6

I have a XML file:

<sentence id="en_BlueRibbonSushi_478218345:2">
   <text>It has great sushi and even better service.</text>
</sentence>
<sentence id="en_BlueRibbonSushi_478218345:3">
   <text>The entire staff was extremely accomodating and tended to my every need.</text>
</sentence>
<sentence id="en_BlueRibbonSushi_478218345:4">
   <text>I&apos;ve been to this restaurant over a dozen times with no complaints to date.</text>
</sentence>

Using XML ElementTree, I would like to insert a tag <Opinion> that has an attribute category=. Say I have a list of chars list = ['a', 'b', 'c'], is it possible to incrementally asign them to each text so I have:

<sentence id="en_BlueRibbonSushi_478218345:2">
   <text>It has great sushi and even better service.</text>
   <Opinion category='a' />
</sentence>
<sentence id="en_BlueRibbonSushi_478218345:3">
   <text>The entire staff was extremely accomodating and tended to my every need.</text>
   <Opinion category='b' />
</sentence>
<sentence id="en_BlueRibbonSushi_478218345:4">
   <text>I&apos;ve been to this restaurant over a dozen times with no complaints to date.</text>
   <Opinion category='c' />
</sentence>

I am aware I can use the sentence id attribute but this would require a lot of restructuring of my code. Basically, I'd like to be able to index each sentence entry to align with my list index.

Vapid answered 29/3, 2017 at 14:7 Comment(3)
Attributes are a dictionary. Dictionary order is not guaranteed to be preserved in any way.Weathering
Oh I see. Works with the validator (checking against as a gold standard) as is so no need to change. Thanks!Vapid
Can you make a reproducible example?Hysterectomize
M
4

You can use the SubElement factory function to add elements to the tree. Assuming your XML data is in a variable called data, this will add the elements to your document tree:

import xml.etree.ElementTree as ET
tree = ET.XML(data)
for elem, category in zip(tree.findall('sentence'), ['a', 'b', 'c']):
    Opinion  = ET.SubElement(elem, 'Opinion')
    Opinion.set('category', category)

ET.dump(tree)  # prints the tree; tree.write('output.xml') is another option
Mohican answered 1/4, 2017 at 0:4 Comment(2)
zip will stop when the shortest iterable passed in runs out, so the slicing you suggest isn't needed (because it won't make any difference). In any case, I'm assuming the OP has some more interesting way of generating the category list.Mohican
This is exactly what I needed. Thanks a million!Vapid

© 2022 - 2024 — McMap. All rights reserved.