Converting a Python XML ElementTree to a String

I need to convert an XML ElementTree to a String after altering it. It's the toString part that isn't working.

import xml.etree.ElementTree as ET

tree = ET.parse('my_file.xml')
root = tree.getroot()

for e in root.iter('tag_name'):
    e.text = "something else" # This works

# Now I want the complete XML as a String with the alteration

I've tried various versions of the line below, using ET or ElementTree under various names, importing tostring, and so on:

s = tree.tostring(ET, encoding='utf8', method='xml')

I have seen Convert Python ElementTree to string and some others, but I'm not sure how to apply it to my example.

Schreiner asked 19/11/2015 at 21:22. Comments (2):
One not-so-ideal way is to convert the XML to a dict, make the change you need, and then convert the dict to whatever format or type you need. Try using xmltodict or BeautifulSoup to handle and parse the XML file. For example, I use xmltodict to convert an XML file to a dict, and after that I convert the dict into an HTML table using json2html. (Childlike)
Possible duplicate of Convert Python ElementTree to string. (Tatter)

This should work:

xmlstr = ET.tostring(root, encoding='utf8', method='xml')
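A minimal end-to-end sketch of this answer applied to the question's code, assuming Python 3. To stay self-contained it parses a made-up inline XML string with ET.fromstring instead of reading my_file.xml, and it uses encoding='unicode' (as the comments on this answer suggest) so the result is a str rather than bytes. Note that tostring is a module-level function that takes an Element such as root; the ElementTree object has no tostring method, which is why tree.tostring(...) fails.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for my_file.xml so the example runs without a file
SAMPLE = "<root><tag_name>old</tag_name><other>keep</other><tag_name>old</tag_name></root>"

# Mirror the question: build an ElementTree and grab its root Element
tree = ET.ElementTree(ET.fromstring(SAMPLE))
root = tree.getroot()

for e in root.iter('tag_name'):
    e.text = "something else"

# tostring() is a module-level function that takes an Element, not the tree.
# encoding='unicode' makes it return a str in Python 3 (the default is bytes).
xmlstr = ET.tostring(root, encoding='unicode', method='xml')
print(xmlstr)
```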
Shallow answered 19/11/2015 at 21:36. Comments (2):
Ironically, tostring generates Python bytes. (Darciedarcy)
Because the str type changed between Python 2 and 3, this returns bytes rather than a str in Python 3. Use either ET.tostring(root).decode() or ET.tostring(root, encoding='unicode', method='xml') instead. (Tatter)
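A small Python 3 check that the two alternatives from this comment produce the same str. As a further alternative not taken from the thread, it also serializes the whole ElementTree object with ElementTree.write() into an io.StringIO, which needs encoding='unicode' because the target is a text stream. The XML snippet is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up document for illustration
root = ET.fromstring('<config><item name="a">1</item></config>')
tree = ET.ElementTree(root)

# Option 1 from the comment: default output is bytes, so decode it
via_decode = ET.tostring(root).decode()

# Option 2 from the comment: ask for a str directly
via_unicode = ET.tostring(root, encoding='unicode', method='xml')

# Alternative: write the ElementTree itself into an in-memory text buffer.
# encoding='unicode' is required when writing to a text stream like StringIO.
buf = io.StringIO()
tree.write(buf, encoding='unicode')
via_write = buf.getvalue()

print(via_decode == via_unicode == via_write)  # True
```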

How do I convert ElementTree.Element to a String?

For Python 3:

xml_str = ElementTree.tostring(xml, encoding='unicode')

For Python 2:

xml_str = ElementTree.tostring(xml, encoding='utf-8')

For compatibility with both Python 2 & 3:

xml_str = ElementTree.tostring(xml).decode()

Example usage

from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
xml_str = ElementTree.tostring(xml).decode()
print(xml_str)

Output:

<Person Name="John" />

Explanation

Despite what the name implies, ElementTree.tostring() returns a bytestring by default in Python 2 & 3. This is an issue in Python 3, which uses Unicode for strings.

In Python 2 you could use the str type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. [...]

To make the distinction between text and binary data clearer and more pronounced, [Python 3] made text and binary data distinct types that cannot blindly be mixed together.

Source: Porting Python 2 Code to Python 3

If we know which version of Python is being used, we can specify the encoding as 'unicode' (Python 3) or 'utf-8' (Python 2). Otherwise, if we need compatibility with both Python 2 and 3, we can use decode() to convert the result into the correct type.
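The decode() route is safe because, by default, tostring() encodes as US-ASCII and replaces any non-ASCII character with an XML character reference, so the bytes always decode cleanly. The sketch below (Python 3, using a made-up p element) shows that the two Python 3 approaches produce different but equivalent XML text when the data contains a non-ASCII character.

```python
import xml.etree.ElementTree as ET

# Made-up element whose text contains a non-ASCII character
p = ET.Element('p')
p.text = 'café'

# Default encoding (US-ASCII): 'é' is written as a character reference
ascii_str = ET.tostring(p).decode()

# encoding='unicode': 'é' is kept as-is in the returned str
unicode_str = ET.tostring(p, encoding='unicode')

print(ascii_str)    # <p>caf&#233;</p>
print(unicode_str)  # <p>café</p>

# Both parse back to the same text content
assert ET.fromstring(ascii_str).text == ET.fromstring(unicode_str).text == 'café'
```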

For reference, I've included a comparison of .tostring() results between Python 2 and Python 3.

ElementTree.tostring(xml)
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml, encoding='unicode')
# Python 3: <Person Name="John" />
# Python 2: LookupError: unknown encoding: unicode

ElementTree.tostring(xml, encoding='utf-8')
# Python 3: b'<Person Name="John" />'
# Python 2: <Person Name="John" />

ElementTree.tostring(xml).decode()
# Python 3: <Person Name="John" />
# Python 2: <Person Name="John" />
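The Python 3 column of this comparison can be reproduced directly. A quick sketch using the same Person element (Python 2 behaviour is not exercised here):

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")

results = {
    "tostring(xml)": ElementTree.tostring(xml),
    "encoding='unicode'": ElementTree.tostring(xml, encoding='unicode'),
    "encoding='utf-8'": ElementTree.tostring(xml, encoding='utf-8'),
    ".decode()": ElementTree.tostring(xml).decode(),
}

# Print the type name and repr of each result under Python 3
for label, value in results.items():
    print(f"{label:20} {type(value).__name__:5} {value!r}")
```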

Thanks to Martijn Pieters for pointing out that the str datatype changed between Python 2 and 3.


Why not use str()?

In most scenarios, str() would be the "canonical" way to convert an object to a string. Unfortunately, calling it on an Element returns only a short representation containing the tag name and the object's memory address as a hex string, rather than a serialization of the element's data.

from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")
print(str(xml))  # <Element 'Person' at 0x00497A80>
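A short Python 3 sketch contrasting the two calls on the same element: str() only identifies the object, while tostring() serializes its data. The memory address in the str() output varies from run to run.

```python
from xml.etree import ElementTree

xml = ElementTree.Element("Person", Name="John")

# str() only identifies the object: tag name plus memory address
print(str(xml))   # e.g. <Element 'Person' at 0x...>  (address varies)

# tostring() serializes the element's tag and attributes
print(ElementTree.tostring(xml, encoding='unicode'))  # <Person Name="John" />
```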
Tatter answered 7/2/2018 at 21:16
