How do I get properly escaped XML in python etree untouched?

I'm using python version 2.7.3.

test.txt:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <test>The tag &lt;StackOverflow&gt; is good to bring up at parties.</test>
</root>

Result:

>>> import xml.etree.ElementTree as ET
>>> e = ET.parse('test.txt')
>>> root = e.getroot()
>>> print root.find('test').text
The tag <StackOverflow> is good to bring up at parties.

As you can see, the parser has converted the &lt; and &gt; entities back to < and >.

What I'd like to see:

The tag &lt;StackOverflow&gt; is good to bring up at parties.

Untouched, raw text. Sometimes I really like it raw. Uncooked.

I'd like to use this text as-is for display within HTML, so I don't want the XML parser to alter it.

Do I have to re-escape each string, or is there another way?

Omni answered 7/5, 2014 at 11:33 Comment(2)
For displaying in other sources, simply re-escape! It's a parser's job to give you the proper XML contents after parsing, and HTML escaping can be subtly different anyway. - Sponson
Fair point, will probably do that. I was just curious whether the parser has an option for this. - Omni
import xml.etree.ElementTree as ET
e = ET.parse('test.txt')
root = e.getroot()
print(ET.tostring(root.find('test')))

yields

<test>The tag &lt;StackOverflow&gt; is good to bring up at parties.</test>

Alternatively, you could escape the text with saxutils.escape:

import xml.sax.saxutils as saxutils
print(saxutils.escape(root.find('test').text))

yields

The tag &lt;StackOverflow&gt; is good to bring up at parties.
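
Since the goal is display within HTML, the following sketch compares saxutils.escape with html.escape from the standard library. It assumes Python 3.2 or later, where the html module provides escape. The sample string containing quotes is hypothetical, added only to show where the two functions differ.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax import saxutils

# Inline stand-in for test.txt from the question.
root = ET.fromstring(
    b"<root><test>The tag &lt;StackOverflow&gt; is good to bring up at parties.</test></root>"
)
text = root.find('test').text  # parsed text, entities already decoded

xml_escaped = saxutils.escape(text)   # escapes &, < and >
html_escaped = html.escape(text)      # escapes &, <, > and, by default, quotes
print(xml_escaped)

# Hypothetical sample with quotes, to show the difference between the two.
sample = 'a "quoted" <tag> & more'
sax_sample = saxutils.escape(sample)
html_sample = html.escape(sample)
print(sax_sample)
print(html_sample)
```

For text placed between HTML tags either function should do. For text placed inside an attribute value, the quote escaping of html.escape is the safer choice.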
Baily answered 7/5, 2014 at 11:37 Comment(1)
Both cases simply re-escape the value. - Sponson
