Python pretty print an XML given an XML string
Asked Answered
S

3

14

I generated a long and ugly XML string with Python and I need to filter it through pretty printer to look nicer.

I found this post for python pretty printers, but I have to write the XML string to a file to be read back to use the tools, which I want to avoid if possible.

What python pretty tools are available that work on strings?

Sorry answered 20/10, 2010 at 0:4 Comment(2)
Which python XML library are you using?Alexisaley
@Paul : I use "from xml.dom import minidom".Sorry
M
28

Here's how to parse from a text string to the lxml structured data type.

Python 2:

from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print etree.tostring(root, pretty_print=True)

Python 3:

from lxml import etree
xml_str = "<parent><child>text</child><child>other text</child></parent>"
root = etree.fromstring(xml_str)
print(etree.tostring(root, pretty_print=True).decode())

Outputs:

<parent>
  <child>text</child>
  <child>other text</child>
</parent>
Mattland answered 20/10, 2010 at 1:27 Comment(2)
i was having some escaping problems and the etree.tounicode method resolved itHoroscope
With Python 3, use print(etree.tostring(tree, pretty_print=True).decode()).Brooklynese
A
7

I use the lxml library, and there it's as simple as

>>> print(etree.tostring(root, pretty_print=True))

You can do that operation using any etree, which you can either generate programmatically, or read from a file.

If you're using the DOM from PyXML, it's

import xml.dom.ext
xml.dom.ext.PrettyPrint(doc)

That prints to the standard output, unless you specify an alternate stream.

http://pyxml.sourceforge.net/topics/howto/node19.html

To directly use the minidom, you want to use the toprettyxml() function.

http://docs.python.org/library/xml.dom.minidom.html#xml.dom.minidom.Node.toprettyxml

Alexisaley answered 20/10, 2010 at 0:7 Comment(2)
It looks like that both root and doc are structured data, not a string. Thanks for the answer.Sorry
If your xml exists as a minidom node, you can use the toprettyxml() function. If it really only ever exists as a string, you will have to parse it in before you can pretty print it out.Alexisaley
A
2

Here's a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations. You mention that you have an xml string already so I am going to assume you used xml.dom.minidom.parseString()

With the following solution you can avoid writing to a file first:

import xml.dom.minidom
import os

def pretty_print_xml_given_string(input_string, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = input_string.toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

I found how to fix the common newline issue here.

Audile answered 12/2, 2020 at 17:51 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.