What is the best way (or are the various ways) to pretty print XML in Python?
import xml.dom.minidom
dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
xml.toprettyxml(indent="", newl="")
this to have a 4 indent spaces and no newline everywhere. add one-liner: python -c 'import sys, xml.dom.minidom as xmld; print(xmld.parse(sys.argv[1]).toprettyxml(indent="",newl=""))' my_xml_file.xml
–
Melendez pretty_xml_as_string = '\n'.join(list(filter(lambda x: len(x.strip()), pretty_xml_as_string.split('\n'))))
–
Copy lxml is recent, updated, and includes a pretty print function
import lxml.etree as etree
x = etree.parse("filename")
print etree.tostring(x, pretty_print=True)
Check out the lxml tutorial: https://lxml.de/tutorial.html
aptitude install
away. Under OS/X I'm not sure. –
Steno print(etree.tostring(x, pretty_print=True, encoding="unicode"))
. Writing to an output file is possible in just one line, no intermediary variable needed: etree.parse("filename").write("outputfile", encoding="utf-8")
–
Sadi etree.XMLParser(remove_blank_text=True)
sometime can help to do the right printing –
Epizoic etree.tostring(x, pretty_print=True, encoding="unicode")
; but I then also had to use pretty_str = pretty_str.replace('\\n', '\n')
–
Falls Another solution is to borrow this indent
function, for use with the ElementTree library that's built in to Python since 2.5.
Here's what that would look like:
from xml.etree import ElementTree
def indent(elem, level=0):
i = "\n" + level*" "
j = "\n" + (level-1)*" "
if len(elem):
if not elem.text or not elem.text.strip():
elem.text = i + " "
if not elem.tail or not elem.tail.strip():
elem.tail = i
for subelem in elem:
indent(subelem, level+1)
if not elem.tail or not elem.tail.strip():
elem.tail = j
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = j
return elem
root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)
tree.write([filename])
for writing to file (tree
being the ElementTree instance). –
Roast tree = ElementTree.parse('file) ; root = tree.getroot() ; indent(root); tree.write('Out.xml');
–
Lietman TypeError: object of type 'ElementTree' has no len()
when I run this. If I use tree.write
before passing tree to indent
, I get an xml file with everything on one line. –
Quach You have a few options.
If you are using Python 3.9+, your simplest option is:
xml.etree.ElementTree.indent()
Batteries included and pretty output.
Sample code:
import xml.etree.ElementTree as ET
element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))
BeautifulSoup.prettify()
BeautifulSoup may be the simplest solution for Python < 3.9.
from bs4 import BeautifulSoup
bs = BeautifulSoup(open(xml_file), 'xml')
pretty_xml = bs.prettify()
print(pretty_xml)
Output:
<?xml version="1.0" encoding="utf-8"?> <issues> <issue> <id> 1 </id> <title> Add Visual Studio 2005 and 2008 solution files </title> </issue> </issues>
This is my goto answer. The default arguments work as is. But text contents are spread out on separate lines as if they were nested elements.
lxml.etree.parse()
Prettier output but with arguments.
from lxml import etree
x = etree.parse(FILE_NAME)
pretty_xml = etree.tostring(x, pretty_print=True, encoding=str)
Produces:
<issues> <issue> <id>1</id> <title>Add Visual Studio 2005 and 2008 solution files</title> <details>We need Visual Studio 2005/2008 project files for Windows.</details> </issue> </issues>
This works for me with no issues.
xml.dom.minidom.parse()
No external dependencies but post-processing.
import xml.dom.minidom as md
dom = md.parse(FILE_NAME)
# To parse string instead use: dom = md.parseString(xml_string)
pretty_xml = dom.toprettyxml()
# remove the weird newline issue:
pretty_xml = os.linesep.join([s for s in pretty_xml.splitlines()
if s.strip()])
The output is the same as above, but it's more code.
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?
–
Tonyatonye python3 -m pip install --user lxml
–
Germany remove the weird newline issue
! ty –
Intercommunion Here's my (hacky?) solution to get around the ugly text node problem.
uglyXml = doc.toprettyxml(indent=' ')
text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
prettyXml = text_re.sub('>\g<1></', uglyXml)
print prettyXml
The above code will produce:
<?xml version="1.0" ?>
<issues>
<issue>
<id>1</id>
<title>Add Visual Studio 2005 and 2008 solution files</title>
<details>We need Visual Studio 2005/2008 project files for Windows.</details>
</issue>
</issues>
Instead of this:
<?xml version="1.0" ?>
<issues>
<issue>
<id>
1
</id>
<title>
Add Visual Studio 2005 and 2008 solution files
</title>
<details>
We need Visual Studio 2005/2008 project files for Windows.
</details>
</issue>
</issues>
Disclaimer: There are probably some limitations.
re.compile
prior to sub
operation (I was using re.findall()
twice, zip
and a for
loop with str.replace()
...) –
Weingartner doc
is a xml.dom.minidom.Document
object) –
Falls As of Python 3.9, ElementTree has an indent()
function for pretty-printing XML trees.
See https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent.
Sample usage:
import xml.etree.ElementTree as ET
element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))
The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200
As others pointed out, lxml has a pretty printer built in.
Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.
Here's a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False
). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'
):
from lxml import etree
def prettyPrintXml(xmlFilePathToPrettyPrint):
assert xmlFilePathToPrettyPrint is not None
parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
document = etree.parse(xmlFilePathToPrettyPrint, parser)
document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')
Example usage:
prettyPrintXml('some_folder/some_file.xml')
If you have xmllint
you can spawn a subprocess and use it. xmllint --format <file>
pretty-prints its input XML to standard output.
Note that this method uses an program external to python, which makes it sort of a hack.
def pretty_print_xml(xml):
proc = subprocess.Popen(
['xmllint', '--format', '/dev/stdin'],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
)
(output, error_output) = proc.communicate(xml);
return output
print(pretty_print_xml(data))
I tried to edit "ade"s answer above, but Stack Overflow wouldn't let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.
def indent(elem, level=0, more_sibs=False):
i = "\n"
if level:
i += (level-1) * ' '
num_kids = len(elem)
if num_kids:
if not elem.text or not elem.text.strip():
elem.text = i + " "
if level:
elem.text += ' '
count = 0
for kid in elem:
indent(kid, level+1, count < num_kids - 1)
count += 1
if not elem.tail or not elem.tail.strip():
elem.tail = i
if more_sibs:
elem.tail += ' '
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = i
if more_sibs:
elem.tail += ' '
If you're using a DOM implementation, each has their own form of pretty-printing built-in:
# minidom
#
document.toprettyxml()
# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)
# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)
If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want — you'd probably have to write or subclass your own serialiser.
I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1')
. Here's my workaround for it:
def toprettyxml(doc, encoding):
"""Return a pretty-printed XML document in a given encoding."""
unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
u'<?xml version="1.0" encoding="%s"?>' % encoding)
return unistr.encode(encoding, 'xmlcharrefreplace')
from yattag import indent
pretty_string = indent(ugly_string)
It won't add spaces or newlines inside text nodes, unless you ask for it with:
indent(mystring, indent_text = True)
You can specify what the indentation unit should be and what the newline should look like.
pretty_xml_string = indent(
ugly_xml_string,
indentation = ' ',
newline = '\r\n'
)
The doc is on http://www.yattag.org homepage.
I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.
def prettify(element, indent=' '):
queue = [(0, element)] # (level, element)
while queue:
level, element = queue.pop(0)
children = [(level + 1, child) for child in list(element)]
if children:
element.text = '\n' + indent * (level+1) # for child open
if queue:
element.tail = '\n' + indent * queue[0][0] # for sibling open
else:
element.tail = '\n' + indent * (level-1) # for parent close
queue[0:0] = children # prepend so children come before siblings
Here's a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.
import xml.etree.ElementTree as ET
import xml.dom.minidom
import os
def pretty_print_xml_given_root(root, output_xml):
"""
Useful for when you are editing xml data on the fly
"""
xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
with open(output_xml, "w") as file_out:
file_out.write(xml_string)
def pretty_print_xml_given_file(input_xml, output_xml):
"""
Useful for when you want to reformat an already existing xml file
"""
tree = ET.parse(input_xml)
root = tree.getroot()
pretty_print_xml_given_root(root, output_xml)
I found how to fix the common newline issue here.
You can use popular external library xmltodict, with unparse
and pretty=True
you will get best result:
xmltodict.unparse(
xmltodict.parse(my_xml), full_document=False, pretty=True)
full_document=False
against <?xml version="1.0" encoding="UTF-8"?>
at the top.
XML pretty print for python looks pretty good for this task. (Appropriately named, too.)
An alternative is to use pyXML, which has a PrettyPrint function.
HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/xmlpp/
Think that project is in the attic nowadays, shame. –
Wore I found this question while looking for "how to pretty print html"
Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML:
from xml.dom.minidom import parseString as string_to_dom
def prettify(string, html=True):
dom = string_to_dom(string)
ugly = dom.toprettyxml(indent=" ")
split = list(filter(lambda x: len(x.strip()), ugly.split('\n')))
if html:
split = split[1:]
pretty = '\n'.join(split)
return pretty
def pretty_print(html):
print(prettify(html))
When used this is what it looks like:
html = """\
<div class="foo" id="bar"><p>'IDK!'</p><br/><div class='baz'><div>
<span>Hi</span></div></div><p id='blarg'>Try for 2</p>
<div class='baz'>Oh No!</div></div>
"""
pretty_print(html)
Which returns:
<div class="foo" id="bar">
<p>'IDK!'</p>
<br/>
<div class="baz">
<div>
<span>Hi</span>
</div>
</div>
<p id="blarg">Try for 2</p>
<div class="baz">Oh No!</div>
</div>
indent
used in other answers. –
Honeycutt Take a look at the vkbeautify module.
It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn't have any dependency.
Examples:
import vkbeautify as vkb
vkb.xml(text)
vkb.xml(text, 'path/to/dest/file')
vkb.xml('path/to/src/file')
vkb.xml('path/to/src/file', 'path/to/dest/file')
You can try this variation...
Install BeautifulSoup
and the backend lxml
(parser) libraries:
user$ pip3 install lxml bs4
Process your XML document:
from bs4 import BeautifulSoup
with open('/path/to/file.xml', 'r') as doc:
for line in doc:
print(BeautifulSoup(line, 'lxml-xml').prettify())
'lxml'
uses lxml's HTML parser - see the BS4 docs. You need 'xml'
or 'lxml-xml'
for the XML parser. –
Jacintha lxml’s XML parser BeautifulSoup(markup, "lxml-xml") BeautifulSoup(markup, "xml")
–
Doscher lxml-xml
), and then they proceeded to downvote it that same day. I submitted an official complaint to S/O but they refused to investigate. Anyway, I have since "de-tampered" my answer, which is now correct again (and specifies lxml-xml
as it originally did). Thank you. –
Scalise An alternative if you don't want to have to reparse, there is the xmlpp.py library with the get_pprint()
function. It worked nice and smoothly for my use cases, without having to reparse to an lxml ElementTree object.
I found a fast and easy way to nicely format and print an xml file:
import xml.etree.ElementTree as ET
xmlTree = ET.parse('your XML file')
xmlRoot = xmlTree.getroot()
xmlDoc = ET.tostring(xmlRoot, encoding="unicode")
print(xmlDoc)
Outuput:
<root>
<child>
<subchild>.....</subchild>
</child>
<child>
<subchild>.....</subchild>
</child>
...
...
...
<child>
<subchild>.....</subchild>
</child>
</root>
I had this problem and solved it like this:
def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
if pretty_print: pretty_printed_xml = pretty_printed_xml.replace(' ', indent)
file.write(pretty_printed_xml)
In my code this method is called like this:
try:
with open(file_path, 'w') as file:
file.write('<?xml version="1.0" encoding="utf-8" ?>')
# create some xml content using etree ...
xml_parser = XMLParser()
xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')
except IOError:
print("Error while writing in log file!")
This works only because etree by default uses two spaces
to indent, which I don't find very much emphasizing the indentation and therefore not pretty. I couldn't ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.
For converting an entire xml document to a pretty xml document
(ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftool
ing of .odt/.ods files, such as I'm implementing here)
import xml.dom.minidom
file = open("./content.xml", 'r')
xml_string = file.read()
file.close()
parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()
file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()
References:
- Thanks to Ben Noland's answer on this page which got me most of the way there.
from lxml import etree
import xml.dom.minidom as mmd
xml_root = etree.parse(xml_fiel_path, etree.XMLParser())
def print_xml(xml_root):
plain_xml = etree.tostring(xml_root).decode('utf-8')
urgly_xml = ''.join(plain_xml .split())
good_xml = mmd.parseString(urgly_xml)
print(good_xml.toprettyxml(indent=' ',))
It's working well for the xml with Chinese!
If for some reason you can't get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:
import subprocess
def makePretty(filepath):
cmd = "xmllint --format " + filepath
prettyXML = subprocess.check_output(cmd, shell = True)
with open(filepath, "w") as outfile:
outfile.write(prettyXML)
As far as I know, this solution will work on Unix-based systems that have the xmllint
package installed.
check_output
because you don't need to do error checking –
Newel Use etree.indent
and etree.tostring
import lxml.etree as etree
root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space=" ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)
output
<html>
<head/>
<body>
<h1>Welcome</h1>
</body>
</html>
Removing namespaces and prefixes
import lxml.etree as etree
def dump_xml(element):
for item in element.getiterator():
item.tag = etree.QName(item).localname
etree.cleanup_namespaces(element)
etree.indent(element, space=" ")
result = etree.tostring(element, pretty_print=True).decode()
return result
root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)
output
<document>
<name>hello world</name>
</document>
I solved this with some lines of code, opening the file, going trough it and adding indentation, then saving it again. I was working with small xml files, and did not want to add dependencies, or more libraries to install for the user. Anyway, here is what I ended up with:
f = open(file_name,'r')
xml = f.read()
f.close()
#Removing old indendations
raw_xml = ''
for line in xml:
raw_xml += line
xml = raw_xml
new_xml = ''
indent = ' '
deepness = 0
for i in range((len(xml))):
new_xml += xml[i]
if(i<len(xml)-3):
simpleSplit = xml[i:(i+2)] == '><'
advancSplit = xml[i:(i+3)] == '></'
end = xml[i:(i+2)] == '/>'
start = xml[i] == '<'
if(advancSplit):
deepness += -1
new_xml += '\n' + indent*deepness
simpleSplit = False
deepness += -1
if(simpleSplit):
new_xml += '\n' + indent*deepness
if(start):
deepness += 1
if(end):
deepness += -1
f = open(file_name,'w')
f.write(new_xml)
f.close()
It works for me, perhaps someone will have some use of it :)
© 2022 - 2024 — McMap. All rights reserved.