Pretty printing XML in Python
D

27

536

What is the best way (or are the various ways) to pretty print XML in Python?

Desorb answered 15/4, 2009 at 0:5 Comment(0)
M
468
import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
Myrmidon answered 30/7, 2009 at 14:12 Comment(19)
This will get you pretty xml, but note that what comes out in the text node is actually different than what came in - there are new whitespaces on text nodes. This may cause you trouble if you are expecting EXACTLY what fed in to feed out.Nitid
@icnivad: while it is important to point that fact, it seems strange to me that somebody would want to prettify its XML if spaces were of some importance for them !Naylor
Nice! Can collapse this to a one liner (Python 3 syntax): python -c 'import sys, xml.dom.minidom; print(xml.dom.minidom.parseString(sys.stdin.read()).toprettyxml())'Pauiie
minidom is widely panned as a pretty bad xml implementation. If you allow yourself to add external dependencies, lxml is far superior.Shari
Not a fan of redefining xml there from being a module to the output object, but the method otherwise works. I'd love to find a nicer way to go from the core etree to pretty printing. While lxml is cool, there are times when I'd prefer to keep to the core if I can.Tavy
This is inserting additional whitespace every time I run it, which is broken, in my opinion. (Python 2.6 and 2.7)Tavish
Link to the docs: docs.python.org/2/library/…Ictus
Tons of crazy blank lines everywhere. This solution doesn't work.Ropedancer
Yeah, this is a pretty terrible solution actually.Sciurine
with 2.6, there are newlines and spaces within the text nodes, but with 2.7, the text nodes appear unchanged.Horseman
Thanks @Wyrmwood, that was really helpful for me to know.Octavo
It's worth noting that this is still some pretty butt-ugly XML: there doesn't seem to be a reliable way to prettify XML in a way that isn't butt-ugly, and by butt-ugly, I mean that all of a tags attributes are on the same line instead of being on separate lines.Richierichlad
Awesome work! Here's an example which takes it one step further and converts an ugly xml file to a pretty xml file: https://mcmap.net/q/25555/-pretty-printing-xml-in-pythonWirra
xml.toprettyxml(indent="", newl="") this to have a 4 indent spaces and no newline everywhere. add one-liner: python -c 'import sys, xml.dom.minidom as xmld; print(xmld.parse(sys.argv[1]).toprettyxml(indent="",newl=""))' my_xml_file.xmlMelendez
To remove the ugly empty lines it generate just add something like this to strip the blank lines: pretty_xml_as_string = '\n'.join(list(filter(lambda x: len(x.strip()), pretty_xml_as_string.split('\n'))))Copy
@Treviño I see no difference after adding the your suggestion.Cadell
@Horseman I am running 2.7.2 and see the newline and spaces.Cadell
@Cadell I'm sorry. I do not know the exact minor revision where this was fixed, but given your experience, likely it was later than 2.7.2. You can't easily install those older versions, unless you build from source, and even then it can be difficult to impossible to get setup tools and pip working previous to about 2.7.10 or so, but perhaps that's what shipped with your distro. Take the leap and go Python 3! pythonclock.orgHorseman
If you can use Python 3.9, there's a new function xml.etree.ElementTree.indent that can also help address this without the need for any lxml dependencies or minidom hacks.Nannie
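Several comments above report that minidom adds more whitespace every time the output is pretty-printed again. One way around that, sketched here under the assumption that whitespace-only text nodes carry no meaning in your document, is to drop those nodes before calling toprettyxml(). The names strip_blank_text_nodes and pretty_minidom are made up for this example.

```python
import xml.dom.minidom


def strip_blank_text_nodes(node):
    """Recursively remove text nodes that contain only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text_nodes(child)


def pretty_minidom(xml_string, indent="  "):
    """Pretty-print xml_string; feeding the result back in gives the same text."""
    dom = xml.dom.minidom.parseString(xml_string)
    strip_blank_text_nodes(dom)
    return dom.toprettyxml(indent=indent)


once = pretty_minidom("<root><a>1</a><b/></root>")
twice = pretty_minidom(once)
print(once)
print(once == twice)
```

If whitespace inside text is significant for you (for example xml:space="preserve" content), this sketch is not safe to use as is.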
R
195

lxml is recent, updated, and includes a pretty print function

import lxml.etree as etree

x = etree.parse("filename")
print(etree.tostring(x, pretty_print=True, encoding="unicode"))

Check out the lxml tutorial: https://lxml.de/tutorial.html

Ran answered 15/4, 2009 at 0:21 Comment(10)
Only downside to lxml is a dependency on external libraries. This I think is not so bad under Windows the libraries are packaged with the module. Under linux they are an aptitude install away. Under OS/X I'm not sure.Steno
On OS X you just need a functioning gcc and easy_install/pip.Morass
The lxml pretty printer isn't reliable and won't pretty print your XML properly in a number of cases explained in the lxml FAQ. I quit using lxml for pretty printing after hitting several corner cases that just don't work (e.g. this won't be fixed: Bug #910018). All of these problems are related to XML values containing spaces that should be preserved.Naylor
This is the industry standard.Shari
lxml is also part of MacPorts, works smoothly for me.Danettedaney
Since in Python 3 you usually want to work with str (=unicode string in Python 2), better use this: print(etree.tostring(x, pretty_print=True, encoding="unicode")). Writing to an output file is possible in just one line, no intermediary variable needed: etree.parse("filename").write("outputfile", encoding="utf-8")Sadi
etree.XMLParser(remove_blank_text=True) sometime can help to do the right printingEpizoic
lxml fails to pretty print my XML, even if I load the original document with remove_blank_text=True.Sweetheart
Like Thor, with Python 3 I had to use etree.tostring(x, pretty_print=True, encoding="unicode"); but I then also had to use pretty_str = pretty_str.replace('\\n', '\n')Falls
If you can use Python 3.9, there's a new function xml.etree.ElementTree.indent that can also help without the need for any lxml dependencies.Nannie
P
123

Another solution is to borrow this indent function, for use with the ElementTree library that's built in to Python since 2.5. Here's what that would look like:

from xml.etree import ElementTree

def indent(elem, level=0):
    # Indentation for this element's own level (and for its closing tag).
    i = "\n" + level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        # The last child's tail must line up with this element's closing tag.
        if not subelem.tail or not subelem.tail.strip():
            subelem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
    return elem

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)
Picador answered 4/1, 2011 at 1:57 Comment(7)
...and then just use lxml tostring!Hollie
Note that you can still do tree.write([filename]) for writing to file (tree being the ElementTree instance).Roast
No you can't since elementtree.getroot() doesn't have that method, only an elementtree object has it. @boukePineapple
Here's how you can write to a file: tree = ElementTree.parse('file') ; root = tree.getroot() ; indent(root); tree.write('Out.xml');Lietman
@AylwynLake can anyone provide a minimal example where this code goes wrong, and the eff one goes right? Then we can start testing and put the best code here.Bank
This keeps bombing out on me, I get TypeError: object of type 'ElementTree' has no len() when I run this. If I use tree.write before passing tree to indent, I get an xml file with everything on one line.Quach
Links to the correct code are dead, but you can still find the answer in this question's edit historyXylograph
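The TypeError mentioned in the comments comes from passing the ElementTree wrapper to indent(); the function expects an Element, so it needs the result of getroot(). Here is a minimal sketch of the parse, indent, write round trip on an in-memory document. The helper indent_tree is a hypothetical, compact variant written for this example; it is not the code from the dead link.

```python
import io
import xml.etree.ElementTree as ET


def indent_tree(elem, level=0, unit="  "):
    """Set text/tail whitespace in place so that elem serializes indented."""
    pad = "\n" + level * unit
    children = list(elem)
    if not children:
        return
    if not (elem.text and elem.text.strip()):
        elem.text = pad + unit              # whitespace before the first child
    for index, child in enumerate(children):
        indent_tree(child, level + 1, unit)
        if not (child.tail and child.tail.strip()):
            is_last = index == len(children) - 1
            # The last child is followed by the parent's closing tag.
            child.tail = pad if is_last else pad + unit


tree = ET.parse(io.StringIO("<root><a><b>1</b></a><c/></root>"))
indent_tree(tree.getroot())     # pass the root Element, not the ElementTree
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue().decode("utf-8")
print(result)
```

Writing to a real file works the same way: replace the BytesIO object with a file name in tree.write().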
F
87

You have a few options.

If you are using Python 3.9+, your simplest option is:

xml.etree.ElementTree.indent()

Batteries included and pretty output.

Sample code:

import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))

BeautifulSoup.prettify()

BeautifulSoup may be the simplest solution for Python < 3.9.

from bs4 import BeautifulSoup

bs = BeautifulSoup(open(xml_file), 'xml')
pretty_xml = bs.prettify()
print(pretty_xml)

Output:

<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
  <id>
   1
  </id>
  <title>
   Add Visual Studio 2005 and 2008 solution files
  </title>
 </issue>
</issues>

This is my go-to answer: the default arguments work as is. The drawback is that text content is placed on separate lines, as if it were a nested element.

lxml.etree.parse()

Prettier output but with arguments.

from lxml import etree

x = etree.parse(FILE_NAME)
pretty_xml = etree.tostring(x, pretty_print=True, encoding=str)

Produces:

  <issues>
    <issue>
      <id>1</id>
      <title>Add Visual Studio 2005 and 2008 solution files</title>
      <details>We need Visual Studio 2005/2008 project files for Windows.</details>
    </issue>
  </issues>

This works for me with no issues.


xml.dom.minidom.parse()

No external dependencies but post-processing.

import os
import xml.dom.minidom as md

dom = md.parse(FILE_NAME)
# To parse string instead use: dom = md.parseString(xml_string)
pretty_xml = dom.toprettyxml()
# remove the weird newline issue:
pretty_xml = os.linesep.join([s for s in pretty_xml.splitlines()
                              if s.strip()])

The output is the same as above, but it's more code.

Feeney answered 14/9, 2016 at 4:54 Comment(4)
Getting this error message: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?Tonyatonye
You need to run python3 -m pip install --user lxmlGermany
Good job man :) for remove the weird newline issue ! tyIntercommunion
first solution in this answer should go above in all answersTyra
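If the same code has to run on interpreters both older and newer than 3.9, the first and last options of this answer can be combined. The sketch below is one possible way to do that with the standard library only; pretty_xml is a hypothetical helper name, and the fallback branch assumes a minidom that keeps single text children on one line (as Python 3 does).

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET


def pretty_xml(xml_string, unit="  "):
    """Return xml_string re-indented, without an XML declaration."""
    root = ET.fromstring(xml_string)
    if hasattr(ET, "indent"):               # available from Python 3.9
        ET.indent(root, space=unit)
        return ET.tostring(root, encoding="unicode")
    # Older Pythons: fall back to minidom and drop the blank lines it emits.
    dom = xml.dom.minidom.parseString(ET.tostring(root))
    lines = dom.documentElement.toprettyxml(indent=unit).splitlines()
    return "\n".join(line for line in lines if line.strip())


print(pretty_xml("<issues><issue><id>1</id></issue></issues>"))
```

The two branches can still differ in small details, such as how empty elements are written, so treat the output as equivalent XML rather than byte-identical text.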
P
49

Here's my (hacky?) solution to get around the ugly text node problem.

import re

# doc is an xml.dom.minidom.Document
uglyXml = doc.toprettyxml(indent='  ')

text_re = re.compile(r'>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)
prettyXml = text_re.sub(r'>\g<1></', uglyXml)

print(prettyXml)

The above code will produce:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

Instead of this:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

Disclaimer: There are probably some limitations.

Participation answered 29/7, 2010 at 22:18 Comment(7)
Thank you! This was my one gripe with all the pretty printing methods. Works well with the few files I tried.Turquoise
I found a pretty 'almost identical' solution, but yours is more direct, using re.compile prior to sub operation (I was using re.findall() twice, zip and a for loop with str.replace()...)Weingartner
This is no longer necessary in Python 2.7: xml.dom.minidom's toprettyxml() now produces output like '<id>1</id>' by default, for nodes that have exactly one text child node.Rochellrochella
I am compelled to use Python 2.6. So, this regex reformatting trick is very useful. Worked as-is with no problems.Squamulose
@Marius Gedminas I am running 2.7.2 and the "default" is definitely not as you say.Cadell
(FYI, doc is a xml.dom.minidom.Document object)Falls
@Nick Bolton : Can we increase the spacing/indentation ?Ariadne
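To see what the regular expression does without depending on a particular minidom version, it can be run against a hard-coded string in the "ugly" layout. This is only a sketch: the sample document and the name collapse_text_nodes are made up, and because of re.DOTALL the pattern can mangle elements that mix text with child elements.

```python
import re

# Matches ">", a line break plus indentation, some text, and the line break
# plus indentation in front of the closing tag.
TEXT_NODE_RE = re.compile(r">\n\s+([^<>\s].*?)\n\s+</", re.DOTALL)


def collapse_text_nodes(pretty_xml):
    """Pull text that sits on its own line back between its tags."""
    return TEXT_NODE_RE.sub(r">\g<1></", pretty_xml)


ugly = (
    "<issue>\n"
    "  <id>\n"
    "    1\n"
    "  </id>\n"
    "  <title>\n"
    "    Add solution files\n"
    "  </title>\n"
    "</issue>\n"
)
print(collapse_text_nodes(ugly))
```

The indentation of the surrounding elements is untouched; it is whatever was passed as indent to toprettyxml().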
T
28

As of Python 3.9, ElementTree has an indent() function for pretty-printing XML trees.

See https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent.

Sample usage:

import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))

The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200

Ticking answered 12/8, 2020 at 9:26 Comment(0)
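indent() also accepts a space argument for the indentation unit, and it modifies the tree in place rather than returning a string. A short sketch with tab indentation, assuming Python 3.9 or later and a made-up sample document:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<config><db><host>localhost</host><port>5432</port></db></config>"
)
ET.indent(root, space="\t")     # in place; only whitespace-only text/tail is changed
tabbed = ET.tostring(root, encoding="unicode")
print(tabbed)
```

Text content such as "localhost" is left alone, because indent() only rewrites text and tail values that are empty or whitespace-only.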
T
23

As others pointed out, lxml has a pretty printer built in.

Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.

Here's a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

Example usage:

prettyPrintXml('some_folder/some_file.xml')
Titfer answered 13/4, 2011 at 12:33 Comment(1)
It's a little late now. But I think lxml fixed CDATA? CDATA is CDATA on my side.Anthozoan
D
13

If you have xmllint you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.

Note that this method uses a program external to Python, which makes it sort of a hack.

import subprocess

def pretty_print_xml(xml):
    # xml must be a bytes object on Python 3, because the pipes are binary.
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    output, error_output = proc.communicate(xml)
    return output

print(pretty_print_xml(data))
Dregs answered 12/4, 2012 at 23:40 Comment(0)
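On Python 3 the same idea can be written with subprocess.run. The sketch below assumes that xmllint is installed and that it reads standard input when given "-" as the file name; xmllint_format is a hypothetical helper name. It checks for the tool first, so a missing xmllint gives a clear error instead of an obscure one.

```python
import shutil
import subprocess


def xmllint_format(xml_bytes):
    """Pretty-print XML bytes with the external xmllint tool."""
    if shutil.which("xmllint") is None:
        raise FileNotFoundError("xmllint is not on PATH")
    completed = subprocess.run(
        ["xmllint", "--format", "-"],   # "-" makes xmllint read standard input
        input=xml_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,                     # raise CalledProcessError if xmllint fails
    )
    return completed.stdout
```

Usage would look like xmllint_format(b"<a><b/></a>").decode("utf-8"); malformed input surfaces as subprocess.CalledProcessError.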
G
12

I tried to edit "ade"s answer above, but Stack Overflow wouldn't let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.

def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
Genitalia answered 17/10, 2012 at 17:32 Comment(0)
S
10

If you're using a DOM implementation, each has their own form of pretty-printing built-in:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want —  you'd probably have to write or subclass your own serialiser.

Serene answered 15/4, 2009 at 0:48 Comment(0)
H
8

I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here's my workaround for it:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')
Heckelphone answered 15/4, 2009 at 13:46 Comment(0)
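To see what the workaround produces, here is a small self-contained run on Python 3 with a β in the text and latin-1 as the target encoding. The sample document is made up; the function body is the one from the answer.

```python
import xml.dom.minidom


def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(
        u'<?xml version="1.0" ?>',
        u'<?xml version="1.0" encoding="%s"?>' % encoding)
    # Characters the encoding cannot represent become numeric references.
    return unistr.encode(encoding, 'xmlcharrefreplace')


doc = xml.dom.minidom.parseString(u"<note><symbol>\u03b2</symbol></note>")
data = toprettyxml(doc, "latin-1")
print(data.decode("latin-1"))
```

The β cannot be represented in latin-1, so it is written as the character reference &#946;, which any XML parser turns back into the original character.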
E
6
from yattag import indent

pretty_string = indent(ugly_string)

It won't add spaces or newlines inside text nodes, unless you ask for it with:

indent(mystring, indent_text = True)

You can specify what the indentation unit should be and what the newline should look like.

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

The documentation is on the http://www.yattag.org homepage.

Embroidery answered 13/5, 2014 at 14:49 Comment(0)
M
6

I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.

def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings
Marinna answered 25/7, 2016 at 17:25 Comment(0)
A
5

Here's a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline issue here.

Adactylous answered 12/2, 2020 at 17:3 Comment(0)
O
4

You can use the popular external library xmltodict. Calling unparse with pretty=True gives nicely indented output:

import xmltodict

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)

full_document=False omits the <?xml version="1.0" encoding="UTF-8"?> declaration at the top.

Ormolu answered 7/9, 2016 at 17:2 Comment(0)
H
3

XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

An alternative is to use pyXML, which has a PrettyPrint function.

Hundredfold answered 15/4, 2009 at 0:7 Comment(2)
HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/xmlpp/ Think that project is in the attic nowadays, shame.Wore
Obsolete answer, dead links. PyXML was abandoned years ago.Hora
T
3

I found this question while looking for "how to pretty print html"

Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML:

from xml.dom.minidom import parseString as string_to_dom

def prettify(string, html=True):
    dom = string_to_dom(string)
    ugly = dom.toprettyxml(indent="  ")
    split = list(filter(lambda x: len(x.strip()), ugly.split('\n')))
    if html:
        split = split[1:]
    pretty = '\n'.join(split)
    return pretty

def pretty_print(html):
    print(prettify(html))

When used this is what it looks like:

html = """\
<div class="foo" id="bar"><p>'IDK!'</p><br/><div class='baz'><div>
<span>Hi</span></div></div><p id='blarg'>Try for 2</p>
<div class='baz'>Oh No!</div></div>
"""

pretty_print(html)

Which returns:

<div class="foo" id="bar">
  <p>'IDK!'</p>
  <br/>
  <div class="baz">
    <div>
      <span>Hi</span>
    </div>
  </div>
  <p id="blarg">Try for 2</p>
  <div class="baz">Oh No!</div>
</div>
Tijerina answered 1/10, 2020 at 20:1 Comment(1)
Works in Python 3.8, which doesn't support indent used in other answers.Honeycutt
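One caveat with this approach: minidom is an XML parser, so it only accepts HTML that is also well-formed XML. The sketch below (prettify_or_none is a hypothetical name) shows the same idea with that failure made explicit: an unclosed <br> makes the parser raise ExpatError.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def prettify_or_none(markup, indent="  "):
    """Pretty-print well-formed markup; return None if it is not XML."""
    try:
        dom = xml.dom.minidom.parseString(markup)
    except ExpatError:
        return None
    # Start from the root element so that no XML declaration is emitted.
    lines = dom.documentElement.toprettyxml(indent=indent).split("\n")
    return "\n".join(line for line in lines if line.strip())


print(prettify_or_none("<div><p>Hi</p><br/></div>"))
print(prettify_or_none("<div><p>Hi<br></div>"))   # not well-formed
```

For real-world HTML with unclosed tags or entities such as &nbsp;, an HTML-aware parser like the BeautifulSoup options mentioned in other answers is likely the safer choice.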
N
2

Take a look at the vkbeautify module.

It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn't have any dependency.

Examples:

import vkbeautify as vkb

vkb.xml(text)                       
vkb.xml(text, 'path/to/dest/file')  
vkb.xml('path/to/src/file')        
vkb.xml('path/to/src/file', 'path/to/dest/file') 
Neumann answered 4/1, 2017 at 1:50 Comment(1)
This particular library handles the Ugly Text Node problem.Duckworth
S
2

You can try this variation...

Install BeautifulSoup and the backend lxml (parser) libraries:

user$ pip3 install lxml bs4

Process your XML document:

from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc:
    # Parse the whole file at once; parsing line by line would treat every
    # line as a separate document.
    print(BeautifulSoup(doc, 'lxml-xml').prettify())
Scalise answered 29/9, 2019 at 20:49 Comment(4)
'lxml' uses lxml's HTML parser - see the BS4 docs. You need 'xml' or 'lxml-xml' for the XML parser.Jacintha
This comment keeps getting deleted. Again, I've enter a formal complaint (in addition to) 4-flags) of post tampering with StackOverflow, and will not stop until this is forensically investigated by a security team (access logs and version histories). The above timestamp is wrong (by years) and likely the content, too.Scalise
This worked fine for me, unsure of the down vote from the docs lxml’s XML parser BeautifulSoup(markup, "lxml-xml") BeautifulSoup(markup, "xml")Doscher
@Datanovice I'm glad it helped you. :) As for the suspect downvote, someone tampered with my original answer (which correctly originally specified lxml-xml), and then they proceeded to downvote it that same day. I submitted an official complaint to S/O but they refused to investigate. Anyway, I have since "de-tampered" my answer, which is now correct again (and specifies lxml-xml as it originally did). Thank you.Scalise
D
1

If you don't want to reparse, an alternative is the xmlpp.py library with its get_pprint() function. It worked nicely and smoothly for my use cases, without having to reparse into an lxml ElementTree object.

Duplication answered 25/7, 2017 at 14:38 Comment(3)
Tried minidom and lxml and didn't get a properly formatted and indented xml. This worked as expectedHowzell
Fails for tag names that are prefixed by a namespace and contain a hyphen (e.g. <ns:hyphenated-tag/>; the part starting with the hyphen is simply dropped, giving e.g. <ns:hyphenated/>.Airlia
@EndreBoth Nice catch, I did not test, but maybe it would be easy to fix this in the xmlpp.py code?Duplication
J
1

I found a fast and easy way to nicely format and print an xml file:

import xml.etree.ElementTree as ET

xmlTree = ET.parse('your XML file')
xmlRoot = xmlTree.getroot()
ET.indent(xmlRoot)  # Python 3.9+; without this call, tostring() keeps the input's whitespace as-is
xmlDoc = ET.tostring(xmlRoot, encoding="unicode")

print(xmlDoc)

Output:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
  <child>
    <subchild>.....</subchild>
  </child>
  ...
  ...
  ...
  <child>
    <subchild>.....</subchild>
  </child>
</root>
Jerid answered 17/5, 2021 at 13:20 Comment(0)
F
0

I had this problem and solved it like this:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

In my code this method is called like this:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

This works only because etree indents with two spaces by default, which I find too shallow to make the nesting stand out, and therefore not pretty. I couldn't find any setting for etree, or any function parameter, to change the default indent. I like how easy etree is to use, but this really annoyed me.

Feed answered 27/7, 2015 at 23:6 Comment(0)
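One thing to watch with the replace('  ', indent) call above: it also rewrites runs of two spaces inside text content. A sketch that only touches the leading whitespace of each line, using the standard re module, could look like this. The name reindent is made up, and the approach assumes the input was indented with a fixed unit (two spaces here).

```python
import re


def reindent(pretty_xml, old_unit="  ", new_unit="\t"):
    """Swap the indentation unit at the start of each line only."""
    pattern = re.compile(r"^(?:%s)+" % re.escape(old_unit), re.MULTILINE)

    def swap(match):
        depth = len(match.group(0)) // len(old_unit)
        return new_unit * depth

    return pattern.sub(swap, pretty_xml)


sample = "<a>\n  <b>x  y</b>\n    <c/>\n</a>"
print(reindent(sample))
```

Text nodes that span several lines would still have their continuation lines re-indented, so this is a sketch for the common case rather than a general solution.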
W
0

For converting an entire xml document to a pretty xml document
(ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftooling of .odt/.ods files, such as I'm implementing here)

import xml.dom.minidom

with open("./content.xml", 'r') as file:
    xml_string = file.read()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

with open("./content_new.xml", 'w') as file:
    file.write(pretty_xml_as_string)

Wirra answered 1/9, 2018 at 6:48 Comment(0)
C
0
import re
import xml.dom.minidom as mmd

from lxml import etree

xml_root = etree.parse(xml_file_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    # Drop only the whitespace between tags; ''.join(plain_xml.split()) would
    # also delete the spaces inside tags and text, e.g. <a b="c"> -> <ab="c">.
    ugly_xml = re.sub(r'>\s+<', '><', plain_xml)
    good_xml = mmd.parseString(ugly_xml)
    print(good_xml.toprettyxml(indent='    '))

It works well for XML that contains Chinese text!

Cheloid answered 12/8, 2019 at 9:24 Comment(0)
N
0

If for some reason you can't get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:

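import subprocess

def makePretty(filepath):
  # Pass the arguments as a list so the path is never interpreted by a shell.
  prettyXML = subprocess.check_output(["xmllint", "--format", filepath])
  # check_output returns bytes on Python 3, so write in binary mode.
  with open(filepath, "wb") as outfile:
    outfile.write(prettyXML)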

As far as I know, this solution will work on Unix-based systems that have the xmllint package installed.

Newel answered 14/5, 2020 at 4:16 Comment(2)
xmllint has already been suggested in another answer: https://mcmap.net/q/25555/-pretty-printing-xml-in-pythonHora
@Hora I saw the answer, but I simplified mine down to check_output because you don't need to do error checkingNewel
A
0

Use etree.indent and etree.tostring

import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)

output

<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>

Removing namespaces and prefixes

import lxml.etree as etree


def dump_xml(element):
    for item in element.iter():  # getiterator() is deprecated in favour of iter()
        item.tag = etree.QName(item).localname

    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result


root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)

output

<document>
  <name>hello world</name>
</document>
Austenite answered 8/10, 2020 at 21:1 Comment(0)
S
-2

I solved this with a few lines of code: opening the file, going through it and adding indentation, then saving it again. I was working with small XML files and did not want to add dependencies or more libraries for the user to install. Anyway, here is what I ended up with:

f = open(file_name,'r')
xml = f.read()
f.close()

# Remove old indentation and line breaks
raw_xml = ''
for line in xml.splitlines():
    raw_xml += line.strip()

xml = raw_xml

new_xml = ''
indent = '    '
deepness = 0

for i in range((len(xml))):

    new_xml += xml[i]   
    if(i<len(xml)-3):

        simpleSplit = xml[i:(i+2)] == '><'
        advancSplit = xml[i:(i+3)] == '></'        
        end = xml[i:(i+2)] == '/>'    
        start = xml[i] == '<'

        if(advancSplit):
            deepness += -1
            new_xml += '\n' + indent*deepness
            simpleSplit = False
            deepness += -1
        if(simpleSplit):
            new_xml += '\n' + indent*deepness
        if(start):
            deepness += 1
        if(end):
            deepness += -1

f = open(file_name,'w')
f.write(new_xml)
f.close()

It works for me, perhaps someone will have some use of it :)

Sotted answered 12/7, 2013 at 11:1 Comment(0)
