Changing the default indentation of etree.tostring in lxml

4

16

I have an XML document which I'm pretty-printing using lxml.etree.tostring:

print(etree.tostring(doc, pretty_print=True, encoding="unicode"))

The default level of indentation is 2 spaces, and I'd like to change this to 4 spaces. There isn't any argument for this in the tostring function; is there a way to do this easily with lxml?

Arcuation answered 6/8, 2009 at 13:47 Comment(0)
S
5

As noted in this thread, lxml.etree.tostring has no option for changing the pretty-print indent.

But you can:

  • add an XSLT transform to change the indent
  • add the whitespace to the tree yourself, with something like the indent recipe from the cElementTree library

Code:

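def indent(elem, level=0, space="    "):
    # Whitespace that precedes a tag at this nesting depth.
    i = "\n" + level * space
    if len(elem):
        # Push the first child one level deeper.
        if not elem.text or not elem.text.strip():
            elem.text = i + space
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1, space)
        # The last child's tail sits just before this element's closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i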
Spoonbill answered 6/8, 2009 at 14:20 Comment(3)
The link to the lxml-dev list archives from Feb 2009 is broken; it's at: mailman-mail5.webfaction.com/pipermail/lxml/20090208/… . Anyway, kludging whitespace into the actual tree elements seems nasty. Haven't people asked for this as an enhancement? - Torchwood
@smci: There is a built-in indent() function since lxml 4.5 (released 2020-01-29). See the answer from @kuch. - Tortilla
The mailing list has moved again; it is currently located here: mail.python.org/archives/list/[email protected]/thread/… - Boser
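For anyone who wants to try the recipe from this answer without lxml installed: it only touches .text and .tail, so it also runs against the standard library's ElementTree. Below is a minimal sketch under that assumption. The `space` parameter and the sample document are illustrative choices, not part of the cElementTree recipe.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in; lxml elements expose the same .text/.tail


def indent(elem, level=0, space="    "):
    """Insert newline/indent whitespace into elem's subtree, in place."""
    i = "\n" + level * space
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + space
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1, space)
        # The last child's tail sits just before elem's closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = i
    elif level and (not elem.tail or not elem.tail.strip()):
        elem.tail = i


# Hypothetical sample document, used only to exercise the function.
root = ET.fromstring(
    "<root><item><name>a</name></item><item><name>b</name></item></root>"
)
indent(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Note that the function rewrites whitespace-only text in the tree itself, so call it right before serialising rather than on a tree you still need to process.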
B
14

Since lxml 4.5, you can set the indent size with the etree.indent() function:

from lxml import etree

etree.indent(root, space="    ")  # modifies the tree in place
print(etree.tostring(root, encoding="unicode"))
Balcom answered 1/9, 2020 at 10:27 Comment(1)
Note that this sometimes behaves differently than the pretty_print option. Try indenting <a><b>Some <c>mixed</c> <c>content</c>.</b></a> to see the difference. - Boser
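A quick way to see the difference mentioned in the comment is to serialise the same mixed-content snippet both ways and compare. This is a minimal sketch, assuming lxml 4.5 or later is installed; the variable names are only for illustration.

```python
from lxml import etree

xml = "<a><b>Some <c>mixed</c> <c>content</c>.</b></a>"

# Serialise with the built-in pretty printer (fixed two-space indent).
pretty = etree.tostring(etree.fromstring(xml), pretty_print=True, encoding="unicode")

# Parse a fresh copy, because etree.indent() modifies the tree in place.
root = etree.fromstring(xml)
etree.indent(root, space="    ")
indented = etree.tostring(root, encoding="unicode")

print(pretty)
print(indented)
```

If the whitespace inside mixed content matters to you, compare the two outputs on your own data before switching from pretty_print to indent().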
B
2

This is easy to do with XMLParser and etree.indent(); pretty_print is not needed:

from lxml import etree

# Discard whitespace-only text nodes between tags while parsing.
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('myfile.xml', parser)
etree.indent(tree, space="    ")  # requires lxml 4.5+
tree.write('myfile.xml', encoding='UTF-8')
Buckra answered 21/2, 2023 at 11:23 Comment(0)
K
0

See this solution. Changing the space value gives you any indent you want: any number of spaces, or one or more tab ("\t") characters.

Kilbride answered 16/7, 2022 at 20:37 Comment(0)
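To illustrate the point about the space value, here is a minimal sketch that indents with tabs. It assumes lxml 4.5 or later. The fallback import is my own addition so the snippet also runs without lxml, since the standard library's ElementTree.indent() (Python 3.9+) accepts the same space argument. The sample document is made up.

```python
try:
    from lxml import etree  # etree.indent() needs lxml 4.5+
except ImportError:
    # Assumption: fall back to the stdlib, whose indent() needs Python 3.9+.
    import xml.etree.ElementTree as etree

# Hypothetical sample document.
root = etree.fromstring("<root><item><name>a</name></item></root>")

# Any string works as the indent unit; here one tab per nesting level.
etree.indent(root, space="\t")
tabbed = etree.tostring(root, encoding="unicode")
print(tabbed)
```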
