Of course, pretty-printing an lxml.etree tree is possible.
In my case, the old trick of combining remove_blank_text=True with pretty_print=True did not work as I expected (it was too delicate), so I decided to write it myself.
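For context, the "old trick" mentioned above usually looks something like the sketch below. The sample document and the variable names are made up for illustration; only XMLParser(remove_blank_text=True) and tostring(pretty_print=True) come from lxml itself.

```python
import lxml.etree

# Hypothetical input with inconsistent whitespace between the tags.
messy_xml = "<xml>\n<body>   <leaf1/>\n\n<leaf2/></body>\n</xml>"

# Drop whitespace-only text nodes while parsing...
parser = lxml.etree.XMLParser(remove_blank_text=True)
trick_root = lxml.etree.fromstring(messy_xml, parser)

# ...so that the serializer is free to insert its own indentation.
trick_result = lxml.etree.tostring(trick_root, pretty_print=True, encoding="unicode")
print(trick_result)
```

This works when the whitespace can be discarded at parse time. It does not help much when the tree was already built or modified elsewhere and still carries its own whitespace in text and tail.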
Here it is: a modern, forceful, native Pythonic way to correct the indentation of an lxml.etree.Element tree.
This gives a nicely prettified XML string:
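from typing import Optional

import lxml.etree


def indent_lxml(element: lxml.etree.Element, level: int = 0, is_last_child: bool = True) -> None:
    """Re-indent `element` and all of its descendants in place."""
    space = " "  # indentation unit per nesting level; widen it if you prefer
    indent_str = "\n" + level * space

    # Normalize the element's own text and put it on its own indented line.
    element.text = strip_or_null(element.text)
    if element.text:
        element.text = f"{indent_str}{space}{element.text}"

    num_children = len(element)
    if num_children:
        # Whitespace that positions the first child one level deeper.
        element.text = f"{element.text or ''}{indent_str}{space}"

        for index, child in enumerate(element.iterchildren()):
            is_last = index == num_children - 1
            indent_lxml(child, level + 1, is_last)

    elif element.text:
        # Text-only element: put the closing tag on its own line.
        element.text += indent_str

    # The tail positions whatever follows this element: the parent's closing
    # tag (one level up) for the last child, otherwise the next sibling.
    tail_level = max(0, level - 1) if is_last_child else level
    tail_indent = "\n" + tail_level * space
    tail = strip_or_null(element.tail)
    element.tail = f"{indent_str}{tail}{tail_indent}" if tail else tail_indent


def strip_or_null(text: Optional[str]) -> Optional[str]:
    """Return the stripped text, or None if it is missing or whitespace-only."""
    if text is not None:
        return text.strip() or None
    return None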
It is reasonably fast: it allocates no additional data structures in memory and visits each node of the tree exactly once, which gives O(N) computational complexity, the best possible.
It rewrites the existing indentation "in place" in the tree (the DOM) by adjusting the contents of the Element.text and Element.tail attributes (only whitespace is affected).
Naturally, it can also be used with HTML parsed by lxml.
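To see why adjusting text and tail is enough, here is a small sketch of where lxml keeps the whitespace between tags. The sample document and variable names are only illustrative.

```python
import lxml.etree

demo_root = lxml.etree.fromstring("<xml>\n  <body/>\n</xml>")
demo_body = demo_root[0]

# Whitespace before the first child lives in the parent's .text,
# whitespace after a child lives in that child's .tail.
print(repr(demo_root.text))  # '\n  '
print(repr(demo_body.tail))  # '\n'

# Changing those two strings changes the indentation of the output.
demo_root.text = "\n\t"
demo_serialized = lxml.etree.tostring(demo_root, encoding="unicode")
print(repr(demo_serialized))  # '<xml>\n\t<body/>\n</xml>'
```

So re-indenting a tree never needs to touch tags or attributes, only these two string attributes on each element.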
To use it, do something like this:
root = lxml.etree.parse("path/to/the_file.xml").getroot()
# or
root = lxml.etree.fromstring("<xml><body><leaf1/><leaf2/></body></xml>")
indent_lxml(root) # corrects indentation "in place"
result = lxml.etree.tostring(root, encoding="unicode")
print(result)
Which prints:
<xml>
 <body>
  <leaf1/>
  <leaf2/>
 </body>
</xml>
Note that lxml has had a built-in indent() function since lxml 4.5.0. https://mcmap.net/q/425812/-changing-the-default-indentation-of-etree-tostring-in-lxml – Rodomontade
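For reference, a minimal sketch of the built-in function mentioned in the comment, assuming lxml 4.5.0 or newer is installed. The four-space value and the variable names are just examples.

```python
import lxml.etree

builtin_root = lxml.etree.fromstring("<xml><body><leaf1/><leaf2/></body></xml>")

# Re-indents the tree in place; `space` is the string used per nesting level.
lxml.etree.indent(builtin_root, space="    ")

builtin_result = lxml.etree.tostring(builtin_root, encoding="unicode")
print(builtin_result)
```

Like indent_lxml above, it modifies the tree in place, so it is called before serializing.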