Pretty print in lxml is failing when I add tags to a parsed tree
I have an XML file that I'm working with using lxml's etree, but when I add tags to it, pretty printing doesn't seem to work.

>>> from lxml import etree
>>> root = etree.parse('file.xml').getroot()
>>> print(etree.tostring(root, pretty_print=True, encoding='unicode'))

<root>
  <x>
    <y>test1</y>
  </x>
</root>

So far, so good. But now:

>>> x = root.find('x')
>>> z = etree.SubElement(x, 'z')
>>> etree.SubElement(z, 'z1').attrib['value'] = 'val1'
>>> print(etree.tostring(root, pretty_print=True, encoding='unicode'))

<root>
  <x>
    <y>test1</y>
  <z><z1 value="val1"/></z></x>
</root>

It's no longer pretty. I've also tried doing it "backwards": I create the z1 tag, then create the z tag and append z1 to it, then append the z tag to the x tag. But I get the same result.

If I don't parse the file and instead create all the tags in code in one go, it prints correctly, so I think it has something to do with parsing the file.
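To compare the two cases without needing file.xml on disk, here is a minimal sketch. The inline SOURCE string (assumed to match the file above) and the add_z helper are hypothetical stand-ins for the file and the steps shown in the question. The parsed tree should reproduce the unindented <z>, while the tree built entirely in code should come out fully indented.

```python
from lxml import etree

# Hypothetical stand-in for file.xml (assumed to have the same content).
SOURCE = b"<root>\n  <x>\n    <y>test1</y>\n  </x>\n</root>"


def add_z(root):
    """Hypothetical helper: append <z><z1 value="val1"/></z> under <x>."""
    x = root.find('x')
    z = etree.SubElement(x, 'z')
    etree.SubElement(z, 'z1').attrib['value'] = 'val1'


# Case 1: parse the indented source (default parser), then add elements.
parsed = etree.fromstring(SOURCE)
add_z(parsed)
parsed_out = etree.tostring(parsed, pretty_print=True, encoding='unicode')

# Case 2: build the whole tree in code, so there is no whitespace text at all.
built = etree.Element('root')
etree.SubElement(etree.SubElement(built, 'x'), 'y').text = 'test1'
add_z(built)
built_out = etree.tostring(built, pretty_print=True, encoding='unicode')

print(parsed_out)  # <z> is glued onto </x>, as in the question
print(built_out)   # every element on its own, indented line
```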

How can I get pretty printing to work?

Atomy answered 26/10, 2011 at 14:2
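It has to do with how lxml treats whitespace; see the lxml FAQ for details. When the file is parsed, the indentation between tags is kept as whitespace-only text, and the serializer won't add its own indentation inside an element that already has text children, because it can't tell whether that whitespace is meaningful. New elements appended inside such an element therefore come out unindented.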
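A quick way to see that leftover whitespace is to inspect `.text` and `.tail` after parsing. This is a minimal sketch assuming an inline string in place of file.xml; `remove_blank_text=True` is the same `XMLParser` option used in the fix below.

```python
from lxml import etree

# Assumed to match the question's file.xml.
SOURCE = b"<root>\n  <x>\n    <y>test1</y>\n  </x>\n</root>"

default_root = etree.fromstring(SOURCE)
clean_root = etree.fromstring(SOURCE, etree.XMLParser(remove_blank_text=True))

# With the default parser, the indentation survives as whitespace-only strings.
print(repr(default_root.find('x').text))    # '\n    '
print(repr(default_root.find('x/y').tail))  # '\n  '

# With remove_blank_text=True, those blank text nodes are dropped.
print(repr(clean_root.find('x').text))      # None
print(repr(clean_root.find('x/y').tail))    # None
```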

To fix this, change the part of your code that loads the file to the following:

from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
root = etree.parse('file.xml', parser).getroot()

I didn't test it, but it should indent your file just fine with this change.
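As a check of the fix end to end, here is a minimal sketch that reads from an in-memory `io.BytesIO` (an assumption standing in for file.xml; `etree.parse()` also accepts file-like objects) and then adds the same elements as the question.

```python
import io

from lxml import etree

# In-memory stand-in for file.xml (assumed to match the question's file).
source = io.BytesIO(b"<root>\n  <x>\n    <y>test1</y>\n  </x>\n</root>")

# Drop the whitespace-only text while parsing, as in the answer.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.parse(source, parser).getroot()

# Same additions as in the question.
x = root.find('x')
z = etree.SubElement(x, 'z')
etree.SubElement(z, 'z1').attrib['value'] = 'val1'

output = etree.tostring(root, pretty_print=True, encoding='unicode')
print(output)
```

With the blank text gone, the serializer is free to indent, so `<z>` and `<z1>` should line up with `<y>` the same way a tree built from scratch would.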

Lampyrid answered 26/10, 2011 at 14:22 Comment(4)
Ah nice, this seems to be working. I had seen that FAQ question, but I guess I read it wrong. I thought it was saying that it should only matter if you have text data with whitespace in it, which I didn't. But I guess it matters if you have any elements with text data at all. Thanks for the quick fix. - Atomy
Cool! Helped in my case! - Noose
If you can use Python 3.9 or later, there's a new function, xml.etree.ElementTree.indent, that can also help address this without any lxml dependency. - Adiana
This solution does NOT work if you add new elements in the element tree. - Assorted
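For the standard-library route mentioned in the comments, a minimal sketch might look like this, assuming Python 3.9+ and an inline string standing in for file.xml. `ET.indent()` overwrites whitespace-only `.text` and `.tail` values in place, so the indentation left over from parsing is simply replaced.

```python
import xml.etree.ElementTree as ET  # standard library; indent() needs Python 3.9+

# Assumed to match the question's file.xml.
SOURCE = "<root>\n  <x>\n    <y>test1</y>\n  </x>\n</root>"

root = ET.fromstring(SOURCE)
x = root.find('x')
z = ET.SubElement(x, 'z')
ET.SubElement(z, 'z1').set('value', 'val1')

# Re-indent the whole tree, replacing any whitespace-only text/tail.
ET.indent(root, space='  ')
output = ET.tostring(root, encoding='unicode')
print(output)
```

Note that ElementTree serializes empty elements as `<z1 value="val1" />`, with a space before the slash, unlike lxml.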
I was having the same issue when writing to files. For anyone else with this issue:

I created a helper function that pretty-prints the file after my main function runs.

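from lxml import etree

def ppxml(xml):
    # Re-parse the file, dropping the whitespace-only text left by earlier writes
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(xml, parser)
    # Overwrite the same file, pretty-printed and with an XML declaration
    tree.write(xml, encoding='utf-8', pretty_print=True, xml_declaration=True)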

In my main program file:

if __name__ == '__main__':
    main()
    ppxml(xml)  # xml: path of the file that main() wrote
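To try the helper without a real project, here is a minimal sketch that runs it against a temporary file. The `main()` below is hypothetical: it reproduces the question's problem by parsing indented XML, appending elements, and writing the result; `ppxml` is the helper above, repeated so the block is self-contained.

```python
import os
import tempfile

from lxml import etree


def ppxml(xml):
    # The helper from this answer, unchanged apart from comments.
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(xml, parser)
    tree.write(xml, encoding='utf-8', pretty_print=True, xml_declaration=True)


def main(path):
    # Hypothetical main(): parse indented XML, add elements, write it out.
    root = etree.fromstring(b"<root>\n  <x>\n    <y>test1</y>\n  </x>\n</root>")
    z = etree.SubElement(root.find('x'), 'z')
    etree.SubElement(z, 'z1').set('value', 'val1')
    root.getroottree().write(path, pretty_print=True)  # still comes out messy


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'out.xml')  # hypothetical output file
    main(path)
    with open(path, encoding='utf-8') as f:
        before = f.read()
    ppxml(path)
    with open(path, encoding='utf-8') as f:
        after = f.read()

print(before)
print(after)
```

Because `ppxml` re-reads the file with `remove_blank_text=True`, it works as a post-processing step no matter how messy the whitespace in the written file is.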
Antipodes answered 1/10, 2018 at 9:11
