Formatting the output as XML with lxml
A

2

9

My program reads an input file and builds an lxml.etree from it. Then, for example, I add a node to the tree and want to write the result back to a file. To write it back I use:

et.write('Documents\Write.xml', pretty_print=True)

And the output I have is:

<Variable Name="one" RefID="two"><Component Type="three"><Value>four</Value></Component></Variable>

While I'd like something like:

<Variable Name="one" RefID="two">
    <Component Type="three">
        <Value>four</Value>
    </Component> 
</Variable>

Where am I mistaken? I've tried many solutions but none seems to work (beautifulsoup, tidy, parser...)

Aeroplane answered 18/7, 2013 at 7:52 Comment(7)
Could it be Windows-related? Try opening your output file with the io module: fp = io.open('Documents\Write.xml', 'w', newline='\r\n'), and then write to fp like this: et.write(fp, pretty_print=True) (see docs.python.org/2/library/io.html#io.open) – Leland
Hi Paul, I'm trying what you say, but what is fp? The file I want to write? Sorry, I'm a beginner! – Aeroplane
Just a file pointer representing the file you want to write, yes. et.write() can take either a filename or an open file pointer, such as the one returned by io.open (lxml.de/api/lxml.etree._ElementTree-class.html#write). You can try import io and then et.write(io.open('Documents\Write.xml', 'w', newline='\r\n'), pretty_print=True) – Leland
OK, I have done that and I get this error: TypeError: must be unicode, not str. What should I do? – Aeroplane
What's the stack trace? Which lines come before the TypeError message? – Leland
et.write(fp, pretty_print=True) File "lxml.etree.pyx", line 1916, in lxml.etree._ElementTree.write (src\lxml\lxml.etree.c:51745) File "serializer.pxi", line 482, in lxml.etree._tofilelike (src\lxml\lxml.etree.c:104825) File "lxml.etree.pyx", line 294, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:9383) File "serializer.pxi", line 398, in lxml.etree._FilelikeWriter.write (src\lxml\lxml.etree.c:103857) TypeError: must be unicode, not str – Aeroplane
And if you try it the other way round? f = io.open('Documents\Write.xml', 'w', newline='\r\n') and then f.write(lxml.etree.tostring(et, pretty_print=True))? – Leland
S
1

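Don't use the default parser. Parse the file with a custom parser that has remove_blank_text=True. pretty_print only adds indentation where the tree does not already contain whitespace-only text between elements, so that whitespace has to be dropped when the file is parsed.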

from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(self.output_file, parser=parser)
# Do stuff with the tree here
tree.write(your_output_file, pretty_print=True)
Sizemore answered 18/8, 2014 at 0:48 Comment(1)
I had the same problem and this worked for me; same answer as here: #7904259 – Abbott
G
0

That's strange, because it is exactly the way it should work. Could you try this:

from lxml import etree
root = etree.XML(your_xml_string)  # your_xml_string: placeholder for your XML text
print(etree.tostring(root, pretty_print=True, encoding="unicode"))

<Variable Name="one" RefID="two">
  <Component Type="three">
    <Value>four</Value>
  </Component>
</Variable>

This should generate a formatted string, which you can process yourself.
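
If you go this route, writing the formatted string to a file could look roughly like the sketch below. The temporary output path is only an assumption for the example; the thing to watch is that tostring() returns bytes unless you ask for encoding="unicode", so the file mode has to match what you write.

```python
import os
import tempfile
from lxml import etree

root = etree.XML(
    '<Variable Name="one" RefID="two">'
    '<Component Type="three"><Value>four</Value></Component>'
    '</Variable>'
)

# Hypothetical output location, used only for this example.
out_path = os.path.join(tempfile.mkdtemp(), "Write.xml")

# Option 1: tostring() returns bytes here, so open the file in binary mode.
pretty_bytes = etree.tostring(
    root, pretty_print=True, xml_declaration=True, encoding="UTF-8"
)
with open(out_path, "wb") as fh:
    fh.write(pretty_bytes)

# Option 2: request text with encoding="unicode" and use a text-mode file.
pretty_text = etree.tostring(root, pretty_print=True, encoding="unicode")
with open(out_path, "w", encoding="utf-8") as fh:
    fh.write(pretty_text)
```

Mixing the two (bytes into a text-mode file, or text into a binary one) raises a TypeError.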

Gillard answered 18/7, 2013 at 8:53 Comment(2)
Thanks for the answer, but that is what I already do. It works that way, but not when I write to a file, and I don't know why! Thanks anyway. – Aeroplane
I am also doing exactly that but running into the same issue as @Aeroplane – Claussen
