I'm using Python and BeautifulSoup to parse an XML document and access its elements. I modify the values of a couple of elements and then write the XML back to the file. The trouble is that the updated file has a newline at the start and end of each element's text value, so it ends up looking like this:
<annotation>
<folder>
Definitiva
</folder>
<filename>
armas_229.jpg
</filename>
<path>
/tmp/tmpygedczp5/handgun/images/armas_229.jpg
</path>
<size>
<width>
1800
</width>
<height>
1426
</height>
<depth>
3
</depth>
</size>
<segmented>
0
</segmented>
<object>
<name>
handgun
</name>
<pose>
Unspecified
</pose>
<truncated>
0
</truncated>
<difficult>
0
</difficult>
<bndbox>
<xmin>
1001
</xmin>
<ymin>
549
</ymin>
<xmax>
1453
</xmax>
<ymax>
1147
</ymax>
</bndbox>
</object>
</annotation>
Instead, I'd like the output file to look like this:
<annotation>
<folder>Definitiva</folder>
<filename>armas_229.jpg</filename>
<path>/tmp/tmpygedczp5/handgun/images/armas_229.jpg</path>
<size>
<width>1800</width>
<height>1426</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>handgun</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>1001</xmin>
<ymin>549</ymin>
<xmax>1453</xmax>
<ymax>1147</ymax>
</bndbox>
</object>
</annotation>
I open the file and get the "soup" like so:
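from bs4 import BeautifulSoup

# pascal_xml_file_path is the path to the annotation file shown above
with open(pascal_xml_file_path) as pascal_file:
    pascal_contents = pascal_file.read()
soup = BeautifulSoup(pascal_contents, "xml")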
After I've finished modifying a couple of the document's values, I write the document back to the file using BeautifulSoup.prettify, like so:
with open(pascal_xml_file_path, "w") as pascal_file:
    pascal_file.write(soup.prettify())
My assumption is that BeautifulSoup.prettify adds these superfluous newlines by default, and there doesn't appear to be a good way to change that behavior. Have I missed something in the BeautifulSoup documentation, or is this truly not configurable, meaning I need another approach for writing the XML to the file? Maybe I'm just better off rewriting this using xml.etree.ElementTree instead?
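To make that last option concrete, this is roughly what I imagine the ElementTree version would look like. It is only a sketch under a few assumptions: Python 3.9 or later (for ET.indent), a shortened SAMPLE string standing in for my real file, and a helper name, strip_text_whitespace, that I made up for illustration. Because my file already contains the extra newlines, the sketch strips them before re-indenting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the annotation file that
# prettify() already wrote with newlines around every text value.
SAMPLE = """<annotation>
<folder>
Definitiva
</folder>
<size>
<width>
1800
</width>
</size>
</annotation>"""


def strip_text_whitespace(root):
    """Remove leading/trailing whitespace from every element's text and tail."""
    for elem in root.iter():
        if elem.text is not None:
            elem.text = elem.text.strip()
        if elem.tail is not None:
            elem.tail = elem.tail.strip()


root = ET.fromstring(SAMPLE)
strip_text_whitespace(root)

# Example modification of one value, as in my real script.
root.find("size/width").text = "1920"

# ET.indent (Python 3.9+) only inserts whitespace between elements;
# it leaves non-empty leaf text such as "Definitiva" untouched.
ET.indent(root, space="  ")
result = ET.tostring(root, encoding="unicode")
print(result)
```

If this is the right direction, I assume writing back to disk would be something like ET.ElementTree(root).write(pascal_xml_file_path, encoding="utf-8", xml_declaration=True), but I haven't verified that it matches the declaration prettify produced.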