What is a quick way of writing an XML file iteratively (i.e. without having the whole document in memory)? xml.sax.saxutils.XMLGenerator
works but is slow, around 1 MB/s on an i7 machine. Here is a test case.
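The test case referred to above is not reproduced here. For readers unfamiliar with the approach being measured, the following is only a minimal sketch of iterative writing with XMLGenerator, not the original benchmark. The function name write_records_sax and the record layout are assumptions made for illustration.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_records_sax(out, count):
    """Emit <root> with `count` <record> children, one SAX event at a time.

    `out` is any text file-like object; nothing but the current event is
    held in memory. (Hypothetical helper, not part of the original post.)
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("root", {})
    for i in range(count):
        gen.startElement("record", {"id": str(i)})
        gen.characters("record text data")
        gen.endElement("record")
    gen.endElement("root")
    gen.endDocument()


# Small in-memory run; a real use would pass an open file instead.
buf = io.StringIO()
write_records_sax(buf, 3)
sax_output = buf.getvalue()
```

Every element costs several Python-level method calls here, which is one plausible reason for the low throughput described in the question.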
I realize that this question was asked a while ago, but in the meantime an lxml
API has been introduced that looks promising for this problem: http://lxml.de/api.html ; specifically, see the section "Incremental XML generation".
I quickly tested it by streaming a 10M file just as in your benchmark, and it took a fraction of a second on my old laptop. That is by no means very scientific, but it is in the same ballpark as your generate_large_xml()
function.
As Yury V. Zaytsev mentioned, lxml
really does provide an API for generating XML documents in a streaming manner.
Here is a working example:
from lxml import etree

fname = "streamed.xml"
# xmlfile writes encoded bytes, so open the file in binary mode
with open(fname, "wb") as f, etree.xmlfile(f) as xf:
    attribs = {"tag": "bagggg", "text": "att text", "published": "now"}
    with xf.element("root", attribs):
        xf.write("root text\n")
        for i in range(10):  # xrange on Python 2
            # each record is built, written and then discarded
            rec = etree.Element("record", id=str(i))
            rec.text = "record text data"
            xf.write(rec)
The resulting XML looks like this (reformatted from the one-line output):
<?xml version="1.0"?>
<root text="att text" tag="bagggg" published="now">root text
<record id="0">record text data</record>
<record id="1">record text data</record>
<record id="2">record text data</record>
<record id="3">record text data</record>
<record id="4">record text data</record>
<record id="5">record text data</record>
<record id="6">record text data</record>
<record id="7">record text data</record>
<record id="8">record text data</record>
<record id="9">record text data</record>
</root>
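For larger outputs, the same pattern can be fed from a generator so that only one record exists in memory at a time. The following is a sketch under that assumption; stream_records and make_records are hypothetical names, and the in-memory buffer stands in for a file opened in binary mode.

```python
import io
from lxml import etree


def stream_records(out, records):
    """Write <root> containing one <record> per (id, text) pair.

    `out` must be a binary file-like object; `records` can be any
    iterable, including a generator. (Hypothetical helper.)
    """
    with etree.xmlfile(out, encoding="utf-8") as xf:
        with xf.element("root"):
            for rec_id, text in records:
                rec = etree.Element("record", id=str(rec_id))
                rec.text = text
                # The element is serialized immediately and can be
                # garbage-collected once the loop moves on.
                xf.write(rec)


def make_records(count):
    """Yield (id, text) pairs lazily instead of building a list."""
    for i in range(count):
        yield i, "record text data"


buf = io.BytesIO()
stream_records(buf, make_records(1000))
streamed = buf.getvalue()
```

Parsing the result back with etree.fromstring(streamed) is a quick way to confirm that the streamed output is well-formed.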