A simplified version of my XML parsing function is here:
import xml.etree.cElementTree as ET

def analyze(xml):
    it = ET.iterparse(file(xml))
    count = 0
    for (ev, el) in it:
        count += 1
    print('count: {0}'.format(count))
This causes Python to run out of memory, which doesn't make a whole lot of sense. The only thing I am actually storing is the count, an integer. Why is it doing this?

See that sudden drop in memory and CPU usage at the end? That's Python crashing spectacularly. At least it gives me a MemoryError (depending on what else I am doing in the loop, it gives me more random errors, like an IndexError) and a stack trace instead of a segfault. But why is it crashing?
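A minimal sketch of what seems to be going on (the filename 'big.xml' is a placeholder): iterparse keeps every element it has already yielded attached to the growing tree, so even a loop body that does nothing still accumulates the whole document in memory:

import xml.etree.cElementTree as ET

it = ET.iterparse('big.xml', events=('start', 'end'))
event, root = next(it)   # the first 'start' event hands back the root element
for event, elem in it:
    pass                 # do nothing, yet elements stay attached to the tree
print(len(root))         # the root still holds all of its children in memory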
Call .clear() on each element when you're done with it to save memory. Presumably this works because cElementTree keeps the previously-returned values in memory otherwise. – Rascon

Consider lxml; it has identical (AFAIK) functionality, but is much more memory and time efficient. – Vittle

lxml beats ElementTree, but not cElementTree, when it comes to parsing. – Sati

iterparse() builds the tree. It is up to the caller to delete unwanted elements. – Midshipmite
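Putting the comments together, a minimal sketch of the usual fix (not the asker's exact code): since iterparse() builds the tree as it goes, clear each element once it has been processed so the tree never grows:

import xml.etree.cElementTree as ET

def analyze(xml):
    it = ET.iterparse(xml, events=('start', 'end'))
    event, root = next(it)       # keep a handle on the root element
    count = 0
    for event, elem in it:
        if event == 'end':
            count += 1
            elem.clear()         # discard children/text/attributes we no longer need
    root.clear()                 # drop whatever is still hanging off the root
    print('count: {0}'.format(count))

The same pattern should carry over largely unchanged to lxml.etree, whose iterparse() mirrors this API.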