iterparse is throwing 'no element found: line 1, column 0' and I'm not sure why

I have a network application (using Twisted) that receives chunks of XML over the internet; a complete XML message may not arrive in a single packet. My plan is to build up the XML message incrementally as it is received. I've "settled" on iterparse from xml.etree.ElementTree. I've been experimenting, and the following (non-Twisted) code works fine:

import xml.etree.ElementTree as etree
from io import StringIO

buff = StringIO(unicode('<notorious><burger/></notorious>'))

for event, elem in etree.iterparse(buff, events=('end',)):
    if elem.tag == 'notorious':
        print(etree.tostring(elem))

Then I built the following code to simulate how data may be received on my end:

import xml.etree.ElementTree as etree
from io import StringIO

chunks = ['<notorious>','<burger/>','</notorious>']
buff = StringIO()

for ch in chunks:
    buff.write(unicode(ch))
    if buff.getvalue() == '<notorious><burger/></notorious>':
        print("it should work now")
    try:
        for event, elem in etree.iterparse(buff, events=('end',)):
            if elem.tag == 'notorious':
                print(etree.tostring(elem))
    except Exception as e:
        print(e)

But the code spits out:

'no element found: line 1, column 0'

I can't wrap my head around it. Why does that error occur when the StringIO in the second sample has the same contents as the StringIO in the first sample?

ps:

  1. I know I'm not the first to ask this but no other thread answered my question. If I'm wrong, plz provide the appropriate thread.
  2. If you have suggestions for other modules to use, don't put them in the answer plz. Add a comment.

Thanks

Foretaste answered 5/12, 2014 at 1:32 Comment(3)
Twisted already contains some stream-parsing XML stuff in twisted.words for parsing XMPP. You might want to have a look at twistedmatrix.com/documents/current/api/… - Bluecoat
I knew I was trying to reinvent the wheel. I need to find a better way to parse all the documents on the Twisted site before I post a question here. Live and learn. - Foretaste
Holy sweet baby jeebus XmlStream where have you been all my life :D thanks Glyph! The XmlStream is the way to go for streaming XML. - Foretaste

File objects and file-like objects have a file position. Each read or write advances that position. After buff.write(..) the position sits at the end of the buffer, so etree.iterparse finds nothing left to read and reports 'no element found'. You need to reset the file position (using <file_object>.seek(..)) before passing the file object to etree.iterparse so that it can read from the beginning of the file.
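
To see the position move, here is a small sketch. It assumes Python 3, where str is already unicode, so the unicode(..) call from the question is dropped; the variable names are only for illustration.

```python
from io import StringIO

buff = StringIO()
buff.write('<notorious>')

# After a write, the position sits just past the written text.
pos_after_write = buff.tell()

# Reading from there yields an empty string, which is all a parser would see.
read_at_end = buff.read()

# Rewinding makes the whole content readable again.
buff.seek(0)
read_from_start = buff.read()

print(pos_after_write, repr(read_at_end), repr(read_from_start))
```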

...
buff.seek(0) # <-----
for event, elem in etree.iterparse(buff, events=('end',)):
    if elem.tag == 'notorious':
        print(etree.tostring(elem))
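
For completeness, here is how the loop from the question might look with the seek added. This is a sketch assuming Python 3 (no unicode(..) needed); the results list and the seek to the end before each write are my own additions, the latter as a precaution in case a failed parse leaves the position somewhere in the middle.

```python
import xml.etree.ElementTree as etree
from io import SEEK_END, StringIO

chunks = ['<notorious>', '<burger/>', '</notorious>']
buff = StringIO()
results = []  # serialized <notorious> elements, collected for inspection

for ch in chunks:
    buff.seek(0, SEEK_END)  # make sure the new chunk is appended at the end
    buff.write(ch)
    buff.seek(0)            # rewind so iterparse reads from the beginning
    try:
        for event, elem in etree.iterparse(buff, events=('end',)):
            if elem.tag == 'notorious':
                results.append(etree.tostring(elem))
    except etree.ParseError as e:
        # Expected while the document is still incomplete.
        print('incomplete so far:', e)

print(results)
```

Note that this re-parses the whole buffer on every chunk, so the first two passes still raise a ParseError because the document is incomplete; only the last pass reaches the closing tag.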
Pneumodynamics answered 5/12, 2014 at 2:53 Comment(1)
I see. I was not aware that the io position was "moved" (I kinda suspected it but didn't think to seek(0) the buffer). Thank you. I will accept your answer as it does satisfy the requirement in my question, but for anyone with similar issues with streaming XML + Twisted: use XmlStream (twisted.words.xish.xmlstream). - Foretaste

Even after you have finished writing to the file, the file position still points to the end of what was written. So you have to move the file position back with fd.seek(0). After that you can use et.parse to parse the file.
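
A minimal sketch of that sequence, assuming Python 3 and a temporary file opened for both writing and reading (the tempfile usage and the variable names are just for illustration):

```python
import tempfile
import xml.etree.ElementTree as et

# 'w+' opens the temporary file in text mode for writing and reading.
with tempfile.TemporaryFile(mode='w+') as fd:
    fd.write('<notorious><burger/></notorious>')

    # The position is now at the end of the file; rewind before parsing.
    fd.seek(0)

    tree = et.parse(fd)
    root_tag = tree.getroot().tag
    child_tags = [child.tag for child in tree.getroot()]

print(root_tag, child_tags)
```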

Succotash answered 20/8, 2017 at 19:1 Comment(0)
