SAX IncrementalParser in Jython

The Python standard library provides the xml.sax.xmlreader.IncrementalParser interface, which has a feed() method. Jython also provides an xml.sax package, which uses a Java SAX parser implementation under the hood, but it does not seem to provide IncrementalParser.

Is there any way to parse chunks of XML incrementally in Jython? At first glance I thought it could be achieved with a coroutine library such as greenlet, but I quickly realized that greenlet can't be used in Jython.
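
For reference, here is a minimal sketch of the feed()-style usage the question has in mind, as it works with the expat-based reader in CPython. It is not a Jython solution: per the question, Jython's xml.sax does not seem to offer this interface. The TagLogger handler is a hypothetical name used only for illustration.

```python
import xml.sax


class TagLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records element start/end events."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)


handler = TagLogger()
parser = xml.sax.make_parser()  # in CPython this is the expat-based reader
parser.setContentHandler(handler)

# The caller stays in control: chunks are pushed in whenever they are
# available, and other work can happen between the feed() calls.
parser.feed("<doc>")
parser.feed("<item/>")
parser.feed("</doc>")
parser.close()  # signals end of input and flushes any pending events
```

After close() returns, handler.events holds the start and end events for doc and item in document order.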

Essentiality answered 16/10, 2013 at 8:7 Comment(0)

You can use StAX. The StAX parser streams like SAX but maintains a cursor and allows you to extract content at the cursor by using hasNext() and next().

The following code is adapted from this Java example:

http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html

Note that this is my first attempt at Jython, so forgive me if I did something unconventional, but the example works.

from javax.xml.stream import XMLStreamConstants, XMLInputFactory
from java.io import ByteArrayInputStream
from java.lang import String

xml = String(
"""<?xml version="1.0" encoding="ISO-8859-1"?>
<employees>
  <employee id="111">
    <firstName>Rakesh</firstName>
    <lastName>Mishra</lastName>
    <location>Bangalore</location>
  </employee>
  <employee id="112">
    <firstName>John</firstName>
    <lastName>Davis</lastName>
    <location>Chennai</location>
  </employee>
  <employee id="113">
    <firstName>Rajesh</firstName>
    <lastName>Sharma</lastName>
    <location>Pune</location>
  </employee>
</employees>
""")

class Employee:
    id = None
    firstName = None
    lastName = None
    location = None

    def __str__(self):
        return self.firstName + " " + self.lastName + "(" + self.id + ") " + self.location

factory = XMLInputFactory.newInstance()
# Encode with the charset named in the XML declaration instead of the platform default
reader = factory.createXMLStreamReader(ByteArrayInputStream(xml.getBytes("ISO-8859-1")))
employees = []
employee = None
tagContent = None

while reader.hasNext():
    event = reader.next()

    if event == XMLStreamConstants.START_ELEMENT:
        if "employee" == reader.getLocalName():
            employee = Employee()
            employee.id = reader.getAttributeValue(0)
    elif event == XMLStreamConstants.CHARACTERS:
        tagContent = reader.getText()
    elif event == XMLStreamConstants.END_ELEMENT:
        if "employee" == reader.getLocalName():
            employees.append(employee)
        elif "firstName" == reader.getLocalName():
            employee.firstName = tagContent
        elif "lastName" == reader.getLocalName():
            employee.lastName = tagContent
        elif "location" == reader.getLocalName():
            employee.location = tagContent

for employee in employees:
    print employee
Caston answered 22/10, 2013 at 3:4 Comment(0)

You can use Java's SAX parser directly.

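from java.io import StringReader
from javax.xml.parsers import SAXParserFactory
from org.xml.sax import InputSource
from org.xml.sax.helpers import DefaultHandler

factory = SAXParserFactory.newInstance()
xmlReader = factory.newSAXParser().getXMLReader()

handler = DefaultHandler()  # or use your own handler
xmlReader.setContentHandler(handler)

# Placeholder input: any java.io.Reader can be wrapped in an InputSource
streamReader = StringReader("<doc/>")
xmlReader.parse(InputSource(streamReader))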
Spawn answered 16/10, 2013 at 18:10 Comment(1)
This is actually not an incremental parser but an ordinary event-driven parser. What I really need is a way to feed chunks of XML without inversion of control (that is, without providing a reader callback). E.g.: parser.feed("<doc>"); do_something_other(); parser.feed("</doc>")
Essentiality
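
One possible workaround, offered only as a sketch: run the blocking parse() call in a background thread and hand it a file-like object whose read() waits for data pushed in through feed(). The ThreadedFeedParser, _QueueReader, and Recorder names below are hypothetical, and the code has only been reasoned through against CPython's xml.sax. Whether Jython's xml.sax.parse() accepts such a file-like object is an assumption that would need checking. On Jython, a java.io.PipedInputStream paired with a PipedOutputStream might play the same role.

```python
import threading
import xml.sax

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2 / Jython


class _QueueReader(object):
    """File-like object whose read() blocks until a chunk has been queued."""

    def __init__(self):
        self._chunks = queue.Queue()
        self._eof = False

    def put(self, data):
        self._chunks.put(data)

    def read(self, size=-1):
        # An empty result means end of input; size is otherwise ignored
        # because the parser accepts chunks of any length.
        if size == 0 or self._eof:
            return b""
        data = self._chunks.get()
        if data is None:        # sentinel queued by close()
            self._eof = True
            return b""
        return data

    def close(self):
        pass


class ThreadedFeedParser(object):
    """Hypothetical helper exposing feed()/close() over a blocking parse()."""

    def __init__(self, handler):
        self._reader = _QueueReader()
        self._error = None
        self._thread = threading.Thread(target=self._run, args=(handler,))
        self._thread.daemon = True
        self._thread.start()

    def _run(self, handler):
        try:
            xml.sax.parse(self._reader, handler)
        except Exception as exc:
            self._error = exc   # re-raised to the caller in close()

    def feed(self, data):
        if data:                # an empty chunk would look like end of input
            self._reader.put(data)

    def close(self):
        self._reader.put(None)
        self._thread.join()
        if self._error is not None:
            raise self._error


class Recorder(xml.sax.ContentHandler):
    """Hypothetical handler that records element events."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


recorder = Recorder()
parser = ThreadedFeedParser(recorder)
parser.feed(b"<doc>")
# ... other work can happen here between chunks ...
parser.feed(b"<item/>")
parser.feed(b"</doc>")
parser.close()
```

The handler callbacks run on the worker thread, so they may lag behind the feed() calls. Only after close() returns is it safe to assume that every event has been delivered.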
