SAX: How to get the content of an element

I'm having some trouble understanding how to parse XML structures with SAX. Let's say there is the following XML:

<root>
  <element1>Value1</element1>
  <element2>Value2</element2>
</root>

and a String variable myString.

Just walking through the methods startElement(), endElement() and characters() is easy. But I don't understand how I can achieve the following:

If the current element is element1, store its value (Value1) in myString. As far as I understand, there is nothing like:

if (qName.equals("element1")) myString = qName.getValue();

I guess I'm just overcomplicating it :-)

Robert

Circumscription answered 7/11, 2010 at 21:45

With SAX you need to maintain your own stack. You can do something like this for very basic processing:

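// Fields and callbacks of a class that extends org.xml.sax.helpers.DefaultHandler
private boolean inElement1 = false;
private StringBuffer element1Content;

@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    if (qName.equals("element1")) {
        inElement1 = true;
        element1Content = new StringBuffer();
    }
}

@Override
public void characters(char[] ch, int start, int length) {
    if (inElement1) {
        // May be called several times per element, so always append
        element1Content.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName) {
    if (qName.equals("element1")) { // must match the element opened in startElement
        inElement1 = false;
        // processElement1Content is your own method, e.g. myString = content
        processElement1Content(element1Content.toString());
    }
}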
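
The answer above uses a single flag; the "maintain your own stack" idea can also be taken literally. The following is a minimal sketch, not the answerer's code: it assumes a hypothetical class PathTextCollector that pushes each element name onto a java.util.ArrayDeque in startElement(), gives every open element its own StringBuilder, and records each element's direct text under a slash-separated path such as /root/element1. The path format, the whitespace trimming and the last-value-wins policy for repeated paths are illustration choices only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathTextCollector extends DefaultHandler {
    // Names of the currently open elements; the innermost is on top
    private final Deque<String> path = new ArrayDeque<>();
    // One text buffer per open element, pushed and popped together with path
    private final Deque<StringBuilder> buffers = new ArrayDeque<>();
    // "/root/element1" -> "Value1"; a repeated path keeps its last value
    private final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        path.push(qName);
        buffers.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text belongs to the innermost open element; there may be several calls
        if (!buffers.isEmpty()) {
            buffers.peek().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only this element's direct text is kept (child text went to child buffers)
        values.put(currentPath(), buffers.pop().toString().trim());
        path.pop();
    }

    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        // push() adds at the head, so iterate from the tail to get the root first
        Iterator<String> it = path.descendingIterator();
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    public static Map<String, String> collect(String xml) throws Exception {
        PathTextCollector handler = new PathTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <element1>Value1</element1>\n  <element2>Value2</element2>\n</root>";
        String myString = collect(xml).get("/root/element1");
        System.out.println(myString); // Value1
    }
}
```

Because each open element has its own buffer, text from element2 cannot leak into element1's value. The trade-off is that every element's text ends up in the map, which on very large files may be more than you need.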

If you want code like the line in your question, you need to use the DOM model rather than SAX. DOM is easier to code against, but it is generally slower and uses more memory than SAX, because it loads the whole document into memory as a tree.
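
For comparison, this is roughly what the DOM version of the lookup from the question looks like with the JDK's built-in javax.xml.parsers and org.w3c.dom APIs. It is a sketch: the class name DomExample and the method readElement1 are made up for illustration, and it assumes the document contains at least one element1 (otherwise item(0) returns null).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomExample {
    public static String readElement1(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse() builds the complete document tree in memory
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // getTextContent() returns all text inside the first <element1>
        return doc.getElementsByTagName("element1").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element1>Value1</element1><element2>Value2</element2></root>";
        String myString = readElement1(xml);
        System.out.println(myString); // Value1
    }
}
```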

I recommend using a third-party library rather than the built-in Java XML libraries for DOM manipulation. Dom4J seems pretty good, but there are probably other libraries out there too.

Cytolysin answered 7/11, 2010 at 21:54
Thanks Cameron, that's what I expected :-) As my application will run on an Android smartphone, I think it's better to use the built-in SAX parser rather than switching to DOM. – Circumscription
Perhaps use the preferred StringBuilder. – Rough

That solution works for a single element with text content only. When element1 has sub-elements, some more work is needed. Brian's remark is a very important one. If you have multiple elements or want a more generic solution, this might help. I tested it with a 300+ MB XML file and it is still very fast:

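import java.io.FileInputStream;
import java.io.InputStream;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

final StringBuilder builder = new StringBuilder();
// XMLReaderFactory is deprecated since Java 9 but still works. SAX2 readers
// process namespaces by default, which is what fills in localName below.
XMLReader saxXmlReader = XMLReaderFactory.createXMLReader();

DefaultHandler handler = new DefaultHandler() {
    // true while we are inside <element1> ... </element1>
    boolean isParsing = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if ("element1".equals(localName)) {
            isParsing = true;
        }
        if (isParsing) {
            // Re-emit the start tag (attributes are not copied)
            builder.append("<").append(qName).append(">");
        }
    }

    @Override
    public void characters(char[] chars, int start, int length) throws SAXException {
        if (isParsing) {
            // Text is appended as-is, without re-escaping & or <
            builder.append(chars, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (isParsing) {
            builder.append("</").append(qName).append(">");
        }
        if ("element1".equals(localName)) {
            isParsing = false;
        }
    }
};

saxXmlReader.setContentHandler(handler);
saxXmlReader.setErrorHandler(handler);

// input is the File (or path) of the XML document; the stream is closed afterwards
try (InputStream in = new FileInputStream(input)) {
    saxXmlReader.parse(new InputSource(in));
}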
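
The handler above uses a boolean, so an element1 nested inside element1 would switch capturing off too early, and decoded characters such as & come out unescaped. Below is a minimal sketch of one way to address that, assuming a hypothetical class InnerXmlCapture that replaces the flag with a depth counter, copies attributes and re-escapes text. It still does not reproduce comments, CDATA sections, processing instructions or namespace declarations, so treat it as an illustration rather than a general serializer.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InnerXmlCapture extends DefaultHandler {
    private final String target;
    private final StringBuilder builder = new StringBuilder();
    // Number of open elements inside the captured target; 0 means "not capturing"
    private int depth = 0;

    public InnerXmlCapture(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0 || target.equals(qName)) {
            depth++;
            builder.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                builder.append(' ').append(atts.getQName(i)).append("=\"")
                       .append(escape(atts.getValue(i))).append('"');
            }
            builder.append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            builder.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0) {
            builder.append("</").append(qName).append('>');
            depth--; // back to 0 exactly when the outermost target closes
        }
    }

    // Re-escape characters that would otherwise produce invalid XML
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static String capture(String xml, String target) throws Exception {
        InnerXmlCapture handler = new InnerXmlCapture(target);
        // Not namespace-aware, so elements are matched by their qualified name (qName)
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.builder.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element1 id=\"a\">Value1 &amp; <b>bold</b></element1>"
                + "<element2>Value2</element2></root>";
        // prints <element1 id="a">Value1 &amp; <b>bold</b></element1>
        System.out.println(capture(xml, "element1"));
    }
}
```

The counter is what makes nesting safe: an inner element1 only increments and decrements it, and capturing stops when the count returns to zero at the outer closing tag.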

Sloppy answered 19/9, 2012 at 8:15

You should collect the content in characters(), appending to a StringBuilder on each invocation, and only store the concatenated value when endElement() is called.

Why? Because characters() can be called multiple times for a single element's content, each call delivering the next chunk of that text.
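
A minimal sketch of that pattern applied to the question's XML, assuming a hypothetical handler class Element1Reader: the buffer is cleared when an element starts, characters() only appends, and myString is assigned in endElement(). It is written for leaf elements like the ones in the question; elements with mixed text and child elements would need a separate buffer per open element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Element1Reader extends DefaultHandler {
    // Accumulates text across however many characters() calls the parser makes
    private final StringBuilder text = new StringBuilder();
    // Set only when </element1> is reached, so it always holds the complete text
    private String myString;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // new element: discard whatever was buffered before
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // append, never assign
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("element1".equals(qName)) {
            myString = text.toString();
        }
        text.setLength(0);
    }

    public static String readElement1(String xml) throws Exception {
        Element1Reader handler = new Element1Reader();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.myString;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n  <element1>Value1</element1>\n  <element2>Value2</element2>\n</root>";
        System.out.println(readElement1(xml)); // Value1
    }
}
```

Text containing an entity such as "A &amp; B" may be delivered in more than one characters() call, which is exactly the case that appending and storing only in endElement() handles.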

Rough answered 7/11, 2010 at 21:51