Generating XML using SAX and Java [closed]

Does anyone know of a good tutorial (or have a good example) for writing XML using the SAX framework (or something similar) and Java? Searching has turned up very little that is useful. I'm trying to export data from an Android app and want to avoid as much memory overhead as possible.

Amaro answered 4/2, 2011 at 13:40 Comment(3)
SAX is a parser, it doesn't generate anything :) – Lammastide
Unless you absolutely want to use SAX (which may occasionally be the case when doing processing pipelines), maybe you could ask in more generic terms -- that is, what are you trying to achieve, as opposed to "how do I use this tool"? – Behlau
What I'm really looking for is to generate XML with as little memory overhead as possible. I've seen some examples using the SAX framework, but nothing concrete enough for a Java newb like myself to find useful. – Amaro

There's a very useful technique for generating XML directly from POJOs via the SAX framework (not a SAX parser, but the SAX framework). This technique could be used to generate an XML document.

Generating XML from an Arbitrary Data Structure
http://download.oracle.com/javaee/1.4/tutorial/doc/JAXPXSLT5.html

Essentially, you add methods to your POJOs, or write a utility class for them, that turn them into SAX event emitters (emitting the same events a SAX parser would emit when parsing an XML document). Your "SAX event generator" then looks like the output side of a SAX parser and can be given any content handler that a SAX parser would accept, such as one that pretty-prints XML. But it could also be fed to a DOM builder to generate a DOM tree, or to an XSLT engine to generate HTML or perform a true XSL transformation, without first generating an intermediate XML document from the POJOs.

For example, a Person class might have an emitXML() method that includes these lines:

handler.startElement(nsu, PERSON_TAG, PERSON_TAG, NO_ATTRIBUTES);

handler.startElement(nsu, FIRSTNAME_TAG, FIRSTNAME_TAG, atts);
handler.characters(this.firstName.toCharArray(), 0, this.firstName.length());
handler.endElement(nsu, FIRSTNAME_TAG, FIRSTNAME_TAG);

// ... emit more instance variables the same way

// ... emit child objects, e.g. homeAddress.emitXML(handler, ...);

handler.endElement(nsu, PERSON_TAG, PERSON_TAG);
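A self-contained sketch of the technique, assuming a hypothetical `Person` POJO with `firstName` and `lastName` fields, no namespace, and no attributes (the class and method names are illustrative, not taken from the tutorial). `Person` only emits SAX events; where they end up depends on the `ContentHandler` passed in. Here the same events go to one JAXP `TransformerHandler` that serializes them to a string, and to another that appends nodes to a DOM `Document`.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class SaxEmitDemo {
    static final String NSU = "";                          // no namespace
    static final AttributesImpl NO_ATTRIBUTES = new AttributesImpl();

    // Hypothetical POJO that knows how to describe itself as SAX events.
    static class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        void emitXML(ContentHandler handler) throws SAXException {
            handler.startElement(NSU, "person", "person", NO_ATTRIBUTES);
            emitText(handler, "firstName", firstName);
            emitText(handler, "lastName", lastName);
            handler.endElement(NSU, "person", "person");
        }

        // Emits <tag>value</tag> as three SAX events.
        private static void emitText(ContentHandler handler, String tag, String value)
                throws SAXException {
            handler.startElement(NSU, tag, tag, NO_ATTRIBUTES);
            handler.characters(value.toCharArray(), 0, value.length());
            handler.endElement(NSU, tag, tag);
        }
    }

    // Wraps the element events in startDocument/endDocument for any handler.
    static void emitDocument(Person p, ContentHandler handler) throws SAXException {
        handler.startDocument();
        p.emitXML(handler);
        handler.endDocument();
    }

    // Sink 1: an identity TransformerHandler that serializes the events as text.
    static String toXml(Person p) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));
        emitDocument(p, serializer);
        return out.toString();
    }

    // Sink 2: the very same events, appended as nodes to a DOM Document.
    static Document toDom(Person p) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler builder = stf.newTransformerHandler();
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        builder.setResult(new DOMResult(doc));
        emitDocument(p, builder);
        return doc;
    }
}
```

`Person` never changes between the two cases; only the handler does, which is the point of the technique.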

Update:

A couple of other references:


A couple of responses to comments:

This is true, but the XMLStreamWriter interface described above is much more user-friendly. – Michael Kay 3 hours ago

Yes, but I guess I wasn't clear. I could easily traverse the hierarchy and use XMLStreamWriter to output an XML document directly to a stream. However, the articles show a powerful technique for traversing the hierarchy and generating SAX events instead of outputting an XML document directly. I can then plug in different content handlers that do different things or generate different versions of the XML. We could also feed our object hierarchy to any tool that accepts a SAX parser, such as an XSLT engine. It's really just taking advantage of the visitor pattern established by the SAX framework: we separate traversing the hierarchy from outputting the XML. The parts that output the XML, the content handlers, should certainly use an XMLStreamWriter if their purpose is to write an XML stream.
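To illustrate that split between traversal and output, here is a minimal sketch of a content handler whose only job is writing: it replays SAX events onto a StAX `XMLStreamWriter`. `StaxWritingHandler` and its `demo()` method are hypothetical names. The sketch deliberately ignores namespaces and writes only qualified names; a real handler would also need to handle `startPrefixMapping` and related callbacks.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Replays SAX events onto a StAX writer (namespaces ignored for brevity).
class StaxWritingHandler extends DefaultHandler {
    private final XMLStreamWriter out;

    StaxWritingHandler(XMLStreamWriter out) {
        this.out = out;
    }

    // Lets each callback rethrow StAX errors as SAX errors.
    private interface StaxCall { void run() throws XMLStreamException; }

    private static void call(StaxCall c) throws SAXException {
        try {
            c.run();
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void startDocument() throws SAXException {
        call(() -> out.writeStartDocument());
    }

    @Override
    public void endDocument() throws SAXException {
        call(() -> { out.writeEndDocument(); out.flush(); });
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        call(() -> {
            out.writeStartElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                out.writeAttribute(atts.getQName(i), atts.getValue(i));
            }
        });
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        call(() -> out.writeEndElement());
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        call(() -> out.writeCharacters(ch, start, length));  // StAX escapes '&' and '<'
    }

    // Drives the handler by hand, the way an emitXML() traversal would.
    static String demo() throws Exception {
        StringWriter sw = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
        StaxWritingHandler h = new StaxWritingHandler(w);
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "id", "id", "CDATA", "42");
        char[] text = "Ada & co".toCharArray();
        h.startDocument();
        h.startElement("", "person", "person", atts);
        h.characters(text, 0, text.length);
        h.endElement("", "person", "person");
        h.endDocument();
        return sw.toString();
    }
}
```

Because the traversal only talks to `ContentHandler`, swapping this writer for a different handler requires no change to the POJOs.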

For example, in our program we sent XML messages over network sockets between distributed components, and we also used XSLT to generate our HTML pages. Previously, we traversed our hierarchy to generate an XML document (a string) and then either wrote that document to a network socket or fed it to the XSLT engine (which essentially just parsed it again). With this technique, we could feed our object hierarchy (through this SAX adapter) directly to the XSLT engine without the intermediate XML string. It was also convenient to use one content handler to generate a compact XML representation for the network stream and a different one to generate a pretty-printed XML document for a log file.
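A minimal sketch of feeding events straight into an XSLT engine, assuming a hypothetical stylesheet that renders a `<person>` as an HTML paragraph. `SAXTransformerFactory.newTransformerHandler(Templates)` returns a `ContentHandler` that applies the compiled stylesheet to the incoming events, so no intermediate XML string is built. `emitPerson` is a hypothetical stand-in for a POJO's `emitXML()` method.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class XsltFromEventsDemo {
    // Hypothetical stylesheet: renders /person/name as an HTML paragraph.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/person'><p><xsl:value-of select='name'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    // Stands in for a POJO's emitXML(): events are generated, never parsed from text.
    static void emitPerson(ContentHandler h, String name) throws SAXException {
        AttributesImpl none = new AttributesImpl();
        h.startDocument();
        h.startElement("", "person", "person", none);
        h.startElement("", "name", "name", none);
        h.characters(name.toCharArray(), 0, name.length());
        h.endElement("", "name", "name");
        h.endElement("", "person", "person");
        h.endDocument();
    }

    static String render(String name) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        Templates templates = stf.newTemplates(new StreamSource(new StringReader(XSL)));
        // The XSLT engine itself is the ContentHandler receiving our events.
        TransformerHandler xslt = stf.newTransformerHandler(templates);
        StringWriter html = new StringWriter();
        xslt.setResult(new StreamResult(html));
        emitPerson(xslt, name);
        return html.toString();
    }
}
```

This removes the serialize-then-reparse round trip, though an XSLT 1.0 processor typically still builds its own internal tree of the input so that XPath expressions can navigate it.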

Besides, using SAX parser API to write XML is a misuse of the API, IMHO. – Puce 49 mins ago

Perhaps, but I think it depends on your needs. If the OP's requirement is just to write out a specific XML document, then this is definitely overkill. However, I thought it worth mentioning in case the OP uses XML in other ways in his project that he didn't mention. There's no harm in pitching an alternative idea.

Calling it misuse may be a bit strong, but I agree you're entitled to your opinion. It's documented in an Oracle tutorial, so it's evidently not considered abuse by the Sun/Oracle engineers. It was highly successful on our project, helping us meet our requirements with no significant downsides, so I'll keep this approach in my toolbox for when it's useful in the future.

Quianaquibble answered 4/2, 2011 at 14:7 Comment(5)
This is true, but the XMLStreamWriter interface described above is much more user-friendly. – Convexoconvex
Besides, using SAX parser API to write XML is a misuse of the API, IMHO. – Tade
@Michael Kay - Thanks for your comment. I tried to clarify in an edit above. – Quianaquibble
In addition to the other comments, I would suggest that if you want POJO-to-XML, JAXB would be the obvious way. It does use a SAX/StAX-based writer under the hood, so performance can be pretty good (the only caveat is that the StAX implementation JDK 1.6 ships with has horrible writer performance; use Woodstox instead). – Behlau
Not often, but sometimes you really, really want to create a stream of SAX events, and your solution shows how. This method isn't easy to find on the 'net, and I appreciate your work on the explanation and examples. – Grisly

SAX parsing is for reading documents, not writing them.

You can write XML with the XMLStreamWriter:

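import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteDoc {
    public static void main(String[] args) throws Exception {
        // XMLStreamWriter.close() does not close the underlying stream,
        // so try-with-resources closes the file writer itself.
        try (Writer writer = new OutputStreamWriter(new FileOutputStream("doc.xml"), StandardCharsets.UTF_8)) {
            XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(writer);

            out.writeStartDocument("UTF-8", "1.0"); // declare the encoding actually used
            out.writeStartElement("doc");

            out.writeStartElement("title");
            out.writeCharacters("Document Title");
            out.writeEndElement();

            out.writeEndElement();
            out.writeEndDocument();

            out.flush();
            out.close();
        }
    }
}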
Upkeep answered 4/2, 2011 at 13:43 Comment(2)
I like this answer; unfortunately, that package is not available on Android. My fault for not mentioning that, though. D: – Amaro
OK, you can of course also just print the XML manually: out.print("<doc><tag>value</tag></doc>"); etc. But you have to make sure to escape the values properly. This could be done with Apache Commons Lang's StringEscapeUtils.escapeXml(), or some other method. It depends on the possible values; maybe you could just do it with a regex. – Upkeep
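A minimal sketch of the hand-written approach from the comment above, without the Apache Commons dependency. The names `ManualXml`, `escape`, and `writeDoc` are hypothetical. It escapes only the five characters with predefined XML entities and assumes the values contain no characters that are illegal in XML 1.0 (such as most control characters), which it does not check for.

```java
import java.io.PrintWriter;

class ManualXml {
    // Replaces the five characters that have predefined XML entities.
    // Assumption: the input contains no characters illegal in XML 1.0.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Streams <doc><tag>value</tag></doc> straight to the writer; nothing is buffered in a tree.
    static void writeDoc(PrintWriter out, String value) {
        out.print("<doc><tag>");
        out.print(escape(value));
        out.print("</tag></doc>");
        out.flush();
    }
}
```

For example, `writeDoc(new PrintWriter(System.out), "Tom & Jerry")` prints `<doc><tag>Tom &amp; Jerry</tag></doc>`. Writing straight to the stream keeps memory use flat, which was the original concern, at the cost of doing the well-formedness bookkeeping by hand.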

The following answers the "a good tutorial for writing XML using the SAX parser and Java" part of the question.

I'm not sure whether you have already gone through this, but I really like Java's Really Big Index of Everything.

Go through this: http://download.oracle.com/javase/tutorial/jaxp/index.html

And eventually, this: http://download.oracle.com/javase/tutorial/jaxp/sax/index.html

Erethism answered 4/2, 2011 at 13:45 Comment(1)
Thanks Nishant. This is complete information! This is what I was looking for. – Garlaand

Please refer to my personal blog post: XML Generation In Java - specifically, The SAX method. It references a few other articles concerning this, provides a concrete example, and compares SAX with the other popular APIs for generating XML from Java.

(I realize this is an older question, but I felt it worth adding this for anyone else who may have the same question.)

Ubangishari answered 6/12, 2011 at 1:10 Comment(0)

Also consider JAXB for writing and reading XML.

Tade answered 4/2, 2011 at 17:55 Comment(0)

You can also bridge to TrAX (the javax.xml.transform API) with this:

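import java.io.IOException;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// A SAXSource whose "parse" step simply calls writeTo(), so a Transformer
// pulls SAX events from your own code instead of from a real parser.
public abstract class PipedSAXSource extends SAXSource {
  protected PipedSAXSource() {
    setXMLReader(new CallWriteDuringSax());
  }

  // Subclasses emit their events into the sink supplied by the Transformer.
  protected abstract void writeTo(ContentHandler sink)
      throws IOException, SAXException;

  private class CallWriteDuringSax extends XMLFilterImpl {
    @Override
    public void parse(InputSource ignored) throws IOException, SAXException {
      writeTo(getContentHandler());
    }

    // There is no parent parser, so silently accept any feature the Transformer sets.
    @Override
    public void setFeature(String name, boolean value) {}
  }
}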

Use like so:

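import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class Main { // the enclosing class name is arbitrary
  public static void main(String[] args) throws Exception {
    Source in = new PipedSAXSource() {
      @Override
      protected void writeTo(ContentHandler sink) throws SAXException {
        sink.startDocument();

        sink.startElement("", "root", "root", new AttributesImpl());
        sink.endElement("", "root", "root");

        sink.endDocument();
      }
    };

    // The identity transform serializes whatever events writeTo() produces.
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    identity.transform(in, new StreamResult(System.out));
  }
}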
Gymnastic answered 2/8, 2013 at 19:23 Comment(0)
