Parse a list of XML fragments with no root element from a stream input

Is it feasible in Java, using the SAX API, to parse a list of XML fragments with no root element from a stream input?

I tried parsing such an XML but got a

org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.

before even the endDocument event was fired.

I would rather not settle for obvious but clumsy solutions such as "prepend a custom root element" or "use buffered fragment parsing".

I am using the standard SAX API of Java 1.6. The SAX factory had setValidating(false) in case anyone wondered.
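For reference, a minimal sketch that reproduces the failure. The class name `NoRootRepro` and the sample inputs are made up for illustration, and the exact exception message depends on the parser implementation in use.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class NoRootRepro {
    /** Returns true if the SAX parser rejects the input with a SAXParseException. */
    public static boolean rejects(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // validation is off, as in the question
        try {
            factory.newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(Charset.forName("UTF-8"))),
                new DefaultHandler());
            return false;
        } catch (SAXParseException e) {
            // Thrown when a second top-level element follows the first one.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rejects("<a/><b/>"));        // two siblings, no root
        System.out.println(rejects("<r><a/><b/></r>")); // single root element
    }
}
```

Turning validation off does not help here, because the failure is a well-formedness error rather than a validity error.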

Somatoplasm answered 27/6, 2012 at 12:58 Comment(2)
Duplicate of #3232610. - Streetwalker
You can refer to Resolving "The markup in the document following the root element must be well-formed" Exception. - Marismarisa

First, and most important of all, the content you are parsing is not an XML document. From the XML Specification:

[Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.]

Now, as to parsing this with SAX - in spite of what you said about clumsiness - I'd suggest the following approach:

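import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;

// Assumes the fragment stream uses an ASCII-compatible encoding such as UTF-8.
Charset utf8 = Charset.forName("UTF-8");
Enumeration<InputStream> streams = Collections.enumeration(
    Arrays.<InputStream>asList(
        new ByteArrayInputStream("<root>".getBytes(utf8)),
        yourXmlLikeStream,
        new ByteArrayInputStream("</root>".getBytes(utf8))));

SequenceInputStream seqStream = new SequenceInputStream(streams);

// Now pass the `seqStream` into the SAX parser.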

Using the SequenceInputStream is a convenient way of concatenating multiple input streams into a single stream. They will be read in the order they are passed to the constructor (or in this case - returned by the Enumeration).

Pass it to your SAX parser, and you are done.
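As a rough end-to-end sketch of the wrapping approach above, the following collects the names of the top-level fragments. The class name `FragmentParser`, the method `topLevelElements`, and the sample input are hypothetical, and the sketch assumes the fragment stream is UTF-8 and carries no XML declaration of its own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class FragmentParser {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Wraps the fragments in a synthetic root and returns the names of its direct children. */
    public static List<String> topLevelElements(InputStream fragments) throws Exception {
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(
            Arrays.<InputStream>asList(
                new ByteArrayInputStream("<root>".getBytes(UTF8)),
                fragments,
                new ByteArrayInputStream("</root>".getBytes(UTF8)))));

        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // 0 = outside the synthetic root

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (depth == 1) {
                    names.add(qName); // direct child of the synthetic root
                }
                depth++;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                depth--;
            }
        };

        SAXParserFactory.newInstance().newSAXParser().parse(wrapped, handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(
            "<a>1</a><b/><a>2</a>".getBytes(UTF8));
        System.out.println(topLevelElements(in)); // prints [a, b, a]
    }
}
```

Tracking the depth lets the handler skip the synthetic wrapper, so the rest of the code only ever sees the original fragments.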

Farci answered 27/6, 2012 at 13:18 Comment(3)
Agreed - the reason for clumsily adding a root element is that you are dealing with clumsy data. Otherwise, as soon as you close the first element you opened, the SAX parser will believe it has finished, as it has. I also do it this way for an XML-like stream of data. - Annam
Although you provided an answer I had already thought of, the implementation is much more elegant than I could ever have come up with! Thank you for your answer. - Somatoplasm
Well, SequenceInputStream is one of those long-forgotten utilities that nobody seems to know about, despite being there since Java 1.0. Just wanted to remind everyone it's still there. :) - Farci

It sounds like you are working with XMPP. If so, there are libraries for parsing streams of XML fragments. There is a draft for an XML Fragment Interchange specification which was published by the W3C. This draft aimed to provide guidance on how XML fragments could be standardized for interchange. This specification did not gain wide adoption and is not actively maintained.

Ontology answered 2/5 at 22:10 Comment(0)
