Using SAX (Java) to parse multiple XML messages from a single TCP-stream
I use Java to connect to a TCP port that streams XML documents one after another, each beginning with its own <?xml ... ?> XML declaration. An example of the format:

<?xml version="1.0"?>
<person>
    <name>Fred Bloggs</name>
</person>
<?xml version="1.0"?>
<person>
    <name>Peter Jones</name>
</person>

I'm using the org.xml.sax.* API. SAX parsing works perfectly for the first document but throws an exception when it reaches the start of the second document:

Exception in thread "main" org.xml.sax.SAXParseException: The processing instruction 
target matching "[xX][mM][lL]" is not allowed.
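
The exception can be reproduced without the socket, which suggests the cause is the second XML declaration rather than the network: a declaration is only permitted at the very start of a document, so the parser reads the second one as an illegal processing instruction. The following is a minimal sketch, assuming the two documents are simply concatenated in a string. The class name SecondDeclarationDemo and the helper parseError are made up for illustration, and the reader is obtained through SAXParserFactory rather than the XMLReaderFactory used in the setup below.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SecondDeclarationDemo {

    // Hypothetical helper: returns the parser's error message for the given
    // text, or null if the text parsed as one well-formed document.
    static String parseError(String xml) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DefaultHandler handler = new DefaultHandler();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler); // DefaultHandler rethrows fatal errors
        try {
            xr.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String one = "<?xml version=\"1.0\"?>\n"
                   + "<person>\n    <name>Fred Bloggs</name>\n</person>\n";
        // A single document parses cleanly, so this prints null.
        System.out.println(parseError(one));
        // Two concatenated documents fail at the second declaration,
        // so this prints the parser's error message.
        System.out.println(parseError(one + one));
    }
}
```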

The following skeleton class demonstrates the setup I'm using:

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.net.Socket;

public class XMLTest extends DefaultHandler {

  public XMLTest() {
     super();
  }

  public static void main(String[] args) throws Exception {
    XMLReader xr = XMLReaderFactory.createXMLReader();

    XMLTest handler = new XMLTest();
    xr.setContentHandler(handler);
    xr.setErrorHandler(handler);

    xr.parse(new InputSource(new Socket("127.0.0.1", 4555).getInputStream()));
  }
}

I have no control over the format of the XML (it's a financial data feed), but I need to parse all of the documents, and do so efficiently. I've spent the afternoon and evening trying different things, but none has worked. Any help would be greatly appreciated.

Ointment answered 21/7, 2010 at 18:35 Comment(2)
You have to call parse for each separate document, which means you need to filter and break up the input stream on the '<?xml' chars. - Metro
I had to do something like this and just replied (to myself) here, wrapping everything in its own Reader for simpler use. - Arkansas

You need to split the stream at every <?xml version="1.0"?> declaration and parse each document separately. A BufferedReader can help with this. Kickoff example:

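BufferedReader reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
StringBuilder builder = null;
for (String line; (line = reader.readLine()) != null;) {
    if (line.startsWith("<?xml")) {
        if (builder != null) {
            // InputSource(String) takes a system ID (a URI), not document text,
            // so the buffered text has to be wrapped in a StringReader.
            xr.parse(new InputSource(new StringReader(builder.toString())));
        }
        builder = new StringBuilder();
    }
    if (builder != null) {
        // readLine() strips the line terminator, so put one back.
        builder.append(line).append('\n');
    }
}
if (builder != null) {
    // Parse the last document, which has no following declaration to trigger it.
    xr.parse(new InputSource(new StringReader(builder.toString())));
}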
Vietnamese answered 21/7, 2010 at 18:49 Comment(4)
When I do this with InputStream input = new Socket("127.0.0.1", 4500).getInputStream(); I get the following exception: Exception in thread "main" java.io.FileNotFoundException: /Users/admin/IdeaProjects/XMLTest/< (No such file or directory) at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.<init>(FileInputStream.java:106) at java.io.FileInputStream.<init>(FileInputStream.java:66) It seems xr.parse() doesn't like strings, even when wrapped in an InputSource. - Ointment
Do you consider yourself capable of interpreting stack traces? I don't see how a FileNotFoundException is related to any of this. I'd say your problem lies somewhere else, maybe in a step after parsing. The filename given in the exception message, /Users/admin/IdeaProjects/XMLTest/<, indeed does not look valid, by the way. Reread the stack trace, use its line numbers to trace back to the location in your code that caused it, nail down the root cause, and fix it. If you get stuck and the problem turns out to be unrelated to this question, ask a new question (e.g. "How to save an XML file?"). - Vietnamese
Hey, I can read stack traces; I only pasted the first few lines. The stack trace points to my code at XMLTest.main(XMLTest.java:42), and line 42 is: xr.parse(new InputSource(builder.toString())); (which is from your example above). I appreciate your assistance with this. - Ointment
The solution is to wrap the string in a StringReader, i.e.: xr.parse(new InputSource(new StringReader(builder.toString()))); Thanks for your assistance! - Ointment
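
Putting the pieces of this thread together, the approach can be sketched as a small self-contained program. This is only an illustration under some assumptions: every declaration starts at the beginning of a line (as in the sample feed), the feed is simulated with a string instead of a socket, and the names MultiDocumentParser, NameCollector, parseAll and namesIn are made up for the example. The reader is obtained through SAXParserFactory rather than XMLReaderFactory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class MultiDocumentParser {

    // Hypothetical handler: collects the text of every <name> element.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder text;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("name".equals(qName)) text = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString());
                text = null;
            }
        }
    }

    // Buffers lines until the next "<?xml" line, then parses the buffered document.
    static void parseAll(Reader input, XMLReader xr) throws IOException, SAXException {
        BufferedReader reader = new BufferedReader(input);
        StringBuilder builder = null;
        for (String line; (line = reader.readLine()) != null;) {
            if (line.startsWith("<?xml")) {
                parseChunk(builder, xr);
                builder = new StringBuilder();
            }
            if (builder != null) builder.append(line).append('\n');
        }
        parseChunk(builder, xr); // the last document has no following declaration
    }

    private static void parseChunk(StringBuilder chunk, XMLReader xr)
            throws IOException, SAXException {
        if (chunk != null) {
            // Wrap the text in a StringReader; InputSource(String) expects a system ID.
            xr.parse(new InputSource(new StringReader(chunk.toString())));
        }
    }

    // Convenience wrapper: parses every document in the input and returns the names found.
    static List<String> namesIn(Reader input) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        NameCollector handler = new NameCollector();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);
        parseAll(input, xr);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String feed = "<?xml version=\"1.0\"?>\n<person>\n    <name>Fred Bloggs</name>\n</person>\n"
                    + "<?xml version=\"1.0\"?>\n<person>\n    <name>Peter Jones</name>\n</person>\n";
        System.out.println(namesIn(new StringReader(feed))); // prints [Fred Bloggs, Peter Jones]
    }
}
```

For the real feed, the StringReader in main would be replaced by an InputStreamReader over the socket's input stream. One consequence of splitting this way is that a document is only handed to the parser once the next declaration (or the end of the stream) arrives, so on a live connection the most recent document stays buffered until then.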
