I want to extract specific nodes from a large XML file. This works well until an empty CDATA section (one without any content) appears.
The output:
ERROR: ''
javax.xml.transform.TransformerException: java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:732)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
at xml_test.XML_Test.extractXML2(XML_Test.java:698)
at xml_test.XML_Test.main(XML_Test.java:811)
Caused by: java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
... 3 more
---------
java.lang.IndexOutOfBoundsException
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
at xml_test.XML_Test.extractXML2(XML_Test.java:698)
at xml_test.XML_Test.main(XML_Test.java:811)
The code:
InputStream stream = new FileInputStream("C:\\myFile.xml");
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(stream);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
String extractPath = "/root";
String path = "";
while (reader.hasNext()) {
    reader.next();
    if (reader.isStartElement()) {
        path += "/" + reader.getLocalName();
        if (path.equals(extractPath)) {
            StringWriter writer = new StringWriter();
            StAXSource src = new StAXSource(reader);
            StreamResult res = new StreamResult(writer);
            t.transform(src, res); // Exception thrown
            System.out.println(writer.toString());
            path = path.substring(0, path.lastIndexOf("/"));
        }
    }
    else if (reader.isEndElement()) {
        path = path.substring(0, path.lastIndexOf("/"));
    }
}
The XML that raises the error:
<foo><![CDATA[]]></foo>
Can I make the Transformer just ignore that? Or what would an alternative implementation look like? I am not able to change the input XML!
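
For reference, here is one direction I am considering, as a rough sketch only. It assumes (based on the stack trace) that the failure comes from `getTextCharacters` being asked to copy zero characters into an empty buffer, so it wraps the reader in a `StreamReaderDelegate` that answers such requests itself. The class name `EmptyCdataWorkaround` and the helpers `tolerateEmptyText` and `extract` are made up for illustration, and the XML is read from a string instead of my real file. I have not confirmed that this is the right fix for every JDK version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

class EmptyCdataWorkaround {

    // Hypothetical helper: wraps a reader so that zero-length copy requests
    // never reach the underlying implementation (assumed source of the exception).
    static XMLStreamReader tolerateEmptyText(XMLStreamReader reader) {
        return new StreamReaderDelegate(reader) {
            @Override
            public int getTextCharacters(int sourceStart, char[] target,
                                         int targetStart, int length)
                    throws XMLStreamException {
                if (length == 0) {
                    return 0; // nothing to copy for empty text or empty CDATA
                }
                return super.getTextCharacters(sourceStart, target, targetStart, length);
            }
        };
    }

    // Same loop as in the question, but using the wrapped reader and
    // collecting the serialized subtrees instead of printing them.
    static String extract(String xml, String extractPath) {
        try {
            XMLStreamReader reader = tolerateEmptyText(
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)));
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringBuilder out = new StringBuilder();
            String path = "";
            while (reader.hasNext()) {
                reader.next();
                if (reader.isStartElement()) {
                    path += "/" + reader.getLocalName();
                    if (path.equals(extractPath)) {
                        StringWriter writer = new StringWriter();
                        // Identity transform of the subtree starting at the current element
                        t.transform(new StAXSource(reader), new StreamResult(writer));
                        out.append(writer.toString());
                        path = path.substring(0, path.lastIndexOf('/'));
                    }
                } else if (reader.isEndElement()) {
                    path = path.substring(0, path.lastIndexOf('/'));
                }
            }
            return out.toString();
        } catch (XMLStreamException | TransformerException e) {
            throw new IllegalStateException("extraction failed", e);
        }
    }

    public static void main(String[] args) {
        // The element from the question, wrapped in the /root element being extracted
        System.out.println(extract("<root><foo><![CDATA[]]></foo></root>", "/root"));
    }
}
```

The idea is that the rest of the code stays unchanged: only the reader handed to `StAXSource` is wrapped, so the input XML does not need to be touched. Whether a delegate like this is preferable to switching to a different parser or transformer implementation is exactly what I am unsure about.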