IndexOutOfBoundsException when processing empty CDATA with Transformer

I want to extract specific nodes from a large XML file. That works well, until a wild CDATA without any content appears.

The output:

ERROR:  ''
javax.xml.transform.TransformerException: java.lang.IndexOutOfBoundsException
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:732)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
    at xml_test.XML_Test.extractXML2(XML_Test.java:698)
    at xml_test.XML_Test.main(XML_Test.java:811)
Caused by: java.lang.IndexOutOfBoundsException
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
    ... 3 more
---------
java.lang.IndexOutOfBoundsException
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1143)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:261)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:171)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:120)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:674)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:723)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
    at xml_test.XML_Test.extractXML2(XML_Test.java:698)
    at xml_test.XML_Test.main(XML_Test.java:811)

The code:

InputStream stream = new FileInputStream("C:\\myFile.xml");
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(stream);

TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();

String extractPath = "/root";
String path = "";

while(reader.hasNext()) {
    reader.next();

    if(reader.isStartElement()) {
        path += "/" + reader.getLocalName();

        if(path.equals(extractPath)) {
            StringWriter writer = new StringWriter();
            StAXSource src = new StAXSource(reader);
            StreamResult res = new StreamResult(writer);
            t.transform(src, res); // Exception thrown

            System.out.println(writer.toString());

            path = path.substring(0, path.lastIndexOf("/"));
        }
    }
    else if(reader.isEndElement()) {
        path = path.substring(0, path.lastIndexOf("/"));
    }
}

The XML that raises the error:

<foo><![CDATA[]]></foo>

Can I make the Transformer just ignore that? Or what would another implementation look like? I'm not able to change the input XML!

Lacey answered 19/1, 2015 at 15:34 Comment(4)
Possible duplicate of "How do you debug an xml object that causes a transform error when writing to string?" – Datestamp
I've seen this question and read its answers. They don't help me solve my problem, since I get a different exception and the link to the "helpful post" is dead. I don't know what the cause is or where to look for it. – Lacey
I was able to reproduce your error, let me take a look. – Ten
@Lacey the link to the 'helpful post' is available on archive.org, you can view it here: Solve Transformation Null Pointer exception – Plemmons

This is an issue in the Xerces implementation, see https://issues.apache.org/jira/browse/XERCESJ-1033

The parser apparently does not expect empty CDATA sections (although they are well-formed XML), so the only advice I can give you is:

  1. Change the XML parser implementation
  2. Remove empty CDATA from source files (replace "<![CDATA[]]>" with "")
    or put a whitespace in CDATA e.g. <![CDATA[ ]]>
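
Since the input file itself cannot be changed, option 2 can also be applied in memory, just before parsing. The following is a minimal sketch of that idea, not a tested fix for every case: the class and method names (EmptyCdataCleaner, stripEmptyCdata, openCleaned) are made up for illustration, and it assumes the document fits in memory as a String, which may not hold for very large files.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: removes empty CDATA sections before the parser sees them.
final class EmptyCdataCleaner {

    // Replaces every literal "<![CDATA[]]>" with nothing; non-empty CDATA is left untouched.
    static String stripEmptyCdata(String xml) {
        return xml.replace("<![CDATA[]]>", "");
    }

    // Opens a StAX reader over the cleaned text instead of the raw file content.
    static XMLStreamReader openCleaned(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        return factory.createXMLStreamReader(new StringReader(stripEmptyCdata(xml)));
    }
}
```

The reader returned by openCleaned could then be used in place of the one created from the FileInputStream in the question. For files too large to hold in memory, a filtering Reader would be needed instead, which this sketch does not cover.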

Here are some examples with another implementation.

JAXB

With JAXB you can map your XML to POJOs in a simple manner.

For example, if you have the following XML file in c:\myFile.xml:

<root>
  <foo><![CDATA[]]></foo>
  <foo><![CDATA[some data here]]></foo>
</root>

You could have the following POJOs:

@XmlRootElement
public class Root {

  @XmlElement(name="foo")
  private List<Foo> foo;

  public List<Foo> getFooList() {
    return foo;
  }

  public void setFooList(List<Foo> fooList) {
    this.foo = fooList;
  }

}

@XmlType(name = "foo")
public class Foo {

  @XmlValue
  private String content;

  @Override
  public String toString() {
    return content;
  }

}

And then parse the XML into objects with the following snippet:

public static void main(String[] args) {
    try {
        File file = new File("C:\\myFile.xml");
        JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);

        Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
        Root root = (Root) jaxbUnmarshaller.unmarshal(file);

        for (Foo foo : root.getFooList()) {
            System.out.println(String.format("Foo content: |%s|", foo));
        }
    } catch (JAXBException e) {
        e.printStackTrace();
    }
}

I tested this and it raises no error.
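
If mapping to POJOs is more than you need, another option that stays within StAX is to copy events from an XMLEventReader to an XMLEventWriter, which avoids the Transformer and therefore the StAXStream2SAX bridge where the stack trace fails. The sketch below is only an illustration under assumptions: SubtreeCopier and copyFirst are made-up names, it matches on the element's local name rather than a full path, and whether every JDK build handles empty CDATA on this code path is something you would need to verify yourself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: copies one element subtree without using a Transformer.
final class SubtreeCopier {

    // Returns the first element with the given local name, including its children, as a string.
    static String copyFirst(String xml, String localName) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        int depth = 0; // 0 means we are still searching for the start of the subtree
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (depth == 0) {
                boolean isTarget = event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(localName);
                if (!isTarget) {
                    continue; // skip everything outside the wanted subtree
                }
            }
            writer.add(event); // copy the event unchanged
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement() && --depth == 0) {
                break; // matching end tag reached, subtree is complete
            }
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }
}
```

Calling SubtreeCopier.copyFirst(xml, "root") on the sample document above would correspond to the "/root" extraction in the question. A path-based match like the one in the original loop could be added on top of the depth counter.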

Ten answered 23/1, 2015 at 10:57 Comment(3)
Another implementation is needed then. Do you have a suggestion? @Ron said in this comment that we could pipe the XML directly without using Transformer. How would that work? – Lacey
@Lacey it depends on what you are looking for. XMLBeans is very flexible, letting you use getters/setters, traverse the XML, or even use DOM or XPath. If you want to map XML to POJOs, try JAXB. – Ten
Thank you very much so far. I'll try your suggestions. I got my original code to work, but in a way I don't want to leave it. You may be interested in that: #28152210 – Lacey

I encountered this error with two builds of the same application: one build exhibited the error when handling an empty <![CDATA[]]>, the other did not.

The difference turned out to be that the broken build was using Xerces (embedded in the JRE), while the working build had an extra dependency on the classpath, https://mvnrepository.com/artifact/org.codehaus.woodstox/woodstox-core-asl.
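
To see which StAX implementation a given build actually picks up, you can print the class of the factory returned by XMLInputFactory.newInstance(). This is just a small diagnostic sketch; the class name StaxImplementationCheck is made up, and the class names mentioned in the comments are what I would expect to see, not something guaranteed for every JDK or Woodstox version.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical diagnostic: reports which XMLInputFactory implementation is on the classpath.
final class StaxImplementationCheck {

    // Returns the fully qualified class name of the factory chosen by the StAX lookup.
    static String inputFactoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Expected (assumption): a com.sun.xml.internal.stream class with the JDK default,
        // or a com.ctc.wstx class when Woodstox is on the classpath.
        System.out.println(inputFactoryClassName());
    }
}
```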

Relevant part of the stacktrace for the broken build would be

java.lang.Exception
        at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getTextCharacters(XMLStreamReaderImpl.java:1144)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
        at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
        at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
        at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
        at javax.xml.validation.Validator.validate(Validator.java:124)

While for the working build

java.lang.Exception
    at com.ctc.wstx.sr.BasicStreamReader.getTextCharacters(BasicStreamReader.java:894)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.handleCharacters(StAXStream2SAX.java:242)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.bridge(StAXStream2SAX.java:152)
    at com.sun.org.apache.xalan.internal.xsltc.trax.StAXStream2SAX.parse(StAXStream2SAX.java:101)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transformIdentity(TransformerImpl.java:679)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:728)
    at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:343)
    at com.sun.org.apache.xerces.internal.jaxp.validation.StAXValidatorHelper.validate(StAXValidatorHelper.java:107)
    at com.sun.org.apache.xerces.internal.jaxp.validation.ValidatorImpl.validate(ValidatorImpl.java:123)
    at javax.xml.validation.Validator.validate(Validator.java:124)

This Q/A helped me get "comfortable" with Woodstox: "What is the relation between fasterxml(jackson-dataformat-xml) and Woodstox?".

Glabella answered 15/12, 2018 at 18:6 Comment(1)
You are my hero, thanks a lot for this advice. My project depends heavily on the Transformer, so I couldn't just move to JAXB. The only thing I had to do was add woodstox-core-asl as a dependency and it worked, no other changes needed. – Cellini
