How to parse XML for <![CDATA[]]>
How do I parse an XML document that has data enclosed in <![CDATA[---]]> sections? How can I parse the XML and get the data inside the CDATA?

Parceling answered 13/12, 2011 at 12:20 Comment(1)
Do you parse the file "by hand", or do you use an XML reader class (and if so, which one)? – Firstly
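import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CdataExample {
  public static void main(String[] args) throws Exception {
    File file = new File("data.xml");
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Needed if you are using this code for BlackBerry XML parsing.
    // In standard JAXP, setCoalescing is a DocumentBuilderFactory method: it
    // converts CDATA sections to Text nodes merged with adjacent text.
    factory.setCoalescing(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(file);

    NodeList nodes = doc.getElementsByTagName("topic");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element element = (Element) nodes.item(i);
      NodeList title = element.getElementsByTagName("title");
      Element line = (Element) title.item(0);
      System.out.println("Title: " + getCharacterDataFromElement(line));
    }
  }

  // Returns the data of the element's first child if it is a Text or CDATASection node
  public static String getCharacterDataFromElement(Element e) {
    Node child = e.getFirstChild();
    if (child instanceof CharacterData) {
      CharacterData cd = (CharacterData) child;
      return cd.getData();
    }
    return "";
  }
}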

( http://www.java2s.com/Code/Java/XML/GetcharacterdataCDATAfromxmldocument.htm )

Firstly answered 13/12, 2011 at 12:26 Comment(3)
I would rather do something like: if (child != null && (child instanceof CharacterData)) { return ((CharacterData) child).getData(); } else { return e.getNodeValue(); } in order to seamlessly handle both the presence and the absence of a CDATA block. – Halfassed
Can you please provide some text to describe what you are doing and why you would use the DocumentBuilderFactory? – Ricker
In current Java DOM implementations you can read CDATA simply as text with e.getTextContent(); no type check, cast or getData() call is needed. – Pouch
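A minimal sketch of the getTextContent() approach from the last comment, assuming the XML is available as a String. The class name TextContentDemo and the sample document are made up for illustration. The sample uses mixed content (plain text followed by CDATA) to show where getTextContent() differs from reading only the first child:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class TextContentDemo {
    // Returns the full text of the first <title> element.
    // getTextContent() concatenates all Text and CDATASection descendants,
    // so no instanceof check or cast to CharacterData is needed.
    static String firstTitle(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element title = (Element) doc.getElementsByTagName("title").item(0);
            return title.getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Mixed content: a plain Text node ("Intro: ") followed by a CDATA section.
        // A getFirstChild()-based helper would return only "Intro: ".
        String xml = "<topic><title>Intro: <![CDATA[a < b & c]]></title></topic>";
        System.out.println(firstTitle(xml)); // prints: Intro: a < b & c
    }
}
```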

Since all previous answers use a DOM-based approach, here is how to parse CDATA with a stream-based approach using StAX.

Use the following pattern:

switch (r.getEventType()) {
    case XMLStreamConstants.CHARACTERS:
    case XMLStreamConstants.CDATA:
        System.out.println(r.getText());
        break;
    default:
        break;
}

Complete sample:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void readCDATAFromXMLUsingStax() {
    String yourSampleFile = "/path/toYour/sample/file.xml";
    XMLStreamReader r = null;
    try (InputStream in =
            new BufferedInputStream(new FileInputStream(yourSampleFile));) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        r = factory.createXMLStreamReader(in);
        while (r.hasNext()) {
            switch (r.getEventType()) {
            case XMLStreamConstants.CHARACTERS:
            case XMLStreamConstants.CDATA:
                System.out.println(r.getText());
                break;
            default:
                break;
            }
            r.next();
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    } finally {
        if (r != null) {
            try {
                r.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

With /path/toYour/sample/file.xml containing:

 <data>
    <![CDATA[ Sat Nov 19 18:50:15 2016 (1672822)]]>
    <![CDATA[Sat, 19 Nov 2016 18:50:14 -0800 (PST)]]>
 </data>

Gives:

 Sat Nov 19 18:50:15 2016 (1672822)                             
 Sat, 19 Nov 2016 18:50:14 -0800 (PST)       
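A variation on the StAX sample, sketched under the assumption that the XML is in a String rather than a file. It enables the standard XMLInputFactory.IS_COALESCING property so that a long CDATA section is delivered as one event instead of possibly several, and it skips whitespace-only text such as the indentation between elements (which the loop above may also print). StaxCdataDemo and readText are hypothetical names. Note that with coalescing, adjacent CDATA sections inside the same element (as in the file above) arrive merged into one string, so this sample puts each section in its own element:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCdataDemo {
    // Returns the text of every non-whitespace CHARACTERS or CDATA event
    static List<String> readText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data so one text run is not split across events
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    int event = r.next();
                    // A reader may report CDATA as CDATA or as CHARACTERS; accept both
                    if ((event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA)
                            && !r.isWhiteSpace()) {
                        result.add(r.getText());
                    }
                }
            } finally {
                r.close(); // XMLStreamReader is not AutoCloseable
            }
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<data><d><![CDATA[ one ]]></d><d><![CDATA[two < three]]></d></data>";
        for (String text : readText(xml)) {
            System.out.println("[" + text + "]");
        }
    }
}
```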
Pouch answered 24/3, 2017 at 12:52 Comment(0)

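A CDATA section just tells the parser that the enclosed text is character data, not markup, so characters such as < and & inside it do not need to be escaped. So just take the element's text: the XML parser returns the plain data without the CDATA delimiters.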

Raye answered 13/12, 2011 at 12:23 Comment(1)
To get the text data: e.getTextContent(); – Pastille
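To illustrate that CDATA only changes how the text is written, not what the parser returns, here is a small sketch comparing a CDATA section with the equivalent entity-escaped text. The class name CdataVsEscaped and the method rootText are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class CdataVsEscaped {
    // Parses the XML string and returns the text content of its root element
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA, '<' and '&' are written literally...
        String withCdata = rootText("<code><![CDATA[if (a < b && c)]]></code>");
        // ...outside CDATA they must be escaped as entity references
        String escaped = rootText("<code>if (a &lt; b &amp;&amp; c)</code>");
        System.out.println(withCdata);                 // prints: if (a < b && c)
        System.out.println(withCdata.equals(escaped)); // prints: true
    }
}
```

Because both forms produce the same string, code that reads the text does not need to know whether the source used CDATA.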

Here, r.get().getResponseBody() is the response body (a String containing the XML):

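// Imports needed by the code below
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Inside a method: parse the response body and print the CDATA text of each title
Document doc = getDomElement(r.get().getResponseBody());
NodeList nodes = doc.getElementsByTagName("Title");
for (int i = 0; i < nodes.getLength(); i++) {
    Element element = (Element) nodes.item(i);
    NodeList title = element.getElementsByTagName("Child tag where cdata present");
    Element line = (Element) title.item(0);
    System.out.println("Title: " + getCharacterDataFromElement(line));
}

// Parses an XML string into a DOM Document (returns null if parsing fails)
public static Document getDomElement(String xml) {
    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    // Convert CDATA sections to Text nodes merged with adjacent text
    dbf.setCoalescing(true);
    dbf.setNamespaceAware(true);
    try {
        DocumentBuilder db = dbf.newDocumentBuilder();
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(xml));
        doc = db.parse(is);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return doc;
}

// Returns the data of the element's first child if it is a Text or CDATASection node
public static String getCharacterDataFromElement(Element e) {
    Node child = e.getFirstChild();
    if (child instanceof CharacterData) {
        CharacterData cd = (CharacterData) child;
        return cd.getData();
    }
    return "";
}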
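The setCoalescing(true) call in getDomElement matters for the getFirstChild() approach: if there is whitespace before the CDATA section, the element's first child is otherwise a whitespace-only Text node. A small sketch comparing both settings; the <Child> element, the class name CoalescingDemo and the method firstChildData are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CoalescingDemo {
    // Returns the trimmed data of the first child of the first <Child> element,
    // using the same first-child logic as getCharacterDataFromElement
    static String firstChildData(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setCoalescing(coalescing);
            Element e = (Element) dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("Child")
                    .item(0);
            Node child = e.getFirstChild();
            return child instanceof CharacterData
                    ? ((CharacterData) child).getData().trim()
                    : "";
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        // Without coalescing, the newline and indentation before the CDATA
        // section form a separate Text node that comes first
        String xml = "<Title><Child>\n  <![CDATA[some <data>]]>\n</Child></Title>";
        System.out.println("coalescing:    [" + firstChildData(xml, true) + "]");
        System.out.println("no coalescing: [" + firstChildData(xml, false) + "]");
    }
}
```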
Optional answered 30/10, 2014 at 7:30 Comment(0)

Below is a sample XML file and the code to retrieve the XML embedded in the CDATA section of the main document.

<envelope>
 <Header>
  <id>123</id>
  <name>abc</name>
 </Header>
 <payload>
  <![CDATA[<?xml version="1.0"?> <Document><validXML></validXML></Document>]]>
 </payload>
</envelope>

The XPath expression to get the XML inside the CDATA in the example above is:

/envelope/payload/text()

So, once you have the root Document of the XML above, you can use this expression to fetch the XML embedded in the CDATA.

Here is a utility method that does this:

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public String getSubDocument(Document rootDocument, String xPathString) throws Exception {
    XPath xPath = XPathFactory.newInstance().newXPath();
    // Evaluate the expression (e.g. /envelope/payload/text()) as a string:
    // the result is the XML text that was embedded in the CDATA section
    String xmlString = (String) xPath.compile(xPathString).evaluate(rootDocument, XPathConstants.STRING);
    return xmlString;
}
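The method above returns the embedded XML as a string. A minimal sketch of the next step, parsing that string into its own Document, might look like this. The class and method names (EmbeddedXmlDemo, parse, payloadDocument) are hypothetical, and the sample payload uses a complete XML declaration (<?xml version="1.0"?>) because a parser rejects a bare <?xml>:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EmbeddedXmlDemo {
    // Parses an XML string into a DOM Document; coalescing merges the CDATA
    // section and the whitespace around it into a single Text node
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Reads the text of /envelope/payload and parses it as a separate document
    static Document payloadDocument(Document envelope) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            String embedded = (String) xPath.evaluate(
                    "/envelope/payload/text()", envelope, XPathConstants.STRING);
            // The text includes the newlines around the CDATA section; an XML
            // declaration must be the very first thing in a document, so trim first
            return parse(embedded.trim());
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<envelope><Header><id>123</id><name>abc</name></Header>"
                + "<payload>\n  <![CDATA[<?xml version=\"1.0\"?><Document><validXML></validXML></Document>]]>\n</payload>"
                + "</envelope>";
        Document inner = payloadDocument(parse(xml));
        System.out.println(inner.getDocumentElement().getNodeName()); // prints: Document
    }
}
```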

Flammable answered 10/9, 2021 at 9:17 Comment(0)
