How to generate CDATA block using JAXB?
D

10

41

I am using JAXB to serialize my data to XML. The class code is simple, as shown below. I want to produce XML that contains CDATA blocks for the values of some Args. For example, the current code produces this XML:

<command>
   <args>
      <arg name="test_id">1234</arg>
      <arg name="source">&lt;html>EMAIL&lt;/html></arg>
   </args>
</command>

I want to wrap the "source" arg in CDATA such that it looks like below:

<command>
   <args>
      <arg name="test_id">1234</arg>
      <arg name="source"><![CDATA[<html>EMAIL</html>]]></arg>
   </args>
</command>

How can I achieve this with the code below?

@XmlRootElement(name = "command")
public class Command {

    @XmlElementWrapper(name = "args")
    protected List<Arg> arg;
}

@XmlRootElement(name = "arg")
public class Arg {

    @XmlAttribute
    public String name;

    @XmlValue
    public String value;

    public Arg() {}

    static Arg make(final String name, final String value) {
        Arg a = new Arg();
        a.name = name;
        a.value = value;
        return a;
    }
}
Doorstep answered 28/6, 2010 at 21:45 Comment(1)
Did you find a solution to this problem? If so, please share. Thanks.Shumaker
A
26

Note: I'm the EclipseLink JAXB (MOXy) lead and a member of the JAXB (JSR-222) expert group.

If you are using MOXy as your JAXB provider then you can leverage the @XmlCDATA extension:

package blog.cdata;

import javax.xml.bind.annotation.XmlRootElement;
import org.eclipse.persistence.oxm.annotations.XmlCDATA;

@XmlRootElement(name="c")
public class Customer {

   private String bio;

   @XmlCDATA
   public void setBio(String bio) {
      this.bio = bio;
   }

   public String getBio() {
      return bio;
   }

}

For More Information

Algerian answered 15/7, 2010 at 0:9 Comment(8)
I'm not sure why this response received a down vote. It is a direct response to the questions, with a link to a detailed description of how the solution can be applied. JAXB is a specification and compliant implementations such as MOXy contain extensions to handle such things as CDATA.Algerian
I can see the solution on your blog, but I was hoping to find something that would not require any third-party JAR. I have since figured out that this is not supported by the JAXB implementation that comes with the Sun JDK.Doorstep
This link keeps popping up when I try to solve this issue. I trust it works very well, but I cannot understand how to plug the solution into my client. Every example uses a main method to prove that marshalling works, but none shows how to use it in a real client. For example, where should JAXBContext jc = JAXBContext.newInstance(classes, props); go in a wsdl2java-generated client, since the JAXBContext is created by JAX-WS automatically, if I have understood correctly?Lippmann
Answer to my question lies here: blog.bdoughan.com/2011/04/…Lippmann
Blaise stop answering to every jaxb+cdata question. For some ppl this is no solution.Ankle
@locke - That is why Stack Overflow allows more than one answer per question :).Algerian
@blaiseDoughan - I agree, most times we are looking for a solution that will work with the JAXB RI without having to bring in and configure MOXy. I have yet to find a solution for this, and all the SO questions about it have this as an answer.Lawrence
@Quantas - MOXy is the default JAXB provider in Oracle WebLogic and is available by default as a JAXB provider in GlassFish, so this solution is of direct use to those users. Also, as you can tell, none of the solutions are easy. Since JAXB (JSR-222) is a spec (I'm a member of the expert group), switching providers is an option for people to get the extensions they are looking for. I have posted over 2000 answers to JAXB questions, and whenever possible I try to post a pure JAXB solution; where I don't, I put the disclaimer at the top of my answer.Algerian
T
20

Use JAXB's Marshaller#marshal(Object, ContentHandler) to marshal into a ContentHandler object. Simply override the characters method on the ContentHandler implementation you are using (e.g. JDOM's SAXHandler, Apache's XMLSerializer, etc.):

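import java.util.regex.Pattern;
import org.apache.xml.serialize.XMLSerializer;
import org.xml.sax.SAXException;

// The base class can be any ContentHandler that also implements LexicalHandler
// (e.g. JDOM's SAXHandler); Xerces' XMLSerializer is used here as one option.
public class CDataContentHandler extends XMLSerializer {
    // see http://www.w3.org/TR/xml/#syntax
    private static final Pattern XML_CHARS = Pattern.compile("[<>&]");

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Wrap the text in a CDATA section only if it contains markup characters.
        boolean useCData = XML_CHARS.matcher(new String(ch, start, length)).find();
        if (useCData) super.startCDATA();
        super.characters(ch, start, length);
        if (useCData) super.endCDATA();
    }
}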

This is much better than using the XMLSerializer.setCDataElements(...) method because you don't have to hardcode a list of elements. It emits a CDATA block only when the text contains characters that would otherwise need escaping.
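For readers who want to try the idea without JDOM or Xerces on the classpath, here is a minimal sketch that uses only JDK classes. It assumes the JDK's built-in TransformerHandler as the output handler (it implements both ContentHandler and LexicalHandler); the names CDataFilterDemo, CDataFilter and render are made up for illustration, and the SAX events are fired by hand instead of by a JAXB Marshaller.

```java
import java.io.StringWriter;
import java.util.regex.Pattern;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

final class CDataFilterDemo {

    /** Hypothetical filter: forwards SAX events and brackets "risky" text with CDATA events. */
    static final class CDataFilter extends XMLFilterImpl {
        private static final Pattern XML_CHARS = Pattern.compile("[<>&]");
        private final LexicalHandler lexical;

        CDataFilter(TransformerHandler target) {
            setContentHandler(target); // element and text events go here
            this.lexical = target;     // CDATA start/end events go here
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            boolean useCData = XML_CHARS.matcher(new String(ch, start, length)).find();
            if (useCData) lexical.startCDATA();
            super.characters(ch, start, length);
            if (useCData) lexical.endCDATA();
        }
    }

    /** Writes a single <arg> element containing the given text and returns the XML. */
    static String render(String text) {
        try {
            SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler out = factory.newTransformerHandler();
            // Output properties must be set before the result is attached.
            out.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            out.setResult(new StreamResult(sw));

            CDataFilter filter = new CDataFilter(out);
            char[] chars = text.toCharArray();
            filter.startDocument();
            filter.startElement("", "arg", "arg", new AttributesImpl());
            filter.characters(chars, 0, chars.length);
            filter.endElement("", "arg", "arg");
            filter.endDocument();
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("1234"));
        System.out.println(render("<html>EMAIL</html>"));
    }
}
```

With JAXB, the same filter would presumably be passed as the second argument of marshal in place of the hand-written events. Note that this sketch does not handle text that itself contains "]]>", and it decides per characters() call, so text delivered in several chunks is judged chunk by chunk.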

Tranquilize answered 25/7, 2011 at 21:51 Comment(3)
Clean and easy. I'm going to put it through some testing. Thanks!Nancienancy
Can't I extend the DataWriter class and use this procedure? I am using the default ContentHandler, so I cannot extend it to solve my issue.Globate
@Apoorvasahay I finally found a class to extend in JDK 8: com.sun.xml.internal.txw2.output.XMLWriter. Refer to my answer for details.Narrows
A
18

Solution Review:

  • fred's answer is just a workaround. It fails when the Marshaller is linked to a Schema and validates the content, because it only modifies the string literal and does not create real CDATA sections. If you merely rewrite the String from foo to <![CDATA[foo]]>, Xerces sees a string of length 15 instead of 3.
  • The MOXy solution is implementation-specific and does not work with the JDK classes alone.
  • The getSerializer solution relies on the deprecated XMLSerializer class.
  • The LSSerializer solution is just a pain.

I modified a2ndrade's solution to use an XMLStreamWriter implementation. This solution works very well.

XMLOutputFactory xof = XMLOutputFactory.newInstance();
XMLStreamWriter streamWriter = xof.createXMLStreamWriter( System.out );
CDataXMLStreamWriter cdataStreamWriter = new CDataXMLStreamWriter( streamWriter );
marshaller.marshal( jaxbElement, cdataStreamWriter );
cdataStreamWriter.flush();
cdataStreamWriter.close();

Here is the CDataXMLStreamWriter implementation. Its superclass, DelegatingXMLStreamWriter, simply delegates all method calls to the given XMLStreamWriter implementation.

import java.util.regex.Pattern;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/**
 * Implementation which is able to decide to use a CDATA section for a string.
 */
public class CDataXMLStreamWriter extends DelegatingXMLStreamWriter
{
   private static final Pattern XML_CHARS = Pattern.compile( "[&<>]" );

   public CDataXMLStreamWriter( XMLStreamWriter del )
   {
      super( del );
   }

   @Override
   public void writeCharacters( String text ) throws XMLStreamException
   {
      boolean useCData = XML_CHARS.matcher( text ).find();
      if( useCData )
      {
         super.writeCData( text );
      }
      else
      {
         super.writeCharacters( text );
      }
   }
}
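
If you do not want to pull in or hand-write a DelegatingXMLStreamWriter, a dynamic proxy can play the same role. The following is only a sketch under that assumption: the names CDataProxyDemo, cdataAware and render are hypothetical, and the proxy intercepts only writeCharacters(String), forwarding every other XMLStreamWriter call unchanged.

```java
import java.io.StringWriter;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.regex.Pattern;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class CDataProxyDemo {
    private static final Pattern XML_CHARS = Pattern.compile("[&<>]");

    /** Returns a writer that emits CDATA for writeCharacters(String) when the text contains markup. */
    static XMLStreamWriter cdataAware(XMLStreamWriter delegate) {
        return (XMLStreamWriter) Proxy.newProxyInstance(
                CDataProxyDemo.class.getClassLoader(),
                new Class<?>[] { XMLStreamWriter.class },
                (proxy, method, args) -> {
                    // Only the single-String overload is intercepted.
                    boolean isStringWrite = method.getName().equals("writeCharacters")
                            && args != null && args.length == 1 && args[0] != null;
                    try {
                        if (isStringWrite && XML_CHARS.matcher((String) args[0]).find()) {
                            delegate.writeCData((String) args[0]);
                            return null;
                        }
                        return method.invoke(delegate, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // surface the original XMLStreamException
                    }
                });
    }

    /** Writes a single <arg> element containing the given text and returns the XML. */
    static String render(String text) {
        try {
            StringWriter sw = new StringWriter();
            XMLStreamWriter writer =
                    cdataAware(XMLOutputFactory.newInstance().createXMLStreamWriter(sw));
            writer.writeStartElement("arg");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.close();
            return sw.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("1234"));
        System.out.println(render("<html>EMAIL</html>"));
    }
}
```

The writer returned by cdataAware could be handed to marshaller.marshal(...) in the same way as the CDataXMLStreamWriter above. Text that contains "]]>" would need to be split across two CDATA sections, which this sketch does not attempt.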
Auger answered 5/5, 2012 at 9:59 Comment(1)
DelegatingXMLStreamWriter: github.com/apache/cxf/blob/master/core/src/main/java/org/apache/…Lynettalynette
U
11

Here is the code sample referenced by the site mentioned above:

import java.io.File;
import java.io.StringWriter;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;

public class JaxbCDATASample {

    public static void main(String[] args) throws Exception {
        // unmarshal a doc
        JAXBContext jc = JAXBContext.newInstance("...");
        Unmarshaller u = jc.createUnmarshaller();
        Object o = u.unmarshal(...);

        // create a JAXB marshaller
        Marshaller m = jc.createMarshaller();

        // get an Apache XMLSerializer configured to generate CDATA
        XMLSerializer serializer = getXMLSerializer();

        // marshal using the Apache XMLSerializer
        m.marshal(o, serializer.asContentHandler());
    }

    private static XMLSerializer getXMLSerializer() {
        // configure an OutputFormat to handle CDATA
        OutputFormat of = new OutputFormat();

        // specify which of your elements you want to be handled as CDATA.
        // The use of the '^' between the namespaceURI and the localname
        // seems to be an implementation detail of the xerces code.
        // When processing xml that doesn't use namespaces, simply omit the
        // namespace prefix as shown in the third CDataElement below.
        of.setCDataElements(
            new String[] {
                "ns1^foo",   // <ns1:foo>
                "ns2^bar",   // <ns2:bar>
                "^baz" });   // <baz>

        // set any other options you'd like
        of.setPreserveSpace(true);
        of.setIndenting(true);

        // create the serializer
        XMLSerializer serializer = new XMLSerializer(of);
        serializer.setOutputByteStream(System.out);

        return serializer;
    }
}
Unkind answered 17/7, 2010 at 20:11 Comment(0)
Q
10

For the same reasons as Michael Ernst I wasn't that happy with most of the answers here. I could not use his solution as my requirement was to put CDATA tags in a defined set of fields - as in raiglstorfer's OutputFormat solution.

My solution is to marshal to a DOM document, and then do a null XSL transform to do the output. Transformers allow you to set which elements are wrapped in CDATA tags.

Document document = ...
jaxbMarshaller.marshal(jaxbObject, document);

Transformer nullTransformer = TransformerFactory.newInstance().newTransformer();
nullTransformer.setOutputProperty(OutputKeys.INDENT, "yes");
nullTransformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "myElement {myNamespace}myOtherElement");
nullTransformer.transform(new DOMSource(document), new StreamResult(writer/stream));

Further info here: http://javacoalface.blogspot.co.uk/2012/09/outputting-cdata-sections-with-jaxb.html
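
To see the effect of CDATA_SECTION_ELEMENTS in isolation, here is a small self-contained sketch that needs only the JDK. It builds the DOM by hand where a real program would let the JAXB marshaller fill it; the class name CDataTransformDemo, the render method and the element names are chosen for illustration only.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class CDataTransformDemo {

    /** Builds <command><id>1234</id><source>...</source></command> and serializes it. */
    static String render(String sourceText) {
        try {
            // Stand-in for the document that jaxbMarshaller.marshal(obj, document) would populate.
            Document document =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element command = document.createElementNS(null, "command");
            Element id = document.createElementNS(null, "id");
            id.setTextContent("1234");
            Element source = document.createElementNS(null, "source");
            source.setTextContent(sourceText);
            command.appendChild(id);
            command.appendChild(source);
            document.appendChild(command);

            // Identity transform: no stylesheet, only output properties.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Whitespace-separated list of elements whose text is written as CDATA.
            transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "source");

            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("<html>EMAIL</html>"));
    }
}
```

Only the elements named in the property are wrapped; the id element in this sketch is written as ordinary text.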

Quennie answered 28/9, 2012 at 9:29 Comment(1)
This works really nicely. This is an easy solution, although the CDATA elements must be defined at marshal time.Frontier
B
6

The following simple method adds CDATA support to JAXB, which does not support CDATA natively:

  1. Declare a custom simple type CDataString, extending string, to identify the fields that should be handled via CDATA.
  2. Create a custom CDataAdapter that parses and prints the content of a CDataString.
  3. Use JAXB bindings to link CDataString and your CDataAdapter. The CDataAdapter will add/remove <![CDATA[ ]]> to/from CDataStrings at marshal/unmarshal time.
  4. Declare a custom character escape handler that does not escape characters when printing CDATA strings, and set it as the Marshaller's CharacterEscapeHandler.

Et voilà, any CDataString element will be encapsulated with <![CDATA[ ]]> at marshal time. At unmarshal time, the <![CDATA[ ]]> will automatically be removed.
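
The string handling behind steps 2 and 4 can be sketched in plain Java, without any JAXB types. Everything here is an assumption made for illustration: in a real setup the marshal and unmarshal methods would presumably live in an XmlAdapter and escape in the marshaller's character escape handler, and the class name CDataStringSketch is made up.

```java
final class CDataStringSketch {
    private static final String OPEN = "<![CDATA[";
    private static final String CLOSE = "]]>";
    // A literal "]]>" inside the value has to be split across two CDATA sections.
    private static final String SPLIT = "]]]]><![CDATA[>";

    /** Step 2, marshal direction: wrap the value in a CDATA section. */
    static String marshal(String value) {
        return OPEN + value.replace(CLOSE, SPLIT) + CLOSE;
    }

    /** Step 2, unmarshal direction: strip the wrapper if it is present. */
    static String unmarshal(String value) {
        if (value.startsWith(OPEN) && value.endsWith(CLOSE)) {
            String inner = value.substring(OPEN.length(), value.length() - CLOSE.length());
            return inner.replace(SPLIT, CLOSE);
        }
        return value;
    }

    /** Step 4: leave CDATA-wrapped strings alone, escape everything else. */
    static String escape(String text) {
        if (text.startsWith(OPEN) && text.endsWith(CLOSE)) {
            return text;
        }
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String wrapped = marshal("<html>EMAIL</html>");
        System.out.println(escape(wrapped));
        System.out.println(escape("<html>EMAIL</html>"));
        System.out.println(unmarshal(wrapped));
    }
}
```

Unlike a handler that never escapes, this one still escapes ordinary fields, so values outside CDataString fields that contain < or & stay well-formed.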

Blend answered 4/8, 2011 at 21:13 Comment(0)
N
5

A supplement to @a2ndrade's answer.

I found a class to extend in JDK 8. Note, however, that the class is in a com.sun package. You may want to keep a copy of its code in case the class is removed from a future JDK.

import java.io.IOException;
import java.io.Writer;
import java.util.regex.Pattern;
import org.xml.sax.SAXException;

public class CDataContentHandler extends com.sun.xml.internal.txw2.output.XMLWriter {
  public CDataContentHandler(Writer writer, String encoding) throws IOException {
    super(writer, encoding);
  }

  // see http://www.w3.org/TR/xml/#syntax
  private static final Pattern XML_CHARS = Pattern.compile("[<>&]");

  @Override
  public void characters(char[] ch, int start, int length) throws SAXException {
    // Use a CDATA section only when the text contains markup characters.
    boolean useCData = XML_CHARS.matcher(new String(ch, start, length)).find();
    if (useCData) {
      super.startCDATA();
    }
    super.characters(ch, start, length);
    if (useCData) {
      super.endCDATA();
    }
  }
}

How to use:

  JAXBContext jaxbContext = JAXBContext.newInstance(...class);
  Marshaller marshaller = jaxbContext.createMarshaller();
  StringWriter sw = new StringWriter();
  CDataContentHandler cdataHandler = new CDataContentHandler(sw,"utf-8");
  marshaller.marshal(gu, cdataHandler);
  System.out.println(sw.toString());

Result example:

<?xml version="1.0" encoding="utf-8"?>
<genericUser>
  <password><![CDATA[dskfj>><<]]></password>
  <username>UNKNOWN::UNKNOWN</username>
  <properties>
    <prop2>v2</prop2>
    <prop1><![CDATA[v1><]]></prop1>
  </properties>
  <timestamp/>
  <uuid>cb8cbc487ee542ec83e934e7702b9d26</uuid>
</genericUser>
Narrows answered 18/9, 2016 at 12:28 Comment(1)
Thanks for your answer @bluearrow. I followed the steps but got an error regarding com.sun.xml.internal.txw2.output.XMLWriter which I was able to resolve using https://mcmap.net/q/392523/-java-indentingxmlstreamwriter-alternative. Thanks!Cruller
N
2

As of Xerces-J 2.9, XMLSerializer has been deprecated. The suggestion is to replace it with the DOM Level 3 LSSerializer or JAXP's Transformation API for XML. Has anyone tried this approach?

Noonberg answered 5/5, 2011 at 15:43 Comment(1)
I'm trying to do this and I found a link: mirthcorp.com/community/fisheye/rdiff/Mirth/trunk/server/src/…Elm
G
1

Just a word of warning: according to the documentation of javax.xml.transform.Transformer.setOutputProperty(...), you should use the qualified-name syntax when indicating an element from another namespace. According to the JavaDoc (Java 1.6 rt.jar):

"(...) For example, if a URI and local name were obtained from an element defined with <xyz:foo xmlns:xyz="http://xyz.foo.com/yada/baz.html"/>, then the qualified name would be "{http://xyz.foo.com/yada/baz.html}foo". Note that no prefix is used."

Well, this doesn't work. The implementing class in the Java 1.6 rt.jar, com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl, interprets elements belonging to a different namespace correctly only when they are declared as "http://xyz.foo.com/yada/baz.html:foo", because the implementation parses the value by looking for the last colon. So instead of invoking:

transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "{http://xyz.foo.com/yada/baz.html}foo")

which should work according to JavaDoc, but ends up being parsed as "http" and "//xyz.foo.com/yada/baz.html", you must invoke

transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "http://xyz.foo.com/yada/baz.html:foo")

At least in Java 1.6.

Guanine answered 12/5, 2016 at 10:14 Comment(0)
P
0

The following code prevents the marshaller from escaping the content of CDATA elements:

Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

StringWriter stringWriter = new StringWriter();
PrintWriter printWriter = new PrintWriter(stringWriter);
DataWriter dataWriter = new DataWriter(printWriter, "UTF-8", new CharacterEscapeHandler() {
    @Override
    public void escape(char[] buf, int start, int len, boolean b, Writer out) throws IOException {
        out.write(buf, start, len);
    }
});

marshaller.marshal(data, dataWriter);

System.out.println(stringWriter.toString());

It will also keep UTF-8 as your encoding.

Predominate answered 22/8, 2014 at 11:35 Comment(1)
And then, when some other non-CDATA field comes along with <> in it, the output is broken. Guys, this is a really bad way of doing this. I have seen it a couple of times here, and I would not recommend it. The escape method is there so that you escape something, not so that you can skip escaping altogether.Hilburn
