What is the reason that CDATA even exists?
H

9

16

I often see people asking XML/XSLT-related questions here that are rooted in an inability to grasp how CDATA works (like this one).

I wonder: why does it exist in the first place? It's not as if XML could not do without it; everything you can put into a CDATA section can also be expressed as "native" (XML-escaped) text.
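To illustrate the equivalence, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The two sample documents and the element name note are made up for illustration; the point is only that a parser hands back the same text for both spellings.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents carrying the same payload:
# one XML-escaped, one wrapped in a CDATA section.
escaped = "<note>if (a &lt; b &amp;&amp; b &gt; c) { run(); }</note>"
cdata = "<note><![CDATA[if (a < b && b > c) { run(); }]]></note>"

text_from_escaped = ET.fromstring(escaped).text
text_from_cdata = ET.fromstring(cdata).text

# ElementTree reports plain text in both cases; the CDATA delimiters
# are not part of the parsed content.
print(text_from_escaped == text_from_cdata)
print(text_from_cdata)
```

From the point of view of an API like this one, the choice between the two forms is invisible once the document has been parsed.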

I appreciate that CDATA potentially makes the resulting document a bit smaller, but let's face it - XML is verbose anyway. Small XML documents can be achieved more easily through compression, for example.

For me, CDATA breaks the clean separation of markup and data since you can have data that looks like markup to the unaided eye, which I find is a bad thing. (This may even be one of the things that encourages people to inadequately apply string processing or regex to XML.)

So: What good reason is there to use CDATA?

Hourihan answered 11/11, 2009 at 10:3 Comment(0)
C
13

CDATA sections are just for the convenience of human authors, not for programs. Their only use is to give humans the ability to easily include e.g. SVG example code in an XHTML page without needing to carefully replace every < with &lt; and so on.

That, for me, is the intended use, not making the resulting document a few bytes smaller because you can write < instead of &lt;.

Taking the example from above again (SVG code in XHTML), it also makes it easy for me to open the source of the XHTML file and copy-paste the SVG code out without having to replace every &lt; with < again.

Create answered 11/11, 2009 at 10:13 Comment(4)
I think it depends on whether you use text editors to manipulate XML or more appropriate tools like DOM APIs. I understand the convenience argument, though.Hourihan
Also - inserting SVG code into an X(HT)ML document into a CDATA section somehow defies the purpose, doesn't it? I mean - it would degrade perfectly well-formed XML to mere text data…Hourihan
@Hourihan exactly. That's what I mean by convenient for humans. If somebody manually edits some xml.Create
@"SVG in CDATA defies the purpose": No, it doesn't. Why would it? What I meant was a page that talks about SVG and therefore has to show pieces of sample SVG code, not display the SVG itself. Here CDATA sections make it easy to include the SVG code without reformatting it.Create
B
6

PCDATA - parsed character data, which means the data entered will be parsed by the parser.

CDATA - the data entered between the CDATA delimiters will not be parsed as markup. That is, the parser passes the text inside the CDATA section through as plain character data without interpreting it. As a result, a malicious user can send harmful data to the application inside a CDATA section.

CDATA section starts with <![CDATA[ and ends with ]]>.

The only string that cannot occur in CDATA is ]]>.
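If the text does contain ]]>, a common workaround is to split that sequence across two adjacent CDATA sections. The following is a rough sketch of the idea; the helper name to_cdata and the sample payload are hypothetical, not part of any library.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # "]]>" becomes "]]", then an end-of-section marker, then a new
    # section that starts with ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical payload that happens to contain the forbidden sequence.
payload = "if (a[b[0]]> 1) { x = 1 < 2; }"
document = "<code>" + to_cdata(payload) + "</code>"

# The parser joins the adjacent sections back into one text value.
round_tripped = ET.fromstring(document).text
print(round_tripped == payload)
```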

The only reason why we use CDATA is this: text like JavaScript code contains a lot of < and & characters. To avoid errors, script code can be placed in a CDATA section, because a bare < generates an error when the parser interprets it as the start of a new element. Similarly, & is interpreted by the parser as the start of an entity or character reference.
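A small sketch of that failure mode, using Python's xml.etree.ElementTree. The script text is an invented example; any unescaped < or & in element content should behave the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical script text containing both < and &.
script = "for (var i = 0; i < n && ok; i++) { step(i); }"

unprotected = "<script>" + script + "</script>"
protected = "<script><![CDATA[" + script + "]]></script>"

try:
    ET.fromstring(unprotected)
    unprotected_parsed = True
except ET.ParseError as error:
    # The bare "<" is read as the start of markup, so the document
    # is not well-formed.
    unprotected_parsed = False
    print("rejected:", error)

# Inside CDATA the same characters are plain character data.
recovered = ET.fromstring(protected).text
print(recovered == script)
```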

Blockhead answered 13/11, 2009 at 11:16 Comment(1)
There is another important thing that cannot be put in CDATA: every character not available in the XML document's charset. Outside of CDATA, any character can be escaped with a character reference such as &#xxx;, giving you access to the full Unicode range even in an ASCII-encoded XML document. But within a CDATA section you are stuck with the document's character set. I think some characters like \r are also not valid inside a CDATA section. CDATA is NOT a good escape method.Eyecup
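The point about character references can be sketched as follows. This assumes an ASCII-encoded document and uses U+00E9 as the sample character; the element name w is made up.

```python
import xml.etree.ElementTree as ET

# Both documents are pure ASCII bytes. U+00E9 is written as the
# numeric character reference &#233; in each of them.
outside = b'<?xml version="1.0" encoding="US-ASCII"?><w>caf&#233;</w>'
inside = b'<?xml version="1.0" encoding="US-ASCII"?><w><![CDATA[caf&#233;]]></w>'

text_outside = ET.fromstring(outside).text
text_inside = ET.fromstring(inside).text

# Outside CDATA the reference is expanded to the real character.
print(text_outside == "caf\u00e9")
# Inside CDATA the same six characters stay literal text.
print(text_inside == "caf&#233;")
```

So a character that the document encoding cannot represent has no spelling inside a CDATA section, whereas ordinary element content can always fall back to a character reference.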
T
4

I believe that CDATA was intended to allow raw binary data: as long as it doesn't contain "]]>" then anything goes in a CDATA section. This does set it apart from normal XML and should speed up parsing (and negate the necessity for full text encoding, thus giving a second performance boost). Actually it proved quite problematic what with people not escaping the closing sequence and several early parsers being variously broken, so most now just use a text encoding for binary data, making the CDATA section somewhat pointless, yes.

EDIT: note that this answer is in fact wrong, as Tomalak identifies in comments. I've not deleted it because I know there are other people out there who think that raw binary is acceptable in CDATA and this might clear up that little misunderstanding.

Triazine answered 11/11, 2009 at 10:36 Comment(7)
But CDATA means character data, I doubt that you can put in raw byte sequences that are otherwise illegal in XML.Hourihan
Oh yes you can! The binary data tends to break other things in the chain though! The main reason for still using CDATA is to preserve formatting of text, as in newlines and tabs and sequences of spaces, which get lost when parsing normal tabs.Carnation
Also, it's a mere 134,217,728 to one chance that ]]> will appear somewhere in your binary data!Carnation
The spec (w3.org/TR/REC-xml/#sec-cdata-sect) says CData can contain characters (w3.org/TR/REC-xml/#charsets). Sorry, but that does not look like binary was allowed to me. Maybe there is some odd XML parser that allows it, but it surely is not the way it was meant to be.Hourihan
I checked and yes, Tomalak is correct (or rather that's my reading of the Fifth Edition of the XML 1.0 spec). Either my original understanding predates XML 1.0 (entirely likely) or I was misinformed (equally likely).Triazine
@sinibar: I suggest you make that answer a community wiki (you can do so in edit mode). Some people down-vote "wrong" answers regardless whether you have pointed out the mistake or not. In wiki mode, this won't cause you any rep loss, at least.Hourihan
+1, because you are taking an answer that is wrong and turning it into useful information.Bot
J
4

I don't know how helpful this will be, but I'll throw this in too:

One of the issues is that there are a couple of distinct camps of XML developers, where some view XML as a representation of data, and some view it in a more document-centric way. (The beauty of XML is that it works well for both.)

Those who view XML as a representation of data--where the XML is often being produced and consumed by tools, and humans only get involved for troubleshooting--will see little value in a CDATA section, because it doesn't make a difference to their tools, whereas those who use XML for more document-centric purposes may find CDATA sections much more useful.

Juarez answered 11/11, 2009 at 15:22 Comment(0)
M
3

To me, CDATA is just another word for lazy. When I started out with XML I used it, but nowadays I always convert data.

The best reason I can come up with is convenience, especially when you are using XML as some form of wrapper to transport data from one system to another. In that case you may end up with the following steps:

Create XML wrapper
Convert data to XML
Put data inside wrapper
Send XML to receiver
Split XML to XML + Data in XML
Convert Data in XML to Data

Whereas using CDATA would result in not requiring the extra conversion steps.
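For comparison, here is a rough sketch of the wrapper steps above when an XML library does the conversion. It uses Python's xml.etree.ElementTree; the element names envelope and body and the payload are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical data that looks like markup.
payload = '<p class="x">Tom & Jerry</p>'

# Sender: create the wrapper and put the data in as text.
# The serializer escapes < > & on the way out.
envelope = ET.Element("envelope")
body = ET.SubElement(envelope, "body")
body.text = payload
wire = ET.tostring(envelope, encoding="unicode")
print(wire)

# Receiver: parse the wrapper and read the data back.
# The parser undoes the escaping.
received = ET.fromstring(wire).find("body").text
print(received == payload)
```

With this kind of API, the conversion steps still happen, but neither side writes them by hand.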

Another usage could be to embed data without having to care about the different namespaces in the embedded data. But that is not really a great way to use it.

I've found another example of a good way to use CDATA, one that I should have thought of: when you need to embed code in an XML file, and the code must not be converted because it would otherwise stop working and/or no longer be easily readable.

Mediocre answered 11/11, 2009 at 10:14 Comment(2)
I disagree with the "not requiring the extra conversion steps." You still have to ensure the content doesn't contain a ]]>.Altimeter
That hardly qualifies as conversion in this context. But you are still right though.Mediocre
O
2

MXML demonstrates a great use of CDATA tags. One of the things I like about MXML is it is valid XML, meaning I can do useful things like generate flash widgets programmatically from a different XML file using a transform, and validate MXML against a schema.

CDATA tags are useful in MXML because they let you define an ActionScript script block within an MXML file, allowing me to combine an ECMAScript-type scripting language (with > and < and the like) and valid XML in a single file.

EDIT:

I suppose another option for combining MXML and ActionScript would be the way you combine HTML and JavaScript, that is, wrapping the script in an XML comment tag inside the script block. The choice to use CDATA instead was made by the developers of the MXML compiler. I suppose the reasoning has more to do with editing: the MXML editor validates your code against a schema to check syntax and provide context help, as well as parsing your ActionScript code for syntax and context help. Using CDATA allows the editor to do both and to differentiate between XML comments and script blocks.

Obsequious answered 11/11, 2009 at 10:16 Comment(0)
E
2

When in doubt, check the spec:

2.7 CDATA Sections

[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup.

Euphemize answered 11/11, 2009 at 13:12 Comment(3)
@NickFitz: I'm aware of the basic facts. ;-) I was asking what the benefit of CDATA over XML-escaping would be.Hourihan
And the spec tells you: they are used for escaping blocks of text containing characters which would otherwise be recognised as markup. The corollary of this is that they can be used when for some reason it is impractical, impossible, or undesirable to escape markup characters using entities. Therefore the benefit is that CDATA sections provide an alternative to escaping. Devising actual use-cases is left as an exercise for the reader ;-)Euphemize
You are the reader, this is your exercise as posed by Tomalak. :-PLizzettelizzie
E
1

CDATA sections are really useful when you want to define a schema for some XML but part of it is out of your control and you can't ensure that it will meet the schema or won't break the XML.

I often have to work with legacy systems whose HTML output is often not well-formed XHTML. I can attach a schema that ensures that the XML is structured correctly, but have a tag that contains just a CDATA section for housing the potentially bad HTML.

It's not a common usage, but it definitely has its uses when you don't want other people's lax programming to break your system.
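A minimal sketch of that setup, using Python's xml.dom.minidom to write the CDATA section and xml.etree.ElementTree to read it back. The element names report and content and the HTML fragment are invented for illustration. As far as I know, minidom refuses to serialize a CDATA section whose data contains ]]>, so that case would still need separate handling.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import getDOMImplementation

# Hypothetical legacy output: unclosed tags and a bare ampersand,
# so it is not well-formed XML on its own.
legacy_html = "<p>Unclosed paragraph<br>AT&T <b>bold"

doc = getDOMImplementation().createDocument(None, "report", None)
content = doc.createElement("content")
# Assumption: the legacy data never contains "]]>".
content.appendChild(doc.createCDATASection(legacy_html))
doc.documentElement.appendChild(content)
xml_text = doc.documentElement.toxml()
print(xml_text)

# The surrounding document stays well-formed, and the HTML comes
# back out unchanged as plain text.
recovered = ET.fromstring(xml_text).find("content").text
print(recovered == legacy_html)
```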

Ezzell answered 11/11, 2009 at 14:53 Comment(2)
But you could just use the HTML outputs as the node value and they would work equally well, only that they appear as XML-escaped.Hourihan
Yes, but that incurs the performance cost of converting to escaped HTML and back out again. That is probably minor in a lot of use cases, but within a transport mechanism, particularly one under high load, it is potentially significant. Also, as I highlight, when working with legacy systems it can be dangerous to assume that they can escape the characters at all, let alone that they will do so consistently.Ezzell
J
0

Here's a concrete example of why/when you may want to use CDATA.

Get rid of the CDATAs and this simple SVG will fail to be parsed by browsers:

<?xml version="1.0" encoding="UTF-8"?>
<svg version="1.1"
    baseProfile="full"
    xmlns="http://www.w3.org/2000/svg"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:ev="http://www.w3.org/2001/xml-events"
    >

<title>CDATA</title>

<style type="text/css"><![CDATA[

/**
 * Imagine you mention this element <foo> in a comment… or use the & sign.
 * Then…
 *
 * If this weren't wrapped into CDATA (mind both the starting and closing
 * tags), then the browser would fail to parse the file correctly. For example:
 *
 * Firefox would fail with this:
 * > XML Parsing Error: mismatched tag. Expected: [foo's closing tag].
 *
 * Chrome and Safari would fail with this:
 * > This page contains the following errors:
 * > error on line 22 at column 9: Opening and ending tag mismatch: foo line 0 and style
 */

]]></style>


<text x="20" y="60" font-size="60">Hello</text>

<script><![CDATA[

// <text>
// see comment in the CSS, because it's the same situation here.

]]></script>
</svg>

This was with an SVG file, but you should take the same precautions with just any XML file.

Joyjoya answered 19/2, 2020 at 15:4 Comment(3)
You could XML-escape all those string values and it would be good. There is no need for CDATA here. It's a "nice to have" feature for human editors.Hourihan
Fair enough – edited the word "need" into "may want to use" for the sake of precision. Even then, I'll argue that being defensive, making it so that forgetting to escape an entity won't break a file, is the way to go.Joyjoya
The question was indeed written out of a DOM API's point of view, not out of a human editor's angle. I completely get why it's convenient.Hourihan
