What is CDATA in HTML? [duplicate]
K

6

192

What is the use of CDATA inside JavaScript tags and HTML?

<script type="text/javascript"> 
// <![CDATA[

// ]]>
</script> 
Kawasaki answered 17/8, 2011 at 11:43 Comment(0)
T
141

All text in an XML document will be parsed by the parser.

But text inside a CDATA section is not parsed as markup: the parser passes it through as literal character data.

CDATA - (Unparsed) Character Data

The term CDATA is used for text data that should not be parsed by the XML parser.

Characters like "<" and "&" are illegal as literal text inside XML elements.

"<" will generate an error because the parser interprets it as the start of a new element.

"&" will generate an error because the parser interprets it as the start of a character entity.
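The usual alternative to CDATA is to escape these characters before putting text into an element. A minimal sketch, assuming a hypothetical helper named `escapeXmlText` (not part of any library):

```javascript
// Hypothetical helper: escape text so it can be placed inside an XML element.
function escapeXmlText(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or the entities added below get escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;"); // not strictly required, but keeps a literal "]]>" out of content
}

console.log(escapeXmlText("if (a < b && b > c) {}"));
// prints: if (a &lt; b &amp;&amp; b &gt; c) {}
```

Attribute values would also need quotes escaped; this sketch covers element content only.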

Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.

Everything inside a CDATA section is treated as plain text by the parser, not as markup.

A CDATA section starts with "<![CDATA[" and ends with "]]>"
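Because "]]>" always ends the section, text that itself contains "]]>" cannot be wrapped as-is. A common workaround is to split that sequence across two adjacent CDATA sections. A rough sketch, assuming a hypothetical helper named `wrapInCdata`:

```javascript
// Hypothetical helper: wrap arbitrary text in one or more CDATA sections.
function wrapInCdata(text) {
  // "]]>" cannot appear inside a CDATA section, so close the section after "]]"
  // and reopen a new one before ">".
  const safe = String(text).split("]]>").join("]]]]><![CDATA[>");
  return "<![CDATA[" + safe + "]]>";
}

console.log(wrapInCdata("a < b && c"));
// prints: <![CDATA[a < b && c]]>
console.log(wrapInCdata("x]]>y"));
// prints: <![CDATA[x]]]]><![CDATA[>y]]>
```

An XML parser reads the second result as two adjacent sections, "x]]" and ">y", which together give back the original text.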

Use of CDATA in program output

CDATA sections in XHTML documents are liable to be parsed differently by web browsers if they render the document as HTML, since HTML parsers do not recognise the CDATA start and end markers, nor do they recognise HTML entity references such as &lt; within <script> tags. This can cause rendering problems in web browsers and can lead to cross-site scripting vulnerabilities if used to display data from untrusted sources, since the two kinds of parsers will disagree on where the CDATA section ends.

A brief SGML tutorial.

Also, see the Wikipedia entry on CDATA.

Tijerina answered 17/8, 2011 at 11:48 Comment(2)
I think I have a better question then. In broad strokes, what benefits are associated with using the CDATA tag?Savona
@Savona you can check this #67337Neurotic
F
100

CDATA has no meaning at all in HTML.

CDATA is an XML construct that marks part of an element's content, which would normally be #PCDATA (parsed character data), to be taken instead as #CDATA, that is, non-parsed character data. It is only relevant and valid in XHTML.

It is used in script tags to avoid parsing < and &. In HTML, this is not needed, because in HTML, script is already #CDATA.

Farver answered 17/8, 2011 at 11:44 Comment(4)
So why do people use it inside JavaScript tags? Where does it have any meaning, and what for? Thanks.Kawasaki
@Kawasaki Probably because these people type XHTML documents instead of SGML/HTML, and/or they want to help less standards compliant browsers to correctly load their pages regardless.Isle
Even though it's almost 6 years old, this is still the best explanation of CDATA I've seen.Nikolia
It does have meaning in HTML; it depends on whether you encounter the issue.Pneumatometer
P
21

CDATA is Obsolete.

Note that CDATA sections should not be used within HTML; they only work in XML.

So do not use it in HTML 5.

https://developer.mozilla.org/en-US/docs/Web/API/CDATASection#Specifications


Paratuberculosis answered 24/2, 2016 at 22:2 Comment(7)
I am confused about what is changing. 1) Character Data still exists in DOM4? w3.org/TR/dom/#interface-characterdata 2) Yet the CDATASection is being removed? w3.org/TR/dom/#dom-core What will be the alternative? Mandatory encoding or all < and & and placed in some other tag? How about supporting old documents? Are the browsers suddenly going to drop CDATA support? So we can't process documents created by others over which we have no control? Or just resort to manual string fiddling?Bilabial
Just escape the special characters.Adultery
For creation of XML, I understand, simply escape characters. However, my concern is how to process CDATA sections (e.g. from feeds we can't control and may be slow to update their format), after the browsers drop CDATASection from the DOM? When will they drop? FF 49 is still showing me CDATASection in the DOM. It isn't clear to me how to handle in this case during the transitional time after it has been obsoleted and removed from browser. Will just be seen as a text node? An error (bad tag)? Just trying to avoid ugliness of manually finding markers in text to pull out the data inside.Bilabial
CDATA as such is not deprecated. XHTML is based on XML, so it must support CDATA. (In HTML, the CDATA markup has no meaning; it will be just parsed as a bogus comment.) It is the CDATASection interface that is deprecated; if a page is parsed as XHTML, its contents will appear in the DOM as a normal text node.Cleasta
Sorry XHTML is out! But if you want a HTML/XML you can use XHTML5. FYI: en.wikipedia.org/wiki/HTML5#XHTML5_(XML-serialized_HTML5)Adultery
Ironically the mozilla link no longer says anything about it being deprecated and shows that all current browsers support it.Agonize
@Agonize you're right, the word they use is: Obsolete!Adultery
R
19

From http://en.wikipedia.org/wiki/CDATA:

Since it is useful to be able to use less-than signs (<) and ampersands (&) in web page scripts, and to a lesser extent styles, without having to remember to escape them, it is common to use CDATA markers around the text of inline <script> and <style> elements in XHTML documents. But so that the document can also be parsed by HTML parsers, which do not recognise the CDATA markers, the CDATA markers are usually commented-out, as in this JavaScript example:

<script type="text/javascript">
//<![CDATA[
document.write("<");
//]]>
</script>
Rockribbed answered 17/8, 2011 at 11:51 Comment(1)
Man... I used to see this all the time when I started learning JavaScript... really takes me back.Kreit
D
13

A way to write a common subset of HTML and XHTML

In the hope of greater portability.

In HTML, <script> is magic: it escapes everything until </script> appears.

So you can write:

<script>x = '<br/>';

and <br/> won't be considered a tag.

This is why strings such as:

x = '</script>'

must be escaped like:

x = '</scri' + 'pt>'

See: Why split the <script> tag when writing it with document.write()?
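When script source is generated by a program, the same effect can be had by escaping the slash instead of splitting the string. A minimal sketch, assuming a hypothetical helper named `escapeForInlineScript` and assuming the sequence only ever occurs inside string literals (where `<\/` and `</` mean the same thing to JavaScript):

```javascript
// Hypothetical helper: make generated JavaScript safe to embed in an HTML <script> element.
function escapeForInlineScript(source) {
  // The HTML parser ends the script at "</script" (any letter case), so break that
  // sequence with a backslash. Inside a JS string literal "<\/script" equals "</script".
  return String(source).replace(/<\/(script)/gi, "<\\/$1");
}

console.log(escapeForInlineScript("x = '</script>';"));
// prints: x = '<\/script>';
```

This only covers the closing-tag case; it does not deal with other sequences an HTML parser treats specially inside scripts.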

But XML (and thus XHTML, which is a "subset" of XML, unlike HTML), doesn't have that magic: <br/> would be seen as a tag.

<![CDATA[ is the XHTML way to say:

don't parse any tags until the next ]]>, consider it all a string

The // is added to make the CDATA work well in HTML as well.

In HTML <![CDATA[ is not magic, so it would be run by JavaScript. So // is used to comment it out.

The XHTML parser also passes the // through, but JavaScript then sees it as an empty comment line, which is not a problem:

//

That said:

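  • compliant browsers should recognize whether the document is HTML or XHTML. In practice they choose the parser from the Content-Type the server sends (text/html vs application/xhtml+xml) rather than from the initial doctype <!DOCTYPE html> vs <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">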
  • compliant websites could rely on compliant browsers, and coordinate doctype with a single valid script syntax

But that violates the golden rule of the Internet:

don't trust third parties, or your product will break

Delorenzo answered 18/9, 2016 at 16:8 Comment(1)
"In HTML, <script> is magic escapes everything until </script> appears." actually per spec it would be until "</script(\w)+" appearsAlton
C
6

In the HTML 4 specification, CDATA as an attribute value type is a sequence of characters from the document character set and may include character entities. User agents should interpret such attribute values as follows:

  • Replace character entities with characters,

  • Ignore line feeds,

  • Replace each carriage return or tab with a single space.
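The three rules above can be sketched as a small function. This is only an illustration: the name `normalizeCdataAttribute`, the tiny entity table, and applying the rules in the listed order are all assumptions, not something the specification text quoted here spells out.

```javascript
// Assumption: a deliberately tiny entity table; real HTML defines many more names.
const NAMED_ENTITIES = new Map([["lt", "<"], ["gt", ">"], ["amp", "&"], ["quot", '"']]);

// Hypothetical helper applying the three rules in the order they are listed.
function normalizeCdataAttribute(raw) {
  // 1. Replace character entities (named, decimal, or hexadecimal) with characters.
  const decoded = String(raw).replace(
    /&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi,
    (match, body) => {
      if (body[0] === "#") {
        const isHex = body[1] === "x" || body[1] === "X";
        const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched instead of throwing.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      // Unknown names are left as written.
      return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
    }
  );
  // 2. Ignore line feeds. 3. Replace each carriage return or tab with a single space.
  return decoded.replace(/\n/g, "").replace(/[\r\t]/g, " ");
}

console.log(JSON.stringify(normalizeCdataAttribute("a&lt;b\n\tc&#65;\r")));
// prints: "a<b cA "
```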

Chanukah answered 17/8, 2011 at 11:46 Comment(0)
