innerHTML converts CDATA to comments

Asked 15/8, 2011 at 13:40 Answered 4/3, 2020 at 10:32

I'm trying to insert some HTML into a page using javascript, and the HTML I'm inserting contains CDATA blocks.

I'm finding, in Firefox and Chrome, that the CDATA is getting converted to a comment.

The HTML is not under my control, so it's difficult for me to avoid using CDATA.

The following test case, when there is a div on the page with id "test":

document.getElementById('test').innerHTML = '<![CDATA[foo]]> bar'

causes the following HTML to be appended to the 'test' div:

<!--[CDATA[foo]]--> bar

Is there any way I can insert, verbatim, HTML containing CDATA into a document using javascript?

Anachronous answered 15/8, 2011 at 13:40 Comment(0)

document.createCDATASection should do it, but the real answer to your question is that, although HTML 5 does have CDATA sections, cross-browser support for them is pretty spotty.
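As a minimal sketch of the createCDATASection route: the call is only accepted on XML documents (an HTML document rejects it with a NotSupportedError), so the example below assumes the caller supplies an XML document. The helper name appendCdata and its parameters are hypothetical, chosen only for illustration.

```javascript
// Sketch: create a CDATA node in an XML document and attach it to a parent.
// `xmlDoc` is assumed to be an XML Document; HTML documents throw
// NotSupportedError from createCDATASection.
function appendCdata(xmlDoc, parent, text) {
    var section = xmlDoc.createCDATASection(text);
    parent.appendChild(section);
    return section;
}

// Browser-only usage, guarded so the sketch also loads outside a browser.
if (typeof document !== "undefined" && document.implementation &&
        typeof XMLSerializer !== "undefined") {
    var xmlDoc = document.implementation.createDocument(null, "root", null);
    appendCdata(xmlDoc, xmlDoc.documentElement, "<i>bar</i>");
    // Serializes the XML document, CDATA section included.
    console.log(new XMLSerializer().serializeToString(xmlDoc));
}
```

This does not help when the target is an ordinary HTML page, which is the situation in the question.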

EDIT

CDATA sections just aren't in the HTML 4 definition, so most browsers won't recognize them.

But it doesn't require a full DOM parser. Here's a simple lexical solution that will fix the problem.

function htmlWithCDATASectionsToHtmlWithout(html) {
    var ATTRS = "(?:[^>\"\']|\"[^\"]*\"|\'[^\']*\')*",
        // names of tags with RCDATA or CDATA content.
        SCRIPT = "[sS][cC][rR][iI][pP][tT]",
        STYLE = "[sS][tT][yY][lL][eE]",
        TEXTAREA = "[tT][eE][xX][tT][aA][rR][eE][aA]",
        TITLE = "[tT][iI][tT][lL][eE]",
        XMP = "[xX][mM][pP]",
        SPECIAL_TAG_NAME = [SCRIPT, STYLE, TEXTAREA, TITLE, XMP].join("|"),
        ANY = "[\\s\\S]*?",
        AMP = /&/g,
        LT = /</g,
        GT = />/g;
    return html.replace(new RegExp(
        // Entities and text
        "[^<]+" +
        // Comment
        "|<!--"+ANY+"-->" +
        // Regular tag
        "|<\/?(?!"+SPECIAL_TAG_NAME+")[a-zA-Z]"+ATTRS+">" +
        // Special tags
        "|<\/?"+SCRIPT  +"\\b"+ATTRS+">"+ANY+"<\/"+SCRIPT  +"\\s*>" +
        "|<\/?"+STYLE   +"\\b"+ATTRS+">"+ANY+"<\/"+STYLE   +"\\s*>" +
        "|<\/?"+TEXTAREA+"\\b"+ATTRS+">"+ANY+"<\/"+TEXTAREA+"\\s*>" +
        "|<\/?"+TITLE   +"\\b"+ATTRS+">"+ANY+"<\/"+TITLE   +"\\s*>" +
        "|<\/?"+XMP     +"\\b"+ATTRS+">"+ANY+"<\/"+XMP     +"\\s*>" +
        // CDATA section.  Content in capturing group 1.
        "|<!\\[CDATA\\[("+ANY+")\\]\\]>" +
        // A loose less-than
        "|<", "g"),

        function (token, cdataContent) {
          return "string" === typeof cdataContent
              ? cdataContent.replace(AMP, "&amp;").replace(LT, "&lt;")
                .replace(GT, "&gt;")
              : token === "<"
              ? "&lt;"  // Normalize loose less-thans.
              : token;
        });
}

Given

<b>foo</b><![CDATA[<i>bar</i>]]>

it produces

<b>foo</b>&lt;i&gt;bar&lt;/i&gt;

and given something that looks like a CDATA section inside a script or other special tag or comment, it correctly does not muck with it:

<script>/*<![CDATA[*/foo=bar<baz&amp;//]]></script><![CDATA[fish: <><]]>

becomes

<script>/*<![CDATA[*/foo=bar<baz&amp;//]]></script>fish: &lt;&gt;&lt;

Array answered 15/8, 2011 at 17:6 Comment(2)

I have a block of HTML as a string, some of which contains CDATA blocks -- I can't go through creating DOM nodes for bits of it without parsing the string itself to see what nodes I should create, which would require including a DOM parser in my script, which seems a bit irrelevant, when all I want to do is insert HTML into a webpage. – Anachronous 16/8, 2011 at 15:3

@Rich, I've described a way to do what you need in JavaScript only. – Array 16/8, 2011 at 15:55

You could try to use innerText instead of innerHTML.

Ossa answered 15/8, 2011 at 13:54 Comment(1)

No good -- the data I'm inserting contains HTML markup, not just text – Anachronous 16/8, 2011 at 15:2

I would just strip the CDATA tags using a regular expression like so:

document.getElementById('test').innerHTML = '<![CDATA[foo]]> bar'.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1")

Which results in 'test' having:

foo bar

That way the content of the CDATA sections is preserved without one having to worry about any of it becoming commented out. Unfortunately, this may break whatever required your documents to use CDATA sections to begin with.

Randeerandel answered 15/8, 2011 at 13:57 Comment(1)

Yes, this is no good, as the reason this text is in CDATA blocks in the first place is that it needs HTML escaping. I suppose I could convert CDATA to html-escaped text using a regex similar to the one you've posted. – Anachronous 16/8, 2011 at 15:4
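The idea in the comment above (turn each CDATA section into HTML-escaped text rather than just stripping the markers) can be sketched as follows. The function names escapeHtml and cdataToEscapedText are hypothetical, and this naive version assumes no CDATA-like text appears inside script, style, or comment content, where it would be rewritten as well.

```javascript
// Escape the characters that are significant in HTML text content.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Replace every CDATA section with its HTML-escaped content.
// [\s\S]*? is non-greedy and, unlike ".", also matches newlines,
// so two sections in one string are handled separately.
function cdataToEscapedText(html) {
    return html.replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, function (match, content) {
        return escapeHtml(content);
    });
}

// Intended use in a page (assumes an element with id "test"):
// document.getElementById('test').innerHTML = cdataToEscapedText(html);
```

The result can be assigned to innerHTML, where the escaped content should display as literal text instead of disappearing into a comment.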

Convert the <, > and & signs like this:

document.getElementById('test').innerHTML = '&lt;![CDATA[foo]]&gt; bar'

Berton answered 15/8, 2011 at 13:57 Comment(1)

But that would insert a literal "<!CDATA" text into my document, when I don't want that. – Anachronous 16/8, 2011 at 15:6

That is because CDATA converts < and > to their HTML entities (&lt; and &gt;). Try converting the entities back to < and >.

You can read more about it here.

Saxhorn answered 15/8, 2011 at 13:58 Comment(1)

This isn't an escaping issue -- the content is not being added to the document at all. The CDATA is being converted to an HTML comment when I pass it to innerHTML. – Anachronous 16/8, 2011 at 15:5

If you make your page XHTML rather than HTML then the auto-comment "feature" of the CDATA might not happen. You do need to jump through the hoops that XHTML requires, such as a DOCTYPE, and whatever else.

Seems a bit arbitrary, any application that depends on CDATA is broken IMHO, but hopefully you get it working.

Mateo answered 12/10, 2015 at 3:6 Comment(0)

I still encountered this problem in 2020 :-(
The slight difference with OP was: I needed to inject XML (not html) into a div.
Applying @Mike Samuel's answer unfortunately transformed the initial <?xml ... to &lt;?xml ...
I just had to add the following clause to the regex: "|<\\?[xX][mM][lL]"+ANY+"\\?>".

Full completed function for xml:

function xmlWithCDATASectionsToXmlWithout(xml) {
    var ATTRS = "(?:[^>\"\']|\"[^\"]*\"|\'[^\']*\')*",
        // names of tags with RCDATA or CDATA content.
        SCRIPT = "[sS][cC][rR][iI][pP][tT]",
        STYLE = "[sS][tT][yY][lL][eE]",
        TEXTAREA = "[tT][eE][xX][tT][aA][rR][eE][aA]",
        TITLE = "[tT][iI][tT][lL][eE]",
        XMP = "[xX][mM][pP]",
        SPECIAL_TAG_NAME = [SCRIPT, STYLE, TEXTAREA, TITLE, XMP].join("|"),
        ANY = "[\\s\\S]*?",
        AMP = /&/g,
        LT = /</g,
        GT = />/g;
    return xml.replace(new RegExp(
            // Entities and text
            "[^<]+" +
            // initial XML TAG
            "|<\\?[xX][mM][lL]"+ANY+"\\?>" +
            // Comment
            "|<!--"+ANY+"-->" +
            // Regular tag
            "|<\/?(?!"+SPECIAL_TAG_NAME+")[a-zA-Z]"+ATTRS+">" +
            // Special tags
            "|<\/?"+SCRIPT  +"\\b"+ATTRS+">"+ANY+"<\/"+SCRIPT  +"\\s*>" +
            "|<\/?"+STYLE   +"\\b"+ATTRS+">"+ANY+"<\/"+STYLE   +"\\s*>" +
            "|<\/?"+TEXTAREA+"\\b"+ATTRS+">"+ANY+"<\/"+TEXTAREA+"\\s*>" +
            "|<\/?"+TITLE   +"\\b"+ATTRS+">"+ANY+"<\/"+TITLE   +"\\s*>" +
            "|<\/?"+XMP     +"\\b"+ATTRS+">"+ANY+"<\/"+XMP     +"\\s*>" +
            // CDATA section.  Content in capturing group 1.
            "|<!\\[CDATA\\[("+ANY+")\\]\\]>" +
            // A loose less-than
            "|<", "g"
        ),
        function (token, cdataContent) {
            return "string" === typeof cdataContent
                    ? cdataContent.replace(AMP, "&amp;").replace(LT, "&lt;")
                        .replace(GT, "&gt;")
                    : token === "<"
                    ? "&lt;"  // Normalize loose less-thans.
                    : token;
        }
    );
}
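To see why the extra clause matters, here is a reduced sketch that keeps only three alternatives of the tokenizer (text, the optional XML declaration clause, and the loose less-than). The names buildTokenizer and normalizeLooseLessThans are hypothetical, and the reduced pattern is only meant to be run on the declaration itself, since it has no branches for ordinary tags.

```javascript
var ANY = "[\\s\\S]*?";

// Build a cut-down tokenizer, with or without the XML declaration clause.
function buildTokenizer(withXmlDeclClause) {
    return new RegExp(
        // Entities and text
        "[^<]+" +
        // Initial XML declaration (the added clause)
        (withXmlDeclClause ? "|<\\?[xX][mM][lL]" + ANY + "\\?>" : "") +
        // A loose less-than
        "|<", "g");
}

// Escape any "<" that no earlier alternative has claimed.
function normalizeLooseLessThans(xml, withXmlDeclClause) {
    return xml.replace(buildTokenizer(withXmlDeclClause), function (token) {
        return token === "<" ? "&lt;" : token;
    });
}

var decl = '<?xml version="1.0"?>';
console.log(normalizeLooseLessThans(decl, false)); // leading "<" is escaped
console.log(normalizeLooseLessThans(decl, true));  // declaration kept verbatim
```

Without the clause, the opening "<" of the declaration matches nothing but the loose less-than branch, so it gets escaped; with the clause, the whole declaration is consumed as one token and passed through.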

Salience answered 4/3, 2020 at 10:32 Comment(0)
