How to parse an XML DOM inside a CDATA element in XSLT?
B

4

5

Say I have an XML file like:

<library>
 <books>
  <![CDATA[<genre><name>Sci-fi</name><count>2</count></genre>]]>
  <book>
   <name>
    Some Book
   </name>
   <author>
    Some author
   </author>
  </book>
  <book>
   <name>
    Another Book
   </name>
   <author>
    Another author
   </author>
  </book>
 </books>
</library>

I want to read the 'name' element inside the CDATA section in an XSLT transformer and place its value somewhere in the value of a tag. How do I do this? AFAIK, we cannot use XPath on the contents of CDATA. Is there some hack/workaround for this? I want to do this strictly in XSLT.

Behead answered 25/4, 2012 at 20:34 Comment(2)
CDATA tells the XML parser that it is not XML, so it isn't parsed. <rant>It is often abused so that (lazy/uninformed) people creating "XML" through string concatenation don't have to deal with properly encoding characters. If you can control the creation of the XML file, or can influence the person producing it, get them to stop abusing CDATA and put their XML content as XML.</rant>Genotype
Possible duplicate of XSLT parse text node as XML?Tombstone
M
4

Since CDATA blocks are (part of) text nodes, you can extract the text between the two "tags", e.g. like this:

<xsl:template match="text()">
  <xsl:value-of select="substring-before(substring-after(., '&lt;name>'), '&lt;/name>')"/>
</xsl:template>

This is just a quick idea. If you have more than one name "element" inside the CDATA, just recursively apply the above expression multiple times.
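
The recursion mentioned above could look roughly like the following XSLT 1.0 sketch. The template name `extract-names` and the `names` wrapper element are made up for illustration, and the approach assumes the markers appear exactly as `<name>` and `</name>` (no attributes, no nesting):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: scan each text child of books (CDATA content is plain text) -->
  <xsl:template match="/">
    <names>
      <xsl:for-each select="/library/books/text()">
        <xsl:call-template name="extract-names">
          <xsl:with-param name="text" select="."/>
        </xsl:call-template>
      </xsl:for-each>
    </names>
  </xsl:template>

  <!-- Hypothetical helper: emits one name element per opening/closing marker pair in $text -->
  <xsl:template name="extract-names">
    <xsl:param name="text"/>
    <!-- Everything after the first opening marker (empty if there is none) -->
    <xsl:variable name="rest" select="substring-after($text, '&lt;name>')"/>
    <xsl:if test="contains($rest, '&lt;/name>')">
      <name>
        <xsl:value-of select="substring-before($rest, '&lt;/name>')"/>
      </name>
      <!-- Recurse on whatever follows the closing marker -->
      <xsl:call-template name="extract-names">
        <xsl:with-param name="text" select="substring-after($rest, '&lt;/name>')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The recursion stops as soon as no further closing marker is found, so whitespace-only text nodes between the book elements produce nothing.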

Monjo answered 25/4, 2012 at 20:51 Comment(0)
B
7

Some XSLT products have an extension function, for example saxon:parse(), that allows you to take a string containing lexical XML and convert it into a tree of nodes.
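
As an illustration of the same idea without a vendor extension: XPath 3.0 defines a standard `parse-xml()` function, so with an XSLT 3.0 processor something like the sketch below should work. It assumes the CDATA section sits in the first text child of `books` and contains exactly one well-formed root element; otherwise `parse-xml()` raises an error.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The first text child of books holds the CDATA content as a plain string;
         parse-xml() turns that string into a document node -->
    <xsl:variable name="parsed"
        select="parse-xml(normalize-space(/library/books/text()[1]))"/>
    <!-- Ordinary XPath now works on the parsed tree -->
    <xsl:value-of select="$parsed/genre/name"/>
  </xsl:template>
</xsl:stylesheet>
```

Note that `normalize-space()` is used here only to trim the surrounding whitespace; it also collapses whitespace runs inside the embedded content, which may or may not matter for your data.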

Buster answered 26/4, 2012 at 8:24 Comment(0)
G
6

You could also select out the CDATA section and then pass the result to a second XSL.

For instance if you get the CDATA section out like this:

<xsl:template match="//books/text()">
  <xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>

You would end up with a result like:

<genre><name>Sci-fi</name><count>2</count></genre>

which you could then apply another XSL to, or XPath if dealing with just a DOM. That is assuming that your CDATA is always valid XML. Otherwise, the substring-based answer by Martin is the way to go.
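
A second-pass stylesheet for that result might look like the sketch below. It assumes the first pass wrote out only the `genre` fragment (the built-in rules would otherwise also copy the book names and authors, so those text nodes need to be suppressed, e.g. with an empty `text()` template) and that the output was serialized and re-parsed, since `disable-output-escaping` is an optional feature that only takes effect during serialization.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Input here is the serialized output of the first pass, re-parsed as XML -->
  <xsl:template match="/genre">
    <xsl:value-of select="name"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="count"/>
  </xsl:template>
</xsl:stylesheet>
```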

Goer answered 25/4, 2012 at 21:32 Comment(0)
M
1

Maybe my answer is coming way too late, but I'll give it anyway. I've run into the same problem and couldn't find an easy-to-use answer, so I wrote a template "STR2XML" myself to do the thing. If anyone is interested, I'm happy to share the template. Just let me know.

Two examples of how it works:

<xsl:variable name="text">
    <![CDATA[
        <div style="color:red;">
            <p>hello world</p>
        </div>
    ]]>
</xsl:variable>
<p>
    <xsl:value-of select="$text"/>
</p>
<xsl:call-template name="str2xml">
    <xsl:with-param name="text" select="$text"/>
</xsl:call-template>

Will give the following output:

<div style="color:red;"> <p>hello world</p> </div> (non parsed plain text)

hello world

But of course you can also use it to make a variable which can be accessed as a node:

<xsl:variable name="text2">
    <![CDATA[
        <div>hello world</div>
        <p>goodbye world</p>
    ]]>
</xsl:variable>
<xsl:variable name="var1">
    <xsl:call-template name="str2xml">
        <xsl:with-param name="text" select="$text2"/>
    </xsl:call-template>
</xsl:variable>
<xsl:for-each select="xalan:nodeset($var1)/*">
    <p>
        <xsl:value-of select="concat(name(.),': ',.)"/>
    </p>
</xsl:for-each>

Output:

div: hello world

p: goodbye world

Mesosphere answered 2/8, 2014 at 2:34 Comment(1)
I would love to see your str2xml template if possibleKeshiakesia
