Decode CDATA section in C#

I have a bit of XML as follows:

<section>
  <description>
    <![CDATA[
      This is a "description"
      that I have formatted
    ]]>
  </description>
</section>

I'm accessing it using curXmlNode.SelectSingleNode("description").InnerText but the value returns

\r\n      This is a "description"\r\n      that I have formatted
instead of
This is a "description" that I have formatted.

Is there a simple way to get that sort of output from a CDATA section? Leaving the actual CDATA tag out seems to have it return the same way.

Turkey answered 6/8, 2009 at 3:8 Comment(0)

You can use LINQ to XML to read CDATA.

XDocument xDoc = XDocument.Load("YourXml.xml");
xDoc.DescendantNodes().OfType<XCData>().Count();

It's very easy to get the Value this way.
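
As a minimal sketch of reading the value rather than just counting nodes, the snippet below takes the first XCData node and collapses its whitespace. The class name CDataLinqSketch and the helper FirstCDataNormalized are made up for illustration, and it assumes .NET 3.5 or later, where System.Xml.Linq is available.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class CDataLinqSketch
{
    // Hypothetical helper: returns the text of the first CDATA node in the
    // document with whitespace runs collapsed, or null if there is none.
    public static string FirstCDataNormalized(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XCData cdata = doc.DescendantNodes().OfType<XCData>().FirstOrDefault();
        if (cdata == null)
        {
            return null;
        }
        // Collapse every run of whitespace to one space, then trim the ends.
        return Regex.Replace(cdata.Value, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string xml = "<section><description><![CDATA[\n  This is a \"description\"\n  that I have formatted\n]]></description></section>";
        // Prints: This is a "description" that I have formatted
        Console.WriteLine(FirstCDataNormalized(xml));
    }
}
```

Returning null for a document without CDATA is only one possible choice; throwing may suit callers that always expect a CDATA section.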

Here's a good overview on MSDN: http://msdn.microsoft.com/en-us/library/bb308960.aspx

For .NET 2.0, you probably just have to pass it through a Regex:

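using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.XPath;

string xml = @"<section>
                 <description>
                   <![CDATA[
                     This is a ""description""
                     that I have formatted
                   ]]>
                 </description>
               </section>";

// Load the XML and select the <description> element.
XPathDocument xDoc = new XPathDocument(new StringReader(xml.Trim()));
XPathNavigator nav = xDoc.CreateNavigator();
XPathNavigator descriptionNode = nav.SelectSingleNode("/section/description");

// Remove newlines, trim the ends, then collapse whitespace runs to one space.
string desiredValue = Regex.Replace(
    descriptionNode.Value.Replace(Environment.NewLine, String.Empty).Trim(),
    @"\s+", " ");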

That removes the newlines from your node value, trims it, and collapses each run of one or more whitespace characters into a single space. I don't think there's any other way to do it, considering that whitespace inside CDATA is significant and is returned as-is.

Pelf answered 6/8, 2009 at 3:16 Comment(4)
Thanks, but I should have been more specific that I'm doing this in 2.0 on the Compact Framework. I might look into seeing if it'd be more advantageous to move to 3.5 in the future though. - Turkey
I edited with another idea. I don't have .NET 2.0 CF installed though, so I'm not 100% sure it's compatible. - Pelf
@Jim Schubert Did you mean to include parentheses after "DescendantNodes", for example: "xDoc.DescendantNodes().OfType<XCData>().Count();"? - Starlin
@Anthony: sure did, sir! Thanks for pointing it out. I've updated the answer with the correction! - Pelf

I think the best way is...

XmlCDataSection cDataNode = (XmlCDataSection)(doc.SelectSingleNode("section/description").ChildNodes[0]);

string finalData = cDataNode.Data;
Armagnac answered 30/8, 2011 at 9:15 Comment(1)
Definitely the best solution: short, no string conversions involved, and it uses the existing System.Xml methods. - Olfactory

Actually, I think it's pretty simple. The CDATA section is loaded into the XmlDocument like any other XmlNode; the difference is that this node has NodeType = CDATA. That means if you have XmlNode node = doc.SelectSingleNode("section/description");, that node will have a child node whose InnerText property contains the raw data. If you want to remove the leading and trailing whitespace, just use Trim() and you will have the data.

The code will look like this:

XmlNode cDataNode = doc.SelectSingleNode("section/description").ChildNodes[0];
string finalData = cDataNode.InnerText.Trim();

Thanks
XOnDaRocks

Incunabula answered 7/5, 2010 at 15:28 Comment(0)

A simpler form of @Franky's solution:

doc.SelectSingleNode("section/description").FirstChild.Value

For a CDATA node, the Value property returns the same string as the Data property of XmlCDataSection, so no cast is needed.
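
One caveat worth sketching: FirstChild is the CDATA node only when the whitespace around it is not kept as nodes, which is the default for XmlDocument. If PreserveWhitespace is set to true, a whitespace node comes first. The sketch below, with the made-up names CDataChildSketch and ReadCData, looks for the first child whose NodeType is CDATA instead of relying on position.

```csharp
using System;
using System.Xml;

public static class CDataChildSketch
{
    // Hypothetical helper: returns the text of the first CDATA child of the
    // node selected by xpath, or null when there is no such node or child.
    public static string ReadCData(XmlDocument doc, string xpath)
    {
        XmlNode parent = doc.SelectSingleNode(xpath);
        if (parent == null)
        {
            return null;
        }
        foreach (XmlNode child in parent.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.CDATA)
            {
                return child.Value;
            }
        }
        return null;
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // With whitespace preserved, the CDATA node is no longer the first child.
        doc.PreserveWhitespace = true;
        doc.LoadXml("<section><description>\n  <![CDATA[raw text]]>\n</description></section>");
        // Prints: raw text
        Console.WriteLine(ReadCData(doc, "section/description"));
    }
}
```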

Hemp answered 11/1, 2017 at 14:35 Comment(1)
Value is fine here as well. - Rifling

CDATA blocks are effectively verbatim. Any whitespace inside CDATA is significant, by definition, according to the XML spec. Therefore, you get that whitespace when you retrieve the node value. If you want to strip it using your own rules (since the XML spec doesn't specify any standard way of stripping whitespace in CDATA), you have to do it yourself, using String.Replace, Regex.Replace, etc. as needed.
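
As one possible "own rule", here is a minimal sketch that collapses whitespace without Regex by splitting on whitespace and rejoining with single spaces. The names WhitespaceSketch and Collapse are invented for this example, and it assumes the String.Split overload that takes StringSplitOptions (present in the full .NET Framework 2.0 and later; I have not checked the Compact Framework).

```csharp
using System;

public static class WhitespaceSketch
{
    // Hypothetical helper: collapses every run of whitespace to one space
    // and drops leading and trailing whitespace.
    public static string Collapse(string raw)
    {
        // A null separator array makes Split treat whitespace as the delimiter.
        string[] words = raw.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    public static void Main()
    {
        string raw = "\r\n      This is a \"description\"\r\n      that I have formatted\r\n    ";
        // Prints: This is a "description" that I have formatted
        Console.WriteLine(Collapse(raw));
    }
}
```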

Vimen answered 6/8, 2009 at 5:6 Comment(0)
