PHP: How to handle <![CDATA[ with SimpleXMLElement?

I noticed that when using SimpleXMLElement on a document that contains those CDATA tags, the content is always NULL. How do I fix this?

Also, sorry for spamming about XML here. I have been trying to get an XML based script to work for several hours now...

<content><![CDATA[Hello, world!]]></content>

I tried the first hit on Google if you search for "SimpleXMLElement cdata", but that didn't work.

Hawker answered 3/6, 2010 at 23:48 Comment(3)
How are you trying to access the node value? And, is SimpleXML a requirement?Cymograph
I tried every other function (xml2array and all) that I could find on the web and SimpleXML seems to be the only one that gives GOOD results, except for the CDATA not working.Hawker
We do a lot of XML parsing at work using DOMDocument (php.net/manual/en/class.domdocument.php). It works just fine in handling CDATA. Give that a shot or post a little more code for us to see how you're working with SimpleXML.Cymograph
N
205

You're probably not accessing it correctly. You can output it directly or cast it to a string. (In this example the cast is superfluous, as echo does it automatically anyway.)

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>'
);
echo (string) $content;

// or with parent element:

$foo = simplexml_load_string(
    '<foo><content><![CDATA[Hello, world!]]></content></foo>'
);
echo (string) $foo->content;

You might have better luck with LIBXML_NOCDATA:

$content = simplexml_load_string(
    '<content><![CDATA[Hello, world!]]></content>'
    , null
    , LIBXML_NOCDATA
);
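
To see that the content is present either way, here is a minimal sketch (the variable names are illustrative, not from the original answer) that reads the same element with and without LIBXML_NOCDATA using a string cast:

```php
<?php
$source = '<foo><content><![CDATA[Hello, world!]]></content></foo>';

// Default parsing: the CDATA section is kept as its own node type.
$default = simplexml_load_string($source);

// LIBXML_NOCDATA: CDATA sections are merged into ordinary text nodes.
$merged = simplexml_load_string($source, 'SimpleXMLElement', LIBXML_NOCDATA);

// A string cast returns the text in both cases.
$defaultText = (string) $default->content;
$mergedText  = (string) $merged->content;

echo $defaultText, PHP_EOL;
echo $mergedText, PHP_EOL;
```

The flag mainly changes what debugging helpers such as print_r display, not what a cast or echo gives you.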
Nonagenarian answered 4/6, 2010 at 0:13 Comment(13)
No, PHP skips CDATA completely for some reason. Any other ideas?Hawker
Then it's a bug. Upgrade PHP/libxml until it works (I've never had any problems with CDATA and SimpleXML.) You may want to try your luck with LIBXML_NOCDATA otherwise.Nonagenarian
Right on. Without the LIBXML_NOCDATA, the XML comes in as false - regardless of how it's done. I was able to prove that out with both creation methods... $x = new SimpleXMLElement('<content><![CDATA[Hello, world!]]></content>', LIBXML_NOCDATA); $y = simplexml_load_string('<content><![CDATA[Hello, world!]]></content>', "SimpleXMLElement", LIBXML_NOCDATA); print_r($y); Without that option, they're both null. Just wanted to back up your assertion.Cymograph
LIBXML_NOCDATA should be the second parameter, not the third! Otherwise it works fine, +1.Lisandra
I know this is an old answer, but I would like to stress that the first part of this answer is correct. When you print the result with print_r you are indeed not accessing it correctly. Write the code you actually want - probably with echo, or with a (string) cast - and you will find the content is fine. Do not use LIBXML_NOCDATA; it is irrelevant.Eunaeunice
While debugging an application, var_dump'ing a SimpleXMLElement containing CDATA's doesn't show nodes content. But var_dump'ing this did the job: simplexml_load_string($simplexml->asXML(), null, LIBXML_NOCDATA)Gibun
@Eunaeunice Adding LIBXML_NOCDATA (and changing nothing else) works, so I'm not so sure it is irrelevant.Benzo
@SimonePalazzo Adding LIBXML_NOCDATA fixes print_r and var_dump output, yes. It does not fix any code you should actually be using in production, because whenever you actually try to use that string, you'll find that the CDATA was there all along.Eunaeunice
@Eunaeunice Well, then it's not working for me :) I'm using simplexml_load_string + convert object to array + edit array + convert back to xml, and without LIBXML_NOCDATA it does not work, i.e. the corresponding field is empty (don't know if null or empty string).Benzo
@SimonePalazzo Your mistake is converting the SimpleXML object to an array - that's not what SimpleXML is designed for. You should be using foreach, ->element, ['attribute'], etc on the SimpleXML object itself. See: php.net/manual/en/simplexml.examples-basic.php Or alternatively, you should be using a different parser to produce an array more suited to your needs. Or using the DOM interface, which has better editing functions.Eunaeunice
@Eunaeunice I see... but why does LIBXML_NOCDATA help then?Benzo
@SimonePalazzo XML consists of various different "nodes" - e.g. <anElement>a text node <aChildElement /> <![CDATA[a cdata node]]> another text node</anElement>. The CDATA and text nodes are different types, and SimpleXML tracks this so you can get back the XML you put in. When you squeeze a SimpleXML object into an array, it throws away a lot of information - CDATA nodes, comments, any element not in the current namespace (e.g. <someNSPrefix:someElement />), the position of the child element in the text, etc. LIBXML_NOCDATA converts CDATA nodes into text nodes, but doesn't fix the rest.Eunaeunice
For full reference: Predefined ConstantsSemirigid
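
The node-type distinction described in the comments above can be checked with the DOM extension, which exposes each node's type. This is only a sketch and assumes ext-dom is available; the variable names are made up for illustration:

```php
<?php
$source = '<content><![CDATA[Hello, world!]]></content>';

// Default parsing keeps the CDATA section as a CDATA node.
$default = new DOMDocument();
$default->loadXML($source);
$defaultType = $default->documentElement->firstChild->nodeType;

// With LIBXML_NOCDATA the parser produces a plain text node instead.
$merged = new DOMDocument();
$merged->loadXML($source, LIBXML_NOCDATA);
$mergedType = $merged->documentElement->firstChild->nodeType;

var_dump($defaultType === XML_CDATA_SECTION_NODE);
var_dump($mergedType === XML_TEXT_NODE);
```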
S
70

LIBXML_NOCDATA can be passed as the optional third parameter (the libxml options) of simplexml_load_file(). With it, the returned XML object has all CDATA sections merged into ordinary text, so they show up as strings.

$xml = simplexml_load_file($this->filename, 'SimpleXMLElement', LIBXML_NOCDATA);
echo "<pre>";
print_r($xml);
echo "</pre>";
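
One side effect worth knowing about: because the flag merges CDATA into text, the CDATA wrapper is gone if you serialise the document again. A small sketch, using an inline string rather than a file so it runs on its own (the variable names are illustrative):

```php
<?php
$source = '<content><![CDATA[Hello, world!]]></content>';

// Without the flag, asXML() writes the CDATA section back out.
$defaultXml = simplexml_load_string($source)->asXML();

// With the flag, the same content is serialised as plain text.
$mergedXml = simplexml_load_string($source, 'SimpleXMLElement', LIBXML_NOCDATA)->asXML();

echo $defaultXml;
echo $mergedXml;
```

If the consumer of the output expects CDATA sections, that is a reason to skip the flag and cast to string when reading instead.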


Fix CDATA in SimpleXML

Seawards answered 25/3, 2014 at 6:30 Comment(2)
LIBXML_NOCDATA is what made this work for me. PHP 5.3.5Jorgensen
Your answer is the one that explains the LIBXML_NOCDATA meaning, thanks!Semirigid
P
15

This works perfectly for me.

$content = simplexml_load_string(
    $raw_xml
    , null
    , LIBXML_NOCDATA
);
Pocky answered 25/1, 2016 at 10:5 Comment(0)
F
14

This did the trick for me:

echo trim($entry->title);
Furrow answered 24/11, 2012 at 12:8 Comment(1)
Perfect if you need to keep the cdata (without LIBXML_NOCDATA)Hertzfeld
T
1

When to use LIBXML_NOCDATA?

I had the issue when transforming XML to JSON.

$xml = simplexml_load_string("<foo><content><![CDATA[Hello, world!]]></content></foo>");
echo json_encode($xml);
/* prints
   {
     "content": {}
   }
 */

When accessing the SimpleXMLElement object directly, you do get the CDATA content:

$xml = simplexml_load_string("<foo><content><![CDATA[Hello, world!]]></content></foo>");
echo $xml->content; 
/* prints
   Hello, world!
*/

It makes sense to use LIBXML_NOCDATA here because json_encode doesn't access the SimpleXMLElement in a way that triggers the string cast (I'm guessing a __toString() equivalent).

$xml = simplexml_load_string("<foo><content><![CDATA[Hello, world!]]></content></foo>", null, LIBXML_NOCDATA);
echo json_encode($xml);
/*
 {
   "content": "Hello, world!"
 }
*/
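
As an alternative sketch that avoids the flag, you can build the array yourself with explicit string casts and encode that. This assumes a flat document like the one above where every child holds only text; the loop and the $data name are illustrative:

```php
<?php
$xml = simplexml_load_string(
    '<foo><content><![CDATA[Hello, world!]]></content></foo>'
);

// Cast each child to a string so the CDATA text is read explicitly.
$data = [];
foreach ($xml->children() as $name => $child) {
    $data[$name] = (string) $child;
}

$json = json_encode($data);
echo $json;
```

Nested elements or attributes would need their own handling; this only covers one level of text-only children.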
Trichloromethane answered 17/4, 2020 at 7:19 Comment(0)
B
0

When using the SimpleXMLElement class directly, pass the flag as the second constructor argument:

new SimpleXMLElement($rawXml, LIBXML_NOCDATA);
Butacaine answered 11/3, 2022 at 13:13 Comment(0)
