I have a string value that I'm trying to extract list items for. I'd like to extract the text and any subnodes, however, DOMDocument is converting the entities to the character, instead of leaving in the original state.
I've tried setting DOMDocument::resolveExternals and DOMDocument::substituteEntities for false, but this has no effect. It should be noted I'm running on Win7 with PHP 5.2.17.
Example code is:
$example = '<ul><li>text</li>'.
'<li>½ of this is <strong>strong</strong></li></ul>';
echo 'To be converted:'.PHP_EOL.$example.PHP_EOL;
$doc = new DOMDocument();
$doc->resolveExternals = false;
$doc->substituteEntities = false;
$doc->loadHTML($example);
$domNodeList = $doc->getElementsByTagName('li');
$count = $domNodeList->length;
for ($idx = 0; $idx < $count; $idx++) {
$value = trim(_get_inner_html($domNodeList->item($idx)));
/* remainder of processing and storing in database */
echo 'Saved '.$value.PHP_EOL;
}
function _get_inner_html( $node ) {
$innerHTML= '';
$children = $node->childNodes;
foreach ($children as $child) {
$innerHTML .= $child->ownerDocument->saveXML( $child );
}
return $innerHTML;
}
½
ends up getting converted to ½ (single character / UTF-8 version, not entity version), which is not the desired format.
$example
string into theDOMDocument
? – Groin$doc->saveHTML( new DOMNode('½') );
– FrustratesaveHTML(DOMNode $node)
in 5.3.8 and it still translates the entity. – Groin$doc->saveHTML( new DOMText('½') )
– Frustrate