How can I prevent HTML entities with PHP's DOMDocument::saveHTML()?

Due to custom storage needs (the "why" is not important here, thanks!) I have to save HTML <a> links in a specific format such as this:

$myDOMNode->setAttribute("href", "{{{123456}}}");

Everything works fine until I call saveHTML() on the containing DOMDocument. This kills it, since saveHTML() encodes { as %7B.
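
A minimal reproduction of the behaviour described above might look like the sketch below. The sample markup and variable names are assumptions for illustration. The last line only reports whether the serializer URL-encoded the braces, which is what the question describes.

```php
<?php
// Sample markup is an assumption; any document containing an <a> works.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="foo">Bar</a></body></html>');

// Set the placeholder exactly as in the question.
$a = $dom->getElementsByTagName('a')->item(0);
$a->setAttribute('href', '{{{123456}}}');

// Inside the DOM tree the attribute value is still unencoded.
echo $a->getAttribute('href'), PHP_EOL;

// The encoding reported in the question happens during serialization.
$html = $dom->saveHTML();
echo $html;

// true when the braces were URL-encoded in the serialized output.
var_dump(strpos($html, '%7B%7B%7B') !== false);
```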

This is a legacy application where href="{{{123456}}}" works as a placeholder. The command-line parser looks for exactly this pattern (unencoded) and cannot be changed.

I've no choice but to do it this way.

I cannot run html_entity_decode() on the result.

This HTML will never be displayed as is; it is only needed for storage.

Thanks for your help!

Note: I've looked around for 2 hours, but none of the proposed solutions worked for me. For those who will blindly mark the question as a duplicate: please comment and let me know.

Lallans answered 4/4, 2015 at 15:25 Comment(8)
Indeed it does encode it. But this doesn't explain what you mean by 'this kills it'. It's a URL; URLs can be encoded, no problem. Just so you can see I'm not kidding, I made an example: ergobase.nl/test25.html See the source code: it's encoded. Click the link: it works! Wow. So please tell me what you mean by 'this kills it'? - Gardia
You're right, the link would work, but this is a legacy application where href="{{{123456}}}" works as a placeholder. The command-line parser looks for exactly this pattern (unencoded) and cannot be changed. - Lallans
What happens if you use html_entity_decode() after the saveHTML() and before you send it to the command-line parser? - Gardia
As I said, I cannot do that (the page has other encoded entities that must be left untouched). - Lallans
You said it is a legacy application; what PHP version are you using? - Salesin
PHP 5.6.7 (the "legacy" applies to the whole context only, not to the application itself). - Lallans
I know this is from years ago, but by any chance did you find a solution for it? Sadly, this is still an issue today. - Highmuckamuck
@MacA. Try the accepted solution. Back in the day I did it like that. - Lallans

As the legacy code is using {{{...}}} as a placeholder, it may be safe to use a somewhat hackish approach with preg_replace_callback. The following will restore the URL-encoded placeholders once the HTML is generated:

$src = <<<EOS
<html>
    <body>
        <a href="foo">Bar</a>
    </body>
</html>
EOS;

// Create DOM document
$dom = new DOMDocument();
$dom->loadHTML($src);

// Alter `href` attribute of the first anchor
// (setAttribute() does not return the node, so its result is not assigned)
$dom->getElementsByTagName('a')
    ->item(0)
    ->setAttribute('href', '{{{123456}}}');

// Callback function to URL decode match
$urldecode = function ($matches) {
    return urldecode($matches[0]);
};

// Turn DOMDocument into HTML string, then restore/urldecode placeholders 
$html = preg_replace_callback(
    '/' . urlencode('{{{') . '\d+' . urlencode('}}}') . '/',
    $urldecode,
    $dom->saveHTML()
);

echo $html, PHP_EOL;

Output (indented for clarity):

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
    <body>
        <a href="{{{123456}}}">Bar</a>
    </body>
</html>
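
The same idea can be wrapped in a small reusable function. This is only a sketch: the function name saveHtmlKeepingPlaceholders, the sample markup, and the second link are assumptions for illustration, and the pattern is written out literally instead of being built with urlencode(). Because the callback only touches sequences that match the encoded placeholder, other percent-encoded text in the document is left alone.

```php
<?php
// Hypothetical helper (the name is not from the answer): serialize the
// document, then restore URL-encoded {{{digits}}} placeholders.
function saveHtmlKeepingPlaceholders(DOMDocument $dom)
{
    return preg_replace_callback(
        // URL-encoded form of {{{digits}}}; the "i" flag also accepts lowercase hex.
        '/%7B%7B%7B(\d+)%7D%7D%7D/i',
        function ($matches) {
            // Rebuild the placeholder from the captured digits only.
            return '{{{' . $matches[1] . '}}}';
        },
        $dom->saveHTML()
    );
}

// Usage with an assumed sample document containing a second, already encoded link.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="foo">Bar</a><a href="a%20b.html">Other</a></body></html>');
$dom->getElementsByTagName('a')->item(0)->setAttribute('href', '{{{123456}}}');

$html = saveHtmlKeepingPlaceholders($dom);
echo $html, PHP_EOL;
```

The sketch avoids scalar type declarations so that it should also run on the PHP 5.6 setup mentioned in the comments.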
Salesin answered 7/4, 2015 at 11:17 Comment(0)

I came across this same issue recently, and this was my solution. It's a little bit hacky, but you can do this:

$customTempAttributeName = 'vygjhvgjvgkf';

// $node is your <a> tag DOM node

$newAttr = $dom->createAttribute($customTempAttributeName);
$newAttr->value = "{{your_placeholder}}";
$node->setAttributeNode($newAttr);
$node->removeAttribute('href');

// Serialize, then rename the temporary attribute back to href
$finalHTMLString = $dom->saveHTML();
$finalHTMLString = str_replace($customTempAttributeName, 'href', $finalHTMLString);

echo $finalHTMLString;
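
For reference, a self-contained variant of this temporary-attribute trick could look like the sketch below. The sample markup, the placeholder value, and the data-tmp-href- prefix are assumptions for illustration. It also assumes the serializer writes attributes as name="value" with double quotes, so that the rename only hits the attribute name and not matching text elsewhere.

```php
<?php
// Assumed sample document and placeholder value.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="foo">Bar</a></body></html>');
$node = $dom->getElementsByTagName('a')->item(0);

// Temporary attribute name; uniqid() makes a clash with real content unlikely.
$tmpName = 'data-tmp-href-' . uniqid();

// Move the placeholder into the temporary attribute instead of href.
$node->removeAttribute('href');
$node->setAttribute($tmpName, '{{{123456}}}');

// Rename only where the string appears as an attribute name
// (a space before it and =" after it).
$html = str_replace(' ' . $tmpName . '="', ' href="', $dom->saveHTML());
echo $html, PHP_EOL;
```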

Raquelraquela answered 18/6, 2023 at 11:49 Comment(0)
