Prevent PHP DOMDocument from removing @click attributes

I have HTML that contains attributes such as @click and @autocomplete:change, which are used by some JS libraries.

When I parse the HTML using DOMDocument, these attributes are removed.

Sample code:

<?php

$content = <<<'EOT'
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head></head>
    <body>
        <a role="tab" @click="activeType=listingType"></a>
        <input type="text" @autocomplete:change="handleAutocomplete">
    </body>
</html>
EOT;

// creating new document
$doc = new DOMDocument('1.0', 'utf-8');
$doc->recover = true;
$doc->strictErrorChecking = false;

// collect libxml parse warnings internally instead of emitting them
libxml_use_internal_errors(true);

// load the content without adding implied html/body tags or a default doctype declaration
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

echo $doc->saveHTML();
?>

Output:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head></head>
    <body>
        <a role="tab"></a>
        <input type="text">
    </body>
</html>
Immortelle answered 18/11, 2021 at 4:22 Comment(12)
It seems DOMDocument will exclude non-standard attributes, and suppressing warnings won't stop it from doing so. You are probably better off using a third-party parsing engine. - Jail
Are you using DOMDocument to try to clean up bad HTML? - Embowel
@Jail that's right. However, I haven't found any other parser that's as fast as DOMDocument and consumes as few resources. - Immortelle
@ChrisStrickland no, to parse HTML. - Immortelle
But why do you need to parse that HTML? - Vassalize
@Vassalize it's a speed optimization solution for WordPress. It manipulates HTML in different ways. - Immortelle
And I suppose you don't want to recompile the PHP environment to omit the xmlValidateName check in dom/attr.c. - Embowel
@ChrisStrickland this code will be deployed to different hosting providers of my clients, so I don't have access to their servers to recompile. - Immortelle
I looked at the source code of libxml: it checks whether the attribute name starts with a letter, _, or :, and fails otherwise :( - Ozonosphere
@SalmanA Can you point me to the source code that performs that check? - Immortelle
Unfortunately, attribute names starting with an @ character are allowed by the HTML5 spec, which makes DOMDocument a bad fit for your problem because it relies on libxml, which is not a formal HTML5 parser. You might want to have a look at other parsers before you start building more hacks. - Brainard
@Brainard the other parsers you pointed to also use DOMDocument. Unfortunately, there aren't good parsers in PHP that are fast and reliable. - Immortelle
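
The name-start rule mentioned in the comments can be approximated in plain PHP to find out in advance which attribute names are at risk. This is only a rough sketch: the helper name hasAcceptedNameStart is hypothetical, the pattern is an ASCII-only simplification of what the comment describes, and the real libxml rules may accept more characters.

```php
<?php
// Hypothetical helper: ASCII-only approximation of the rule described in the
// comments (a name must start with a letter, "_" or ":").
function hasAcceptedNameStart(string $attributeName): bool
{
    return preg_match('/^[A-Za-z_:]/', $attributeName) === 1;
}

// Report which sample attribute names pass the approximate check.
foreach (['role', '_private', ':bind', '@click', '@autocomplete:change'] as $name) {
    printf("%-22s %s\n", $name, hasAcceptedNameStart($name) ? 'accepted' : 'rejected');
}
```

Names that fail this check, such as @click, are the ones that would need a placeholder before the HTML is handed to DOMDocument.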

If there's no way to make DOMDocument accept @ in attribute names, we can replace @ with a special string before loadHTML() and restore it after saveHTML().

<?php

$content = <<<'EOT'
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head></head>
    <body>
        <a role="tab" @click="activeType=listingType"></a>
        <input type="text" @autocomplete:change="handleAutocomplete">
    </body>
</html>
EOT;

// creating new document
$doc = new DOMDocument('1.0', 'utf-8');
$doc->recover = true;
$doc->strictErrorChecking = false;

// collect libxml parse warnings internally instead of emitting them
libxml_use_internal_errors(true);

// swap every @ for a placeholder that the parser accepts in attribute names
$content = str_replace('@', 'at------', $content);
// load the content without adding implied html/body tags or a default doctype declaration
$doc->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$html = $doc->saveHTML();
$html = str_replace('at------', '@', $html);
echo $html;

Output:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head></head>
    <body>
        <a role="tab" @click="activeType=listingType"></a>
        <input type="text" @autocomplete:change="handleAutocomplete">
    </body>
</html>
Eirena answered 20/11, 2021 at 10:31 Comment(3)
Nice, but I don't think you can actually use getAttribute on @click, for instance, so if he's trying to parse the value out, I don't think this will work. - Embowel
This will fail if the HTML already contains instances of "at------" (e.g. as node text) because they will inadvertently get replaced with "@". It would be better to determine a string that is not present in the document and use that as a sentinel value. - Jail
This is the only "hacky" solution I found :| - Immortelle
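
The sentinel idea from the comments could look roughly like the sketch below. The helper name makeSentinel and the sample markup are made up for illustration. The placeholder uses only lowercase letters, digits and hyphens so that it survives as part of an attribute name, and while the document is loaded the attribute has to be read through its placeholder name rather than as @click.

```php
<?php
// Hypothetical helper: builds a placeholder that does not occur in $html.
function makeSentinel(string $html): string
{
    do {
        // Lowercase letters, digits and hyphens only, starting with a letter.
        $sentinel = 'at-' . bin2hex(random_bytes(8)) . '-';
    } while (strpos($html, $sentinel) !== false);

    return $sentinel;
}

// Sample markup (an assumption for this sketch); the text node also contains an @.
$content = '<a role="tab" @click="activeType=listingType">mail me: a@b.example</a>';

libxml_use_internal_errors(true);

$sentinel = makeSentinel($content);

$doc = new DOMDocument();
$doc->loadHTML(
    str_replace('@', $sentinel, $content),
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Inside the DOM the attribute is addressed by its placeholder name.
$link = $doc->getElementsByTagName('a')->item(0);
$value = $link->getAttribute($sentinel . 'click');

// Restore every @ once the document has been serialized.
$html = str_replace($sentinel, '@', $doc->saveHTML());

echo $value, PHP_EOL;
echo $html;
```

Because the sentinel is checked against the input first, existing text in the document cannot be turned into an @ by accident. The trade-off is that any code working on the DOM between load and save sees the placeholder, not the original character.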

Extend the DOMDocument class

Replace @click with at-click and @autocomplete with at-autocomplete when loading, and reverse the replacement when saving.

// This is a PHP 8 example
class MyDomDocument extends DOMDocument
{
    // Attribute-name prefixes the parser drops, mapped to parser-safe placeholders.
    private $replace = [
        '@click' => 'at-click',
        '@autocomplete' => 'at-autocomplete',
    ];

    // Swap in the placeholders before parsing.
    #[\ReturnTypeWillChange]
    public function loadHTML(string $content, int $options = 0)
    {
        $content = str_replace(array_keys($this->replace), array_values($this->replace), $content);
        return parent::loadHTML($content, $options);
    }

    // Restore the original attribute names when serializing.
    #[\ReturnTypeWillChange]
    public function saveHTML(?DOMNode $node = null)
    {
        $content = parent::saveHTML($node);
        return str_replace(array_values($this->replace), array_keys($this->replace), $content);
    }
}

Example

$content = <<<'EOT'
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head></head>
    <body>
        <a role="tab" @click="activeType=listingType"></a>
        <input type="text" @autocomplete:change="handleAutocomplete">
    </body>
</html>
EOT;

$dom = new MyDomDocument();
$dom->loadHTML($content);

var_dump($dom->getElementsByTagName('a')[0]->getAttribute('at-click'));
var_dump($dom->getElementsByTagName('input')[0]->getAttribute('at-autocomplete:change'));

echo $dom->saveHTML();

Output

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
    <head></head>
    <body>
        <a role="tab" @click="activeType=listingType"></a>
        <input type="text" @autocomplete:change="handleAutocomplete">
    </body>
</html>
Kairouan answered 22/11, 2021 at 14:11 Comment(1)
Unfortunately, I can't use this solution, as DOMDocument is used in an internal library. - Immortelle
