DOMDocument::loadHTML error

I built a script that combines all the CSS on a page so I can use it in my CMS. It worked fine for a long time, but now I get this error:


Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag header invalid in Entity, line: 10 in css.php on line 26

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nav invalid in Entity, line: 10 in css.php on line 26

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag section invalid in Entity, line: 22 in css.php on line 26

This is the PHP script:

<?php
header('Content-type: text/css');
include ('../global.php');

if ($usetpl == '1') {
    $client = New client();
    $tplname = $client->template();
    $location = "../templates/$tplname/header.php";
    $page = file_get_contents($location);
} else {
    $page = file_get_contents('../index.php');
}

class StyleSheets extends DOMDocument implements IteratorAggregate
{

    public function __construct ($source)
    {
        parent::__construct();
        $this->loadHTML($source);
    }

    public function getIterator ()
    {
        static $array;
        if (NULL === $array) {
            $xp = new DOMXPath($this);
            $expression = '//head/link[@rel="stylesheet"]/@href';
            $array = array();
            foreach ($xp->query($expression) as $node)
                $array[] = $node->nodeValue;
        }
        return new ArrayIterator($array);
    }
}

foreach (new StyleSheets($page) as $index => $file) {
    $css = file_get_contents($file);
    echo $css;
}
Catboat answered 5/2, 2012 at 12:20 Comment(1)
This issue was reported for PHP at bugs.php.net/bug.php?id=60021, which in turn spawned a feature request in the underlying libxml2: bugzilla.gnome.org/show_bug.cgi?id=761534 - Tentative

header, nav and section are elements from HTML5. Because the HTML5 authors felt that Public and System Identifiers were too difficult to remember, the DocType declaration is just:

<!DOCTYPE html>

In other words, there is no DTD to check against, so DOM falls back to the HTML4 Transitional DTD, which doesn't contain those elements, hence the warnings.

To suppress the warnings, put

libxml_use_internal_errors(true);

before the call to loadHTML and

libxml_use_internal_errors(false);

after it.
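A minimal sketch of that pattern, assuming you also want to look at what libxml reported instead of discarding it. The sample markup and the variable names ($html, $previous, $messages) are illustrative and not taken from the question. It restores whatever error mode was active before, rather than hard-coding false:

```php
<?php
// Illustrative HTML5 input; in the question this would be $page.
$html = '<!DOCTYPE html><html><head><link rel="stylesheet" href="a.css"></head>'
      . '<body><header><nav>Menu</nav></header><section>Text</section></body></html>';

// Collect libxml errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Copy the collected errors, then empty libxml's buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Put the error handling back the way it was.
libxml_use_internal_errors($previous);

// Keep the messages around for logging or debugging.
$messages = [];
foreach ($errors as $error) {
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}
```

The unknown elements still end up in the DOM tree, so XPath queries such as the one in the question keep working. Which messages appear in $messages depends on the libxml2 version PHP was built against.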

An alternative would be to use https://github.com/html5lib/html5lib-php.

Huckaby answered 5/2, 2012 at 12:31 Comment(4)
did that, now I get a blank page - Catboat
@Catboat that is another problem! Gordon has the good answer, thanks! - Eligibility
@Gordon how do you fix the blank page issue? - Ruffi
I had the same blank-page issue. My mistake was using print $document->saveXML() instead of $document->saveHTML(). The HTML version doesn't make certain formatting conversions that the XML version does. If that's not the issue, try checking the source of the output to see what tags, if any, are present. It should clue you in to what's happening under the hood. Also, don't forget var_dump! - Bohs

With a DOMDocument object, you should be able to place an @ before the load method in order to SUPPRESS all WARNINGS.

$dom = new DOMDocument;
@$dom->loadHTML($source);

And carry on.

Lanthanum answered 31/10, 2018 at 22:4 Comment(1)
This is a terrible solution, as it will make errors on this line a nightmare to debug. @Gordon's solution is much better. - Champaigne

HTML5 elements are still not supported, but you can silence libxml errors completely with the $options parameter.

Just set

$doc = new DOMDocument();
$doc->loadHTMLFile("html5.html", LIBXML_NOERROR);

This option is preferable to @, which silences every PHP error raised by that call, not just the libxml ones.

But be careful: libxml is very forgiving and will parse a broken HTML document. If you silence libxml errors, you might not even be aware that the HTML is malformed.
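If you want to stay aware of malformed markup, one possible approach is to collect the errors and drop only the unknown-tag ones. The sketch below is not part of this answer's method. It assumes that libxml2 reports "Tag ... invalid" with error code 801 (XML_HTML_UNKNOWN_TAG); the constant HTML_UNKNOWN_TAG and the helper loadHtml5() are hypothetical names introduced here:

```php
<?php
// Assumption: 801 is libxml2's XML_HTML_UNKNOWN_TAG ("Tag ... invalid").
const HTML_UNKNOWN_TAG = 801;

/**
 * Hypothetical helper: parse HTML and return the document together with
 * every libxml error that is NOT an unknown-tag complaint.
 */
function loadHtml5(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop unknown-tag errors, keep everything else for inspection.
    $remaining = array_values(array_filter(
        $errors,
        fn (LibXMLError $e): bool => $e->code !== HTML_UNKNOWN_TAG
    ));

    return [$doc, $remaining];
}

[$doc, $problems] = loadHtml5(
    '<!DOCTYPE html><html><body><section><p>Hi</p></section></body></html>'
);
```

Anything left in $problems points at markup issues other than HTML5 element names, so it can be logged or turned into an exception as needed.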

Natterjack answered 21/10, 2020 at 21:18 Comment(2)
Are there any options to silence only the errors that were thrown because of HTML5 elements? - Olympiaolympiad
@Olympiaolympiad Not to my knowledge - Natterjack

Most people do not realize the difference between HTML and XML as languages and HTML and XML as parsers. A parser consumes code, and the HTML and XML parsers are completely different. While there are some minor things XML parsers in browsers will tolerate (e.g. duplicate id values), they don't put up with junk that merely looks like code.

PHP's XML parser is even stricter and doesn't allow duplicate id values. Additionally, since anything can be an element in XML (e.g. footer, header, section), PHP's XML parser will not complain about unknown HTML5+ elements.

$dom->loadXML($xml);
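A small sketch of both sides of that trade-off, using made-up sample markup and variable names: well-formed markup with HTML5 elements loads without complaints, while markup that is fine as HTML but not well-formed XML (here an unclosed br) is rejected outright:

```php
<?php
// Well-formed XML that happens to use HTML5 element names.
$xhtml = '<?xml version="1.0" encoding="UTF-8"?>'
       . '<html><head><title>t</title></head>'
       . '<body><header><nav>Menu</nav></header><section><br/></section></body></html>';

$dom = new DOMDocument();
$ok = $dom->loadXML($xhtml);
$sections = $dom->getElementsByTagName('section')->length;

// Typical HTML, but not well-formed XML: <br> is never closed.
// Errors are collected internally so the failure does not emit warnings.
$previous = libxml_use_internal_errors(true);
$bad = new DOMDocument();
$badOk = $bad->loadXML('<html><body><br></body></html>');
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Here $ok is true and $badOk is false, so this route only works if the markup you feed it is (or can be made) valid XML.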

For anyone developing on the client side, I highly recommend using the XML parser to handle your HTML5 code. In my experience developing from the 2000s into 2020, Gecko browsers (e.g. Waterfox, Firefox) have the best XML parser: the entire page breaks and you get an explicit error message. Stricter code yields better results; quality eventually yields quantity, though the opposite is not true.

Svetlana answered 1/11, 2020 at 2:30 Comment(0)

Instead of using DOMDocument, you might want to use the convenient DomCrawler component from Symfony:

https://symfony.com/doc/current/components/dom_crawler.html

composer require symfony/dom-crawler symfony/css-selector

Then you can do cool stuff like

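use Symfony\Component\DomCrawler\Crawler;

// filter() takes CSS selectors and needs the symfony/css-selector package.
$crawler = new Crawler($html);
$crawler->filter(".whatever .wild > .query  ~.you[name=it]")->each(function ($node, $i) {
    print_r($node->text());

    // or something like this
    $node->children()->each(function ($nodeInner, $j) {
        // ...
    });
    // ...
});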
Wrench answered 10/6, 2023 at 12:51 Comment(0)