PHP DOMDocument adds <html> headers with DOCTYPE declaration
W

5

9

I'm adding a #b hash to each link via the DOMDocument class.

        $dom = new DOMDocument();
        $dom->loadHTML($output);

        $a_tags = $dom->getElementsByTagName('a');

        foreach($a_tags as $a)
        {
            $value = $a->getAttribute('href');
            $a->setAttribute('href', $value . '#b');
        }

        return $dom->saveHTML();

That works fine; however, the returned output includes a DOCTYPE declaration plus <html> and <body> tags. Any idea why that happens, or how I can prevent it?

Word answered 26/3, 2011 at 18:54 Comment(2)
Possible duplicate of "PHP + DOMDocument: outerHTML for element?" - Datura
Possible duplicate of "How to saveHTML of DOMDocument without HTML wrapper?" - Runway
A
5

Yes, that is what DOMDocument::saveHTML() does by default: it generates a full HTML document, including the DOCTYPE declaration, the <html> and <body> tags, and so on.

Two possible solutions:

  • If you are working with PHP >= 5.3.6, saveHTML() accepts an optional DOMNode parameter; when it is given, only that node (with its descendants) is output instead of the whole document.
  • If you need your code to work with PHP < 5.3.6, you'll have to use str_replace(), a regex, or an equivalent to remove the portions of HTML you don't need.
    • For an example, see this note in the manual's user notes.
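
To illustrate the first option, here is a minimal sketch that saves only the children of <body>, assuming PHP >= 5.3.6. The function name appendHashToLinks() and the sample markup are made up for this example; they are not from the question.

```php
<?php
// Hypothetical helper (name chosen for this sketch): append '#b' to every
// link and return the markup without DOCTYPE, <html> or <body>.
function appendHashToLinks($html)
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        $a->setAttribute('href', $a->getAttribute('href') . '#b');
    }

    // loadHTML() wraps the fragment in <html><body>; serialize only
    // the children of <body>, one node at a time.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child); // node argument: PHP >= 5.3.6
    }
    return $out;
}

echo appendHashToLinks('<p><a href="/page">link</a></p>');
```

Passing each child of <body> to saveHTML() avoids any string surgery on the serialized document.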
Araliaceous answered 26/3, 2011 at 19:3 Comment(4)
The second link works fine for me - the preg_replace solution is the key! Thank you! - Word
You're welcome :-) (and the guys who post user notes on manual pages are more to be thanked than me, in this case ;-) ) - Araliaceous
I used the first option as I am using PHP >= 5.3 and it worked great. $doc->saveHTML(false); - Quad
@BenSinclair I am also using PHP >= 5.3 and $doc->saveHTML(false) throws the error "Warning: DOMDocument::saveHTML() expects parameter 1 to be DOMNode, boolean given" - Mestizo
H
6

The real problem is the way the DOM is loaded. Use this instead:

$html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

Please upvote the original answer here.
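
For reference, a short sketch of the whole round trip with these flags. It assumes the input has a single top-level element, and that the PHP build provides both constants (as far as I know they require PHP 5.4+ with libxml 2.7.7 / 2.7.8 or newer). The sample markup is made up.

```php
<?php
// Sample input (an assumption for this sketch): one top-level element.
$content = '<div><a href="/page">link</a></div>';

$dom = new DOMDocument();
// NOIMPLIED: do not add implied <html>/<body> elements.
// NODEFDTD:  do not add a default DOCTYPE when none is present.
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($dom->getElementsByTagName('a') as $a) {
    $a->setAttribute('href', $a->getAttribute('href') . '#b');
}

$result = $dom->saveHTML();
echo $result;
```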

Honorarium answered 9/12, 2015 at 0:40 Comment(0)
I
2

Calling $doc->saveHTML(false); will not work: it fails with an error because the method expects a DOMNode, not a boolean.

The solution I used:

return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $doc->saveHTML()));

I'm using PHP > 5.4.
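
The same idea can be wrapped in a small helper. This is only a sketch: stripHtmlWrapper() is a made-up name, and unlike the one-liner above it also tolerates attributes on <html>/<body>. It does not remove a <head> section, so it assumes the input was a body fragment.

```php
<?php
// Hypothetical helper: strip the DOCTYPE and the <html>/<body> wrapper
// from the string returned by DOMDocument::saveHTML().
function stripHtmlWrapper($html)
{
    // Remove the leading DOCTYPE declaration, if any.
    $html = preg_replace('/^\s*<!DOCTYPE[^>]*>\s*/i', '', $html);
    // Remove opening and closing <html> and <body> tags, with or without attributes.
    $html = preg_replace('~</?(?:html|body)(?:\s[^>]*)?>\s*~i', '', $html);
    return trim($html);
}

$doc = new DOMDocument();
$doc->loadHTML('<p><a href="/page#b">link</a></p>');
echo stripHtmlWrapper($doc->saveHTML());
```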

Inane answered 20/2, 2014 at 16:58 Comment(0)
A
0

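I solved this problem by creating a new DOMDocument and copying the child nodes from the original document to the new one.

function removeDocType($oldDom) {
  // For a fragment loaded with loadHTML(), the first child of <html> is <body>
  $node = $oldDom->documentElement->firstChild;
  $dom = new DOMDocument();
  foreach ($node->childNodes as $child) {
    // Nodes must be imported into the new document before they can be appended
    $dom->appendChild($dom->importNode($child, true));
  }
  return $dom->saveHTML();
}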

So instead of using

return $dom->saveHTML();

I use:

return removeDocType($dom);
Aubry answered 30/3, 2016 at 11:17 Comment(0)
M
0

I was in a situation where I wanted the html wrapper but not the DOCTYPE; the solution was in line with Tiago A.'s:

// Avoid adding the DOCTYPE header    
$dom->loadHTML($bodyContent, LIBXML_HTML_NODEFDTD);

// Avoid adding the DOCTYPE header AND html/body wrapper
$dom->loadHTML($bodyContent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
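
One caveat worth checking on your setup: with LIBXML_HTML_NOIMPLIED, a fragment that has several top-level elements is commonly reported to come back with the later siblings nested inside the first one. A workaround sketch, assuming that behaviour, is to add a temporary wrapper and serialize only its children. The function name saveFragment() and the wrapper <div> are my own choices, not part of any API.

```php
<?php
// Hypothetical helper: parse a fragment with several top-level elements
// and return it without DOCTYPE, <html>/<body> or the temporary wrapper.
function saveFragment($fragment)
{
    $dom = new DOMDocument();
    // The temporary <div> gives the parser a single root element.
    $dom->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    $out = '';
    // Serialize only the wrapper's children, so the <div> itself is dropped.
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo saveFragment('<p>a</p><p>b</p>');
```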
Mortimer answered 17/9, 2021 at 11:44 Comment(0)
