PHP DOM append HTML to existing document without DOMDocumentFragment::appendXML

I need to load some arbitrary HTML into an existing DOMDocument tree. Previous answers suggest using DOMDocumentFragment and its appendXML method to handle this.

As @Owlvark indicates in the comments, XML is not HTML, so this is not a good solution.

The main issue I had with it was that entities like &ndash; caused errors, because the appendXML method expects well-formed XML.

We could define the entities, but that doesn't address the underlying problem: not all HTML is valid XML.
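To make the failure concrete, here is a minimal sketch of what goes wrong. The helper name tryAppendXml is made up for illustration and is not part of the DOM extension; it only reports whether appendXML() accepted the markup, and it collects libxml errors internally so that no warnings are printed.

```php
<?php
// Hypothetical helper (not a DOM API): reports whether appendXML() accepts the markup.
function tryAppendXml(string $markup): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $fragment = $doc->createDocumentFragment();
    $ok = $fragment->appendXML($markup); // false when the markup is not well-formed XML

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

var_dump(tryAppendXml('<p>well-formed</p>'));  // parses as XML
var_dump(tryAppendXml('<p>a &ndash; b</p>'));  // HTML entity that plain XML does not define
var_dump(tryAppendXml('<p>one<br>two</p>'));   // unclosed <br> is fine in HTML, not in XML
```

Only the first call should succeed. The other two inputs are ordinary HTML, which is why defining entities alone does not fix the general case.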

What is a good solution for importing HTML into a DOMDocument tree?

Metcalf asked 11/9, 2012 at 19:34 Comments (6)
You might just have to turn on libxml_use_internal_errors() and ignore it... Also, you're loading the document using DOMDocument::loadHTML(), right? - Undershoot
@FrankFarmer, enabling internal errors only hides the errors from the display and from your error handler; it does nothing to actually resolve the issue. As for loadHTML, I am not; I am using DOMDocumentFragment::appendXML. - Metcalf
See this answer - HTML is not XML - Habilitate
@Habilitate joy, that explains the error... but it also doesn't provide a viable solution. - Metcalf
You have been given two "solutions" (suppressing errors, defining entities), what makes them not "viable"?.. - Clotildecloture
@Clotildecloture I don't view suppressing errors as a solution so much as a hack, but I guess it depends on your point of view. Defining the entities seems to be out of the way, but yes, it is viable. Thanks FrankFarmer and Owlvark for your contributions! - Metcalf

The solution that I came up with is to use DOMDocument::loadHTML, as @FrankFarmer suggests, and then to take the parsed nodes and import them into my current document. My implementation looks like this:

/**
 * Parses HTML into DOM nodes
 * @param string $html the raw html to transform
 * @param \DOMDocument $doc the document to import the nodes into
 * @return array an array of DOMNodes on success or an empty array on failure
 */
protected function htmlToDOM($html, $doc) {
    // Wrap the input so its top-level nodes can be located after parsing.
    $html = '<div id="html-to-dom-input-wrapper">' . $html . '</div>';
    // loadHTML() is an instance method; the static call throws an Error on PHP 8.
    $hdoc = new \DOMDocument();
    $hdoc->loadHTML($html);
    $child_array = array();
    // getElementById() returns null (it does not throw) if the wrapper is missing.
    $wrapper = $hdoc->getElementById('html-to-dom-input-wrapper');
    if ($wrapper === null) {
        error_log('htmlToDOM: wrapper element not found after parsing', 0);
        return $child_array;
    }
    foreach ($wrapper->childNodes as $child) {
        // Deep-copy each node into the target document; it is not attached yet.
        $child_array[] = $doc->importNode($child, true);
    }
    return $child_array;
}
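For anyone who wants to try this outside a class, here is a standalone sketch of the same wrap, parse, and import idea that also attaches the imported nodes to a target node. The function name appendHtml and the sample markup are assumptions made for illustration. Unlike the method above, it turns on libxml's internal error collection while parsing so that warnings about unusual markup are not printed; drop those lines if you would rather see the warnings.

```php
<?php
// Hypothetical helper: parses $html with the HTML parser and appends the
// resulting nodes to $target. Returns the number of top-level nodes added.
function appendHtml(DOMNode $target, string $html): int
{
    $doc = $target instanceof DOMDocument ? $target : $target->ownerDocument;

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $scratch = new DOMDocument();
    $scratch->loadHTML('<div id="html-to-dom-input-wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $scratch->getElementById('html-to-dom-input-wrapper');
    if ($wrapper === null) {
        return 0; // wrapper not found, nothing to import
    }

    $count = 0;
    foreach ($wrapper->childNodes as $child) {
        // importNode() copies the node into $doc; appendChild() attaches the copy.
        $target->appendChild($doc->importNode($child, true));
        $count++;
    }
    return $count;
}

// Usage: append a snippet that appendXML() would reject.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="content"></div></body></html>');
$content = $doc->getElementById('content');

$added = appendHtml($content, '<p>a &ndash; b</p><br>tail');
echo $added, " nodes added\n";
echo $doc->saveHTML($content), "\n";
```

One caveat with the wrapper approach: if the input contains a stray closing div tag, the parser may close the wrapper early, and anything after that point will not be imported.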
Metcalf answered 11/9, 2012 at 20:49 Comment(0)
