Since libxml 2.9, loading external entities has been disabled when parsing XML, to prevent XXE attacks.
As a result, to load the external DTD when parsing XML with PHP's DOMDocument, the LIBXML_DTDLOAD option must be specified.
What would be a good way to verify that only the expected DTD will be loaded, before enabling LIBXML_DTDLOAD?
One approach I can think of (as shown in the example code below) would be to keep entity loading disabled, parse the XML file once, check that the DOCTYPE declaration is as expected, then parse the XML again with entity loading enabled. Would that be sufficient?
<?php
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article/>
XML;
// entity loading disabled
libxml_disable_entity_loader(); // deprecated as of PHP 8.0
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD); // PHP Warning: DOMDocument::loadXML(): I/O warning : failed to load external entity
print $doc->doctype->systemId; // http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd
// entity loading enabled
libxml_disable_entity_loader(false);
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD);
print $doc->doctype->systemId; // http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd
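A minimal sketch of the first-pass check as a reusable function, assuming PHP 7.1+ (for the `[$a, $b]` destructuring in `foreach`) and libxml2 2.9+; the helper name `hasExpectedDoctype` and the `$allowed` list are hypothetical, not part of any library. It parses without LIBXML_DTDLOAD (adding LIBXML_NONET as an extra guard) and accepts the document only if `$doc->doctype` matches an allowed public/system identifier pair and has no internal subset, because an internal subset could declare additional external entities that comparing `systemId` alone would miss. It says nothing about what the allowed DTD itself references, so a modular DTD's own module files would still be fetched on the second pass.

```php
<?php
// Hypothetical allowlist of (publicId, systemId) pairs, taken from the example document.
$allowed = [
    [
        '-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN',
        'http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd',
    ],
];

// Hypothetical helper: first-pass check before re-parsing with LIBXML_DTDLOAD.
function hasExpectedDoctype(string $xml, array $allowed): bool
{
    $doc = new DOMDocument();
    // No LIBXML_DTDLOAD or LIBXML_NOENT here, so the external subset should not be
    // requested; LIBXML_NONET additionally forbids network access during this pass.
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok || $doc->doctype === null) {
        return false; // malformed XML, or no DOCTYPE declaration at all
    }
    // Reject any internal subset: it could declare extra external entities.
    if (!empty($doc->doctype->internalSubset)) {
        return false;
    }
    foreach ($allowed as [$publicId, $systemId]) {
        if ($doc->doctype->publicId === $publicId
            && $doc->doctype->systemId === $systemId) {
            return true;
        }
    }
    return false;
}
```

The second `loadXML($xml, LIBXML_DTDLOAD)` call from the example would then run only when `hasExpectedDoctype($xml, $allowed)` returns true.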
LIBXML_DTDLOAD? Have you tested? Also, please provide example data and code in your question; we need an example here for reproduction and clarity, at least if you want a sufficient and clear answer. According to my tests in the linked question, those aren't loaded regardless of your setting, but I'm not sure about the stability of that test. – Catamenia