Validate XML using a custom DTD in PHP

4

11

Is there a way (without installing any libraries) of validating XML using a custom DTD in PHP?

Sabinasabine answered 19/9, 2008 at 13:46 Comment(2)
So, just to clarify - does "custom DTD" mean "DTD which is independent/different from any DTD which may be specified in the content of the XML file"? - Dorcia
See #1274673 - Mulligatawny
4

Take a look at PHP's DOM, especially DOMDocument::schemaValidate and DOMDocument::validate.

The example for DOMDocument::validate is fairly simple:

<?php
$dom = new DOMDocument;
$dom->load('book.xml');
// validate() checks the document against the DTD declared in its DOCTYPE
if ($dom->validate()) {
    echo "This document is valid!\n";
}
?>
Patric answered 19/9, 2008 at 13:50 Comment(6)
The only way to get the validation error is to use a custom error handler. Really ugly. PHP sucks at error handling. - Jeffereyjefferies
It looks like there is a better way than a custom error handler: uk3.php.net/manual/en/domdocument.schemavalidate.php#62032 - Jeffereyjefferies
@Andrei - It certainly helps to see validation errors displayed properly, so it is a win to call libxml_use_internal_errors(true) before validation and libxml_get_errors() after a failure. - Dorcia
@Patric - I don't think this really answers the original question because "book.xml" will simply be validated against whatever DTD is specified in the content of book.xml and not a "custom" DTD specified by the caller at runtime. - Dorcia
FYI there is a bug in PHP with DOMDocument::validate(): bugs.php.net/bug.php?id=48080 - Cellulose
This validates whether the XML is in the correct format as XML. But I think he is asking how to validate against custom rules such as a DTD, i.e. checking that the element names in the XML correspond to the allowed element names? - Christmann
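The libxml_use_internal_errors() / libxml_get_errors() approach mentioned in the comments above can be sketched roughly as follows. This is only an illustration: the helper name validateWithErrors() and the sample document are made up for this example, and the document is validated against the DTD embedded in its own DOCTYPE.

```php
<?php
// Hypothetical helper (not part of PHP): returns libxml messages for a
// document that fails to parse or fails DTD validation, else an empty array.
function validateWithErrors(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument;
    $messages = [];
    if (!$dom->loadXML($xml) || !$dom->validate()) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    // Restore the previous error-handling mode
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Sample document: the DTD requires <to> followed by <from>, but <from> is missing
$invalid = '<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
]>
<note><to>Tove</to></note>';

foreach (validateWithErrors($invalid) as $message) {
    echo $message, "\n";
}
```

Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error behaviour for the rest of the script.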
3

If you have the DTD in a string, you can validate against it by using a data wrapper for the DTD:

$xml = '<?xml version="1.0"?>
        <!DOCTYPE note SYSTEM "note.dtd">
        <note>
            <to>Tove</to>
            <from>Jani</from>
            <heading>Reminder</heading>
            <body>Don\'t forget me this weekend!</body>
        </note>';

$dtd = '<!ELEMENT note (to,from,heading,body)>
        <!ELEMENT to (#PCDATA)>
        <!ELEMENT from (#PCDATA)>
        <!ELEMENT heading (#PCDATA)>
        <!ELEMENT body (#PCDATA)>';


$root = 'note';

$systemId = 'data://text/plain;base64,'.base64_encode($dtd);

$old = new DOMDocument;
$old->loadXML($xml);

$creator = new DOMImplementation;
$doctype = $creator->createDocumentType($root, null, $systemId);
$new = $creator->createDocument(null, null, $doctype);
$new->encoding = "utf-8";

$oldNode = $old->getElementsByTagName($root)->item(0);
$newNode = $new->importNode($oldNode, true);
$new->appendChild($newNode);

if (@$new->validate()) {
    echo "Valid";
} else {
    echo "Not valid";
}
Gatekeeper answered 30/6, 2011 at 9:48 Comment(4)
So why does this code produce output "Not valid"? Trapping errors from libxml I see the following: <b>Error 517</b>: Could not load the external subset "data://text/plain;base64,PCFFTEVNRU5UIG5vdGUgKHRvLGZyb20saGVhZGluZyxib2R5KT4KICAgICAgICA8IUVMRU1FTlQgdG8gKCNQQ0RBVEEpPgogICAgICAgIDwhRUxFTUVOVCBmcm9tICgjUENEQVRBKT4KICAgICAgICA8IUVMRU1FTlQgaGVhZGluZyAoI1BDREFUQSk+CiAgICAgICAgPCFFTEVNRU5UIGJvZHkgKCNQQ0RBVEEpPg==" on line <b>0</b> - Dorcia
I wish I could downvote this for broken code (or at least revoke my upvote). - Dorcia
The problem which I'm having with the above code appears to be in the createDocumentType() call, which generates the DOCTYPE element. This is what I want (for the example): <!DOCTYPE note [<!ELEMENT note (to,from,heading,body)> <!ELEMENT to (#PCDATA)> ... <!ELEMENT body (#PCDATA)>]> but this is what I get: <!DOCTYPE note SYSTEM "data://text/plain;base64,PCFFTEVNRU5UIG5vdGUgKHRvLGZyb20saGVhZGluZyxib2R5KT4KICAgICAgICA8IUVMRU1FTlQgdG8gKCNQQ0RBVEEpPgogICAgICAgIDwhRUxFTUVOVCBmcm9tICgjUENEQVRBKT4KICAgICAgICA8IUVMRU1FTlQgaGVhZGluZyAoI1BDREFUQSk+CiAgICAgICAgPCFFTEVNRU5UIGJvZHkgKCNQQ0RBVEEpPg=="> - Dorcia
Looks more like the code has been copied over from here and then combined with the data wrapper. @Peter: External subset loading might be disabled on your configuration; it does work. - Dibbuk
3

My interpretation of the original question is that we have an "on board" XML file that we want to validate against an "on board" DTD file. So here's how I would implement the "interpolate a local DTD inside the DOCTYPE element" idea expressed in comments by both Soren and PayamRWD:

public function validate($xml_realpath, $dtd_realpath=null) {
    // Read lines without their trailing newlines so implode("\n", ...) does not double them
    $xml_lines = file($xml_realpath, FILE_IGNORE_NEW_LINES);
    $doc = new DOMDocument;
    if ($dtd_realpath) {
        // Inject DTD inside DOCTYPE line:
        $dtd_lines = file($dtd_realpath, FILE_IGNORE_NEW_LINES);
        $new_lines = array();
        foreach ($xml_lines as $x) {
            // Assume DOCTYPE SYSTEM "blah blah" format:
            if (preg_match('/DOCTYPE/', $x)) {
                $y = preg_replace('/SYSTEM "(.*)"/', " [\n" . implode("\n", $dtd_lines) . "\n]", $x);
                $new_lines[] = $y;
            } else {
                $new_lines[] = $x;
            }
        }
        $doc->loadXML(implode("\n", $new_lines));
    } else {
        $doc->loadXML(implode("\n", $xml_lines));
    }
    // Enable user error handling
    libxml_use_internal_errors(true);
    if (@$doc->validate()) {
        echo "Valid!\n";
    } else {
        echo "Not valid:\n";
        $errors = libxml_get_errors();
        foreach ($errors as $error) {
            // print_r() with a true second argument only returns the string, so print directly
            print_r($error);
        }
    }
}

Note that error handling has been suppressed for brevity, and there may be a better/more general way to handle the interpolation. But I have actually used this code with real data, and it works with PHP version 5.2.17.
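For XML and DTD held in strings rather than files, the same "put the DTD inside the DOCTYPE" idea can be sketched without line-by-line regex matching. This is a rough sketch under some assumptions: the function name validateAgainstDtd() is invented for illustration, the original document is assumed not to rely on entities declared in its own DTD, and any DOCTYPE it already carries is simply discarded.

```php
<?php
// Hypothetical helper (not part of PHP): validates $xml against the DTD
// declarations in $dtd by rebuilding the document with an internal subset.
function validateAgainstDtd(string $xml, string $dtd): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);

    $valid = false;
    $source = new DOMDocument;
    if ($source->loadXML($xml)) {
        $root = $source->documentElement;
        // Serializing only the root element drops the original prolog and DOCTYPE
        $wrapped = "<?xml version=\"1.0\"?>\n"
            . "<!DOCTYPE {$root->nodeName} [\n{$dtd}\n]>\n"
            . $source->saveXML($root);

        $target = new DOMDocument;
        $valid = $target->loadXML($wrapped) && $target->validate();
    }

    // Discard collected errors and restore the previous mode
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid;
}

// Sample data, modelled on the note example used earlier in this thread
$xml = '<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
    <to>Tove</to>
    <from>Jani</from>
</note>';

$dtd = '<!ELEMENT note (to,from)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>';

var_dump(validateAgainstDtd($xml, $dtd));
```

Deriving the DOCTYPE name from the parsed root element avoids having to pass it in separately; the trade-off is that the document is parsed twice.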

Dorcia answered 12/9, 2011 at 14:15 Comment(0)
1

Trying to complete owenmarshall's answer:

In xml-validator.php:

Add html, header, body, ...

<?php

$dom = new DOMDocument;
$dom->load('template-format.xml');
if ($dom->validate()) {
    echo "This document is valid!\n";
}

?>

template-format.xml:

<?xml version="1.0" encoding="utf-8"?>

<!-- DTD to Validate against (format example) -->

<!DOCTYPE template-format [
  <!ELEMENT template-format (template)>
  <!ELEMENT template (background-color, color, font-size, header-image)>
  <!ELEMENT background-color   (#PCDATA)>
  <!ELEMENT color (#PCDATA)>
  <!ELEMENT font-size (#PCDATA)>
  <!ELEMENT header-image (#PCDATA)>
]>

<!-- XML example -->

<template-format>

<template>

<background-color></background-color>
<color></color>
<font-size></font-size>
<header-image></header-image>

</template>

</template-format>
Afrikander answered 3/3, 2011 at 16:11 Comment(2)
Same here, you don't load the DTD anywhere. - Tendance
In his example he interpolated the DTD locally inside the DOCTYPE element (that's what Soren's code is trying to do, but it doesn't seem to work). - Dorcia
