xmllint : validate an XML file against two XSD schemas (envelope / payload)

I am using xmllint to do some validations and I have an XML instance document which needs to validate against two schemas: one for the outer "envelope" (which includes an any element) and one for the particular payload. Say A.xsd is the envelope schema, B.xsd a payload schema (there are different possible payloads) and ab.xml a valid XML instance document (I provide an example at the end of the post).

I have all three files locally available in the same directory and am using xmllint to perform the validation, providing as the schema argument the location of the outer (envelope) schema:

xmllint -schema A.xsd ab.xml

Yet although the instance document gives the locations of both A.xsd and B.xsd (in the xsi:schemaLocation attribute), xmllint fails to find B.xsd and complains:

ab.xml:8: element person: Schemas validity error : Element '{http://www.example.org/B}person': No matching global element declaration available, but demanded by the strict wildcard.
ab.xml fails to validate

So apparently xmllint does not read the xsi:schemaLocation attribute. I understand that xmllint can be configured with catalogs, but I could not get it to find both schemas that way. How can I make xmllint take both schemas into account when validating the instance document? Failing that, is there another command-line utility or graphical tool I could use instead?

SSCCE

A.xsd - envelope schema

<?xml version="1.0" encoding="UTF-8"?>
<schema elementFormDefault="qualified"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:a="http://www.example.org/A"
        targetNamespace="http://www.example.org/A">

    <element name="someType" type="a:SomeType"/>

    <complexType name="SomeType">
        <sequence>
            <any namespace="##other" processContents="strict"/>
        </sequence>
    </complexType>
</schema>

B.xsd - payload schema

<?xml version="1.0" encoding="UTF-8"?>
<schema elementFormDefault="qualified"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://www.example.org/B"
        targetNamespace="http://www.example.org/B"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <element name="person" type="b:PersonType"/>
    <complexType name="PersonType">
        <sequence>
            <element name="firstName" type="string"/>
            <element name="lastName" type="string"/>
        </sequence>
    </complexType>
</schema>

ab.xml - instance document

<?xml version="1.0" encoding="UTF-8"?>
<a:someType xmlns:a="http://www.example.org/A"
        xmlns:b="http://www.example.org/B"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.example.org/A A.xsd
                            http://www.example.org/B B.xsd">

            <b:person>
                <b:firstName>Mary</b:firstName>
                <b:lastName>Bones</b:lastName>
            </b:person>

</a:someType>
Tourney answered 8/6, 2013 at 18:2 Comment(0)

I gave up on xmllint and used Xerces instead.

I downloaded the Xerces tarball, unpacked it into a local folder, and created the following validate script, based on this suggestion (from the web archive, since the original link is now dead):

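#!/bin/bash
# Adjust to wherever the Xerces distribution was unpacked.
XERCES_HOME=~/software-downloads/xerces-2_11_0
echo "$XERCES_HOME"
# "$@" forwards each argument intact, even if a file name contains spaces.
java -classpath "$XERCES_HOME/xercesImpl.jar:$XERCES_HOME/xml-apis.jar:$XERCES_HOME/xercesSamples.jar" sax.Counter "$@"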

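The ab.xml file is then validated against both schemas with the following command, where -v turns on validation, -n namespace processing, -np namespace prefixes, -s XML Schema validation and -f full schema constraint checking:

validate -v -n -np -s -f ab.xml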

Xerces reads the schema locations from the xsi:schemaLocation attribute in ab.xml, so they don't need to be given on the command line.

Tourney answered 8/6, 2013 at 19:25 Comment(4)
This validated XML, but did not validate against XSDs. Anyway, here's the archived version of the broken link above: web.archive.org/web/20120827060501/http://www.diggsml.com/… - Woodbury
@Woodbury It most assuredly checked both XML well-formedness and XSD validity. I tried it when I posted, and I tried it again just now after your comment. E.g. if you rename (in the XML file) <b:firstName>Mary</b:firstName> to <b:firstName3>Mary</b:firstName3>, the XML is still well-formed but no longer XSD-valid. The script duly spots that: [Error] ab.xml:9:23: cvc-complex-type.2.4.a: Invalid content was found starting with element 'b:firstName3'. One of '{"http://www.example.org/B":firstName}' is expected. - Tourney
You're right, it does check XSD, in general. I made exactly such a change to the root element of my XML file (an error that xmllint caught but Xerces didn't). I then tried it with an attribute in the same file and the error was correctly reported. I'm using libxerces2-java 2.11.0-7. I guess I'll stay with xmllint. - Woodbury
I'm getting /tmp/xerces/ [Fatal Error] :-1:-1: Premature end of file. It seems Xerces is being handed an empty file, but the XML isn't empty. Not sure how to debug this. Xerces 2.12.0. - Woodbury

You can create a wrapper schema and import both namespaces. AB.xsd:

<?xml version="1.0" encoding="UTF-8"?>
<schema elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema">
    <import namespace="http://www.example.org/A" schemaLocation="A.xsd"/>
    <import namespace="http://www.example.org/B" schemaLocation="B.xsd"/>
</schema>

Then validate against the wrapper; xmllint echoes the parsed document and then reports the result:

xmllint --schema AB.xsd ab.xml
<?xml version="1.0" encoding="UTF-8"?>
<a:someType xmlns:a="http://www.example.org/A" xmlns:b="http://www.example.org/B" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.org/A A.xsd                             http://www.example.org/B B.xsd">

            <b:person>
                <b:firstName>Mary</b:firstName>
                <b:lastName>Bones</b:lastName>
            </b:person>

</a:someType>
ab.xml validates
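
With several possible payload schemas, writing a wrapper by hand for each combination gets tedious. The following is a minimal sketch of generating one from namespace/location pairs. The helper name make_wrapper is hypothetical (it is not part of xmllint), and the sketch assumes the namespace URIs and file names contain no characters that need XML escaping (such as &, < or ").

```shell
#!/bin/bash
# Sketch: print a wrapper XSD that imports each namespace/location pair.
# make_wrapper is a hypothetical helper name, not an xmllint feature.
make_wrapper() {
    # Arguments must come in pairs: NAMESPACE LOCATION [NAMESPACE LOCATION ...]
    if [ $# -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "usage: make_wrapper NAMESPACE LOCATION [NAMESPACE LOCATION ...]" >&2
        return 1
    fi
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<schema elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema">'
    # Emit one <import> per pair; values are inserted without XML escaping.
    while [ $# -gt 0 ]; do
        printf '    <import namespace="%s" schemaLocation="%s"/>\n' "$1" "$2"
        shift 2
    done
    echo '</schema>'
}

# Regenerate the AB.xsd shown above in the current directory.
make_wrapper http://www.example.org/A A.xsd \
             http://www.example.org/B B.xsd > AB.xsd
```

The generated file is used exactly like the hand-written one, for instance xmllint --noout --schema AB.xsd ab.xml, where --noout suppresses the echoed document so that only the validation result is printed.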
Imprimatur answered 20/5, 2015 at 4:34 Comment(1)
I love this answer, as it lets me use xmllint, which is installed everywhere, without changing the XML or the existing schema files. - Geum

If A.xsd contained an import element right after the opening schema tag (unprefixed, since A.xsd declares XML Schema as its default namespace),

<import namespace="http://www.example.org/B" schemaLocation="B.xsd"/>

then you could pass A.xsd alone to xmllint and validation would work with:

xmllint -schema A.xsd ab.xml
Athelstan answered 21/11, 2013 at 18:27 Comment(1)
I didn't want to modify the schemas at all. A.xsd is the "envelope" schema, so it's agnostic as to the possible schemas of the various payloads (of which B.xsd is just one possibility). - Tourney
