I'm working on a tool that helps users author XHTML-ish documents, similar in nature to JSP files. The documents are XML and can contain any well-formed tags in the XHTML namespace, interwoven with elements from my product's namespace. Among other things, the tool validates the input using XSD.
Example input:
<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <c:paragraph>
        <span>This is a test!</span>
        <a href="http://www.google.com/">click here for more!</a>
      </c:paragraph>
    </c:section>
  </html>
</markup>
My problem is that the XSD validation behaves inconsistently depending on how deeply I nest elements. What I want is for all elements in the https://my_tag_lib.example.com/ namespace to be checked against the schema, while any elements in the http://www.w3.org/1999/xhtml namespace are liberally tolerated. I would rather not list every permitted HTML element in my XSD, since users may want to use obscure elements that are only available in certain browsers. Instead, I'd just like to whitelist any element belonging to that namespace using <xs:any>.
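To make the intent concrete, here is a sketch of the content model I'm aiming for inside every container element. It is only a fragment (the enclosing xs:complexType is omitted), and it uses the same two wildcards that appear in my schema files:

```xml
<xs:choice minOccurs="0" maxOccurs="unbounded">
  <!-- XHTML: tolerate anything; validate only if a declaration happens to be found -->
  <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
  <!-- my tag library: a declaration must be found, and the element must be valid against it -->
  <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
</xs:choice>
```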
What I'm discovering is that, under some circumstances, elements which belong to the my_tag_lib namespace but don't appear in the schema pass validation, while elements which do appear in the schema can be made to fail by giving them invalid attributes.
So:
* valid elements are validated against the XSD schema
* invalid elements are skipped by the validator?
For example, this passes validation:
<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <div>
        <c:my-invalid-element>This is a test</c:my-invalid-element>
      </div>
    </c:section>
  </html>
</markup>
But then this fails validation:
<?xml version="1.0"?>
<markup>
  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:c="https://my_tag_lib.example.com/">
    <c:section>
      <div>
        <c:paragraph my-invalid-attr="true">This is a test</c:paragraph>
      </div>
    </c:section>
  </html>
</markup>
Why are attributes validated against the schema for recognized elements, while unrecognized elements are seemingly not validated at all? What's the logic here? I've been using xmllint to do the validation:
xmllint --schema markup.xsd example.xml
Here are my XSD files:
File: markup.xsd
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xs:import namespace="http://www.w3.org/1999/xhtml" schemaLocation="html.xsd" />
  <xs:element name="markup">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="xhtml:html" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
File: html.xsd
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3.org/1999/xhtml">
  <xs:import namespace="https://my_tag_lib.example.com/" schemaLocation="my_tag_lib.xsd" />
  <xs:element name="html">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
        <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
File: my_tag_lib.xsd
<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="https://my_tag_lib.example.com/">
  <xs:element name="section">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
        <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="paragraph">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:any processContents="lax" namespace="http://www.w3.org/1999/xhtml" />
        <xs:any processContents="strict" namespace="https://my_tag_lib.example.com/" />
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>