How to validate an XML file against an XSD file?

I'm generating some XML files that need to conform to an XSD file that was given to me. How should I verify that they conform?

Ire answered 19/8, 2008 at 4:59 Comment(0)

The Java runtime library supports validation. Last time I checked, it used the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd:
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
  Schema schema = schemaFactory.newSchema(schemaFile);
  Validator validator = schema.newValidator();
  validator.validate(xmlFile);
  System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
  System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
} catch (IOException e) {
  // the schema or the XML file could not be read
  System.out.println("I/O error: " + e);
}

The schema factory constant is the string http://www.w3.org/2001/XMLSchema, which identifies W3C XML Schema (XSD). With the web-app schema URL from the comment, the above code validates a WAR deployment descriptor against http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd, but you could just as easily validate against a local file.

You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). It creates DOM objects as it parses the document, which is wasteful if you aren't going to use them.
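For a quick check without files or network access, here is a minimal self-contained sketch of the same Validator approach. It assumes a small, hypothetical note schema held in a string, and isValid is just an illustrative helper name, not part of JAXP.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateDemo {
    // Hypothetical schema: a <note> element that must contain exactly one <to> string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note'><xs:complexType><xs:sequence>"
      + "<xs:element name='to' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if xml conforms to xsd. With no ErrorHandler set, the Validator
    // throws a SAXException on the first validation (or well-formedness) error.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            System.out.println("NOT valid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<note><to>Tove</to></note>"));     // matches the schema
        System.out.println(isValid(XSD, "<note><from>Jani</from></note>")); // wrong child element
    }
}
```

A Schema object is thread-safe and can be reused, so when validating many files it is worth compiling it once and only creating a new Validator per document.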

Rurik answered 19/8, 2008 at 12:21 Comment(8)
Are you using a DOM or SAX parser in this example? How do I tell which parser you are using, as I can't see a reference to either? - Goddess
@Goddess - this is an implementation detail of the JAXP implementation. Sun's JDK 6 uses SAX parser with a StreamSource. A JAXP implementation could legally use a DOM parser in this case, but there is no reason to. If you use a DOM parser explicitly for validation, you will definitely instantiate a DOM tree. - Rurik
How do I use an ErrorHandler with the above? Is it a case of just creating the ErrorHandler and associating it with the validator, i.e. validator.setErrorHandler() as in the example in this SO question #4865181? - Goddess
Shouldn't exceptions just be used for exceptional situations and not for control flow? - Imagination
Won't this code only catch fatal errors? If you want to be able to catch non-fatal ones (such as non-structural ones), I think you will need to use an ErrorHandler. - Nirvana
This code doesn't work when the file to validate contains a DOCTYPE declaration. Does anyone know why? - Minaminabe
If you are interested in how to validate against a set of local schemas, take a look at https://mcmap.net/q/109896/-offline-xml-validation-with-java - Seaward
It works fine, but a SonarQube scan flags it with "Disable XML external entity (XXE) processing", and that is a blocker for the code. - Sail
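About the XXE finding: a commonly recommended mitigation (it is the one the OWASP XXE Prevention Cheat Sheet gives for Validator) is to set the JAXP 1.5 properties XMLConstants.ACCESS_EXTERNAL_DTD and XMLConstants.ACCESS_EXTERNAL_SCHEMA to an empty string on both the SchemaFactory and the Validator, which assumes Java 7u40 or later. This is a sketch under those assumptions; the inline schema and the HardenedValidation/isValid names are illustrative. An empty ACCESS_EXTERNAL_SCHEMA also stops xs:import/xs:include from fetching remote schema documents, so it pairs best with local schema files.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class HardenedValidation {
    // Hypothetical schema: the document is a single <note> string element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Builds a validator that refuses to fetch external DTDs, external entities
    // and external schema documents ("" means: no protocol is allowed).
    static Validator hardenedValidator(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        factory.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        validator.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        validator.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return validator;
    }

    static boolean isValid(String xsd, String xml) {
        try {
            hardenedValidator(xsd).validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            System.out.println("rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<note>hello</note>"));
        // The external entity is refused instead of being read from the local disk.
        System.out.println(isValid(XSD,
            "<!DOCTYPE note [<!ENTITY xxe SYSTEM 'file:///etc/hostname'>]><note>&xxe;</note>"));
    }
}
```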

Here's how to do it using Xerces2. There is a tutorial for this here (requires signup).

Original attribution: blatantly copied from here:

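import org.apache.xerces.parsers.DOMParser;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class SchemaTest {
  public static void main (String args[]) {
      try {
        DOMParser parser = new DOMParser();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        // XML Schema validation is a separate Xerces2 feature (off by default)
        parser.setFeature("http://apache.org/xml/features/validation/schema", true);
        parser.setProperty(
             "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
             "memory.xsd");
        ErrorChecker errors = new ErrorChecker();
        parser.setErrorHandler(errors);
        parser.parse("memory.xml");
     } catch (Exception e) {
        System.out.print("Problem parsing the file: " + e);
     }
  }

  // ErrorChecker is not a Xerces class; a minimal version that reports each problem
  static class ErrorChecker implements ErrorHandler {
    public void warning(SAXParseException e) { System.out.println("Warning: " + e.getMessage()); }
    public void error(SAXParseException e) { System.out.println("Error: " + e.getMessage()); }
    public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
  }
}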
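A DOM tree is not needed just to validate. Here is a sketch of the same kind of check with only JDK classes (no separate Xerces jar): a SAX parser with the schema attached through SAXParserFactory.setSchema, and an anonymous DefaultHandler that collects errors, standing in for the ErrorChecker above. The memory schema and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSchemaValidation {
    // Hypothetical schema: <memory> must contain one or more integer <slot> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='memory'><xs:complexType><xs:sequence>"
      + "<xs:element name='slot' type='xs:int' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Streams the document through a schema-aware SAX parser (no DOM is built)
    // and returns every validation error; an empty list means the document is valid.
    static List<String> validate(String xsd, String xml)
            throws SAXException, ParserConfigurationException, IOException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setSchema(SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd))));

        List<String> errors = new ArrayList<>();
        // Records validation errors and lets parsing continue; fatal errors still throw.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
        };
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(XSD, "<memory><slot>1</slot><slot>2</slot></memory>"));
        System.out.println(validate(XSD, "<memory><slot>one</slot></memory>"));
    }
}
```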
Sardonyx answered 19/8, 2008 at 5:10 Comment(3)
The SAX parser would be more efficient: the DOM parser creates DOM objects, which is wasteful in this instance. - Rurik
The question is how to validate an XML file against an XSD. In this answer you are going further and getting a Parser object, which is not needed, right? - Survivor
"ErrorChecker cannot be resolved to a type" .. missing import? - Siegfried

We build our project using Ant, so we can use the schemavalidate task to check our config files:

<schemavalidate> 
    <fileset dir="${configdir}" includes="**/*.xml" />
</schemavalidate>

Now naughty config files will fail our build!

http://ant.apache.org/manual/Tasks/schemavalidate.html

Cattleya answered 14/7, 2011 at 8:1 Comment(0)

Since this is a popular question, I will point out that Java can also validate against "referred to" XSDs, for instance when the .xml file itself specifies XSDs in its root element using xsi:noNamespaceSchemaLocation, or xsi:schemaLocation for particular namespaces, e.g.:

<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://www.example.com/document.xsd">
  ...

or xsi:schemaLocation (always a list of namespace-to-XSD mappings):

<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.example.com/my_namespace http://www.example.com/document.xsd">
  ...

The other answers work here as well, because the .xsd files "map" to the namespaces declared in the .xml file: each .xsd declares a target namespace, and if it matches the namespace in the .xml file, you're good. But sometimes it's convenient to be able to have a custom resolver...

From the javadocs: "If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:"

SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();

and this works for multiple namespaces, etc. The problem with this approach is that the xsi:schemaLocation value is probably a network location, so by default it'll go out and hit the network on each and every validation, which is not always optimal.
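To see the no-argument newSchema() behaviour without touching the network, here is a small sketch that points xsi:noNamespaceSchemaLocation at a local temporary file. The schema, the temp-file prefix and the helper names are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaFromHint {
    // Writes a hypothetical schema (<note> must hold an xs:int) to a temp file and
    // returns a document whose xsi:noNamespaceSchemaLocation points at that file.
    static String documentWithHint(String noteContent) throws IOException {
        Path xsd = Files.createTempFile("note", ".xsd");
        xsd.toFile().deleteOnExit();
        String schema = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='note' type='xs:int'/></xs:schema>";
        Files.write(xsd, schema.getBytes(StandardCharsets.UTF_8));
        return "<note xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
            + " xsi:noNamespaceSchemaLocation='" + xsd.toUri() + "'>"
            + noteContent + "</note>";
    }

    // newSchema() with no arguments: the schema is taken from the document's own hints.
    static boolean isValid(String xml) throws IOException {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema();
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("NOT valid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValid(documentWithHint("42")));
        System.out.println(isValid(documentWithHint("forty-two")));
    }
}
```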

Here's an example that validates an XML file against any XSDs it references (even if it has to pull them from the network):

  import java.io.FileInputStream;
  import java.io.InputStream;
  import javax.xml.parsers.DocumentBuilder;
  import javax.xml.parsers.DocumentBuilderFactory;
  import org.xml.sax.ErrorHandler;
  import org.xml.sax.InputSource;
  import org.xml.sax.SAXException;
  import org.xml.sax.SAXParseException;

  public static void verifyValidatesInternalXsd(String filename) throws Exception {
    // try-with-resources closes the stream even when validation fails
    try (InputStream xmlStream = new FileInputStream(filename)) {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true);
      factory.setNamespaceAware(true);
      factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                   "http://www.w3.org/2001/XMLSchema");
      DocumentBuilder builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new RaiseOnErrorHandler());
      builder.parse(new InputSource(xmlStream));
    }
  }

  public static class RaiseOnErrorHandler implements ErrorHandler {
    public void warning(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
    public void error(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
    public void fatalError(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
  }

You can avoid pulling referenced XSDs from the network, even though the XML files reference URLs, by specifying the XSD manually (see some other answers here) or by using an "XML catalog" style resolver. Spring apparently can also intercept the URL requests to serve local files for validation. Or you can set your own resolver via setResourceResolver, e.g.:

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.InputSource;

Source xmlFile = new StreamSource(xmlFileLocation);
SchemaFactory schemaFactory = SchemaFactory
                                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setResourceResolver(new LSResourceResolver() {
  @Override
  public LSInput resolveResource(String type, String namespaceURI,
                                 String publicId, String systemId, String baseURI) {
    InputSource is = new InputSource(
                           getClass().getResourceAsStream(
                          "some_local_file_in_the_jar.xsd"));
                          // or lookup by URI, etc...
    return new Input(is); // for class Input see
                          // https://mcmap.net/q/109898/-how-to-validate-an-xml-file-using-java-with-an-xsd-having-an-include
  }
});
validator.validate(xmlFile);
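A self-contained variant of the resolver idea, with two assumptions worth stating: the resolver is registered on the SchemaFactory (where xs:include/xs:import are resolved while the schema is compiled), and the LSInput is created through the JDK's DOMImplementationLS instead of the custom Input class linked above. The URL http://www.example.com/types.xsd, the in-memory LOCAL map and the helper names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.xml.sax.SAXException;

public class LocalSchemaResolver {
    // Hypothetical local copies of remote schema documents, keyed by system ID.
    static final Map<String, String> LOCAL = new HashMap<>();
    static {
        LOCAL.put("http://www.example.com/types.xsd",
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:simpleType name='Code'><xs:restriction base='xs:string'>"
          + "<xs:pattern value='[A-Z]{3}'/></xs:restriction></xs:simpleType>"
          + "</xs:schema>");
    }

    // The main schema includes types.xsd by its remote URL.
    static final String MAIN_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:include schemaLocation='http://www.example.com/types.xsd'/>"
      + "<xs:element name='code' type='Code'/>"
      + "</xs:schema>";

    static Schema loadSchema() throws SAXException, ParserConfigurationException {
        // The JDK's DOM implementation also implements DOMImplementationLS, which creates LSInputs.
        DOMImplementationLS ls = (DOMImplementationLS) DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        factory.setResourceResolver((type, namespaceURI, publicId, systemId, baseURI) -> {
            String local = systemId == null ? null : LOCAL.get(systemId);
            if (local == null) {
                return null; // unknown resource: fall back to the default resolution
            }
            LSInput input = ls.createLSInput();
            input.setSystemId(systemId);
            input.setCharacterStream(new StringReader(local));
            return input;
        });
        return factory.newSchema(new StreamSource(new StringReader(MAIN_XSD)));
    }

    static boolean isValid(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = loadSchema();
        System.out.println(isValid(schema, "<code>ABC</code>"));
        System.out.println(isValid(schema, "<code>abc</code>"));
    }
}
```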

See also here for another tutorial.

I believe the default is to use DOM parsing; you can do something similar with a validating SAX parser as well, via saxReader.setEntityResolver(your_resolver_here);

Pastypat answered 19/12, 2016 at 14:57 Comment(3)
Doesn't work for me: the method resolveResource() isn't called unless it's set on the schemaFactory. Any idea? - Accompanyist
Dunno, works for me. Make sure you're setting it via setResourceResolver, but beyond that, maybe open a new question... - Pastypat
Resurrecting an old post, I think it should read xsi:schemaLocation instead of xsi:SchemaLocation - case matters. See w3.org/TR/xmlschema-1/#d0e3067 - Cuspidation

Using Java 7, you can follow the documentation provided in the javax.xml.validation package description.

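import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);

// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();

// validate the instance document, read as a stream
try {
    validator.validate(new StreamSource(new File("instance.xml")));
} catch (SAXException e) {
    // instance document is invalid!
} catch (IOException e) {
    // instance document could not be read
}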
Lucianolucias answered 13/5, 2013 at 9:40 Comment(6)
"Using Java 7.." That was actually included in Java 5. - Amenra
This is basically the same as the accepted answer. This solution seems to me a bit inefficient though, as it unnecessarily builds the DOM for the xml to parse: parser.parse(new File("instance.xml")). The validator accepts a Source, so you can: validator.validate(new StreamSource(new File("instance.xml"))). - Caplan
Working this way, a SAXException is thrown at the first error in the XML file, which stops the validation. But I want to know all (!) errors. If I use an ErrorHandler (my own class that implements ErrorHandler) instead, it recognizes all errors, but the try-catch block around validator.validate does not throw any exception. How do I recognize an error in the class that invokes the validate method of my validator? Thanks for your help! - Barite
There are "errors" (e.g. validation errors) and "fatal errors" (well-formedness errors). One fatal error typically stops the parsing, but a validation error does not stop it: you have to explicitly throw an exception. Thus, it is necessary to provide an ErrorHandler if you need to do validation. - Boabdil
Gotta admit, the code looks cleaner and easier to read on this than the accepted answer. - Morice
The validate line lacks a closing parenthesis. - Rozina
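On collecting all errors rather than stopping at the first one: a sketch, assuming a hypothetical point schema, in which a custom ErrorHandler records warnings and errors in a list instead of throwing. The validate helper returns that list, so the calling class checks whether it is empty instead of relying on an exception.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class CollectAllErrors {
    // Hypothetical schema: <point> with integer <x> and <y> children.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='point'><xs:complexType><xs:sequence>"
      + "<xs:element name='x' type='xs:int'/><xs:element name='y' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Records warnings and errors instead of throwing, so validation runs to the end.
    // Fatal (well-formedness) errors are rethrown, since parsing cannot continue anyway.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        @Override public void warning(SAXParseException e) { problems.add("warning: " + describe(e)); }
        @Override public void error(SAXParseException e) { problems.add("error: " + describe(e)); }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }

        private static String describe(SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    // Empty result means valid; otherwise it holds every problem found.
    static List<String> validate(String xsd, String xml) throws SAXException, IOException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        List<String> problems = validate(XSD, "<point><x>one</x><y>two</y></point>");
        if (problems.isEmpty()) {
            System.out.println("valid");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```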

One more answer: since you said you need to validate files you are generating (writing), you might want to validate content while you are writing, instead of first writing and then reading it back for validation. You can probably do that with the JDK API for XML validation if you use a SAX-based writer: if so, just link in the validator by calling 'Validator.validate(source, result)', where source comes from your writer, and result is where the output needs to go.
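Closely related to Validator.validate(source, result) is javax.xml.validation.ValidatorHandler, a ContentHandler that validates SAX events and forwards them downstream. The sketch below swaps that class in, assuming your writer produces SAX events; the note schema and the write helper are hypothetical stand-ins for your own generator.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.helpers.AttributesImpl;

public class ValidateWhileWriting {
    // Hypothetical schema: the document is a single <note> string element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Writes <elementName>text</elementName> as SAX events. Each event passes through a
    // ValidatorHandler, which checks it against the schema and then forwards it to the
    // serializer, so an invalid document fails while it is being written.
    static String write(String elementName, String text) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));

        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler(); // identity: SAX events to text
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        ValidatorHandler validating = schema.newValidatorHandler();
        validating.setContentHandler(serializer);

        char[] chars = text.toCharArray();
        validating.startDocument();
        validating.startElement("", elementName, elementName, new AttributesImpl());
        validating.characters(chars, 0, chars.length);
        validating.endElement("", elementName, elementName);
        validating.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write("note", "hello"));
        try {
            write("memo", "hello");
        } catch (org.xml.sax.SAXException e) {
            System.out.println("rejected while writing: " + e.getMessage());
        }
    }
}
```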

Alternatively, if you use StAX for writing content (or a library that uses or can use StAX), Woodstox can also directly support validation when using XMLStreamWriter. Here's a blog entry showing how that is done:

Hypogeous answered 27/3, 2009 at 16:25 Comment(3)
Hey StaxMan, are there any XMLStreamWriters that do pretty-print indenting? I was surprised that it's not in the standard implementation. Also, is it getting much use? I think it's the right way to go, but there seems to be very little interest in it. - Arcuation
just found your post here about StaxMate (but it's not an XMLStreamWriter): https://mcmap.net/q/109900/-stax-xml-formatting-in-java/… - Arcuation
Yeah, StaxMate can do that. It uses XMLStreamWriter internally for writing content, so you can hook up a validator that way too. - Hypogeous

If you have a Linux machine, you could use the free command-line tool SAXCount. I found it very useful.

SAXCount -f -s -n my.xml

It validates against both DTDs and XSDs; it took 5 s for a 50 MB file.

In Debian Squeeze it is located in the package "libxerces-c-samples".

The DTD and XSD have to be referenced in the XML itself! You can't configure them separately.

Gasaway answered 22/3, 2012 at 17:1 Comment(3)
This allows for simple XML validation from vim (:!SAXCount -f -n -s %) - Fifteenth
or use the venerable xmllint: xmllint --schema phone.xsd phone.xml (from an answer by 13ren) - Pastypat
Nice answer for superuser.com - Rozina

With JAXB, you could use the code below:

// Assumes JUnit 4 + Hamcrest, and JAXB 2.x (javax.xml.bind, a separate dependency on Java 11+).
// logger, inputXmlFileName, inputXmlSchemaName, inputXmlRootClass and expectedValidationResult
// are assumed to be fields of the test class.
import java.io.InputStream;
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.ValidationEvent;
import javax.xml.bind.ValidationEventHandler;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.junit.Test;
import static org.hamcrest.CoreMatchers.is;
import static org.junit.Assert.assertThat;

@Test
public void testCheckXmlIsValidAgainstSchema() {
    logger.info("Validating an XML file against the latest schema...");

    MyValidationEventCollector vec = new MyValidationEventCollector();

    validateXmlAgainstSchema(vec, inputXmlFileName, inputXmlSchemaName, inputXmlRootClass);

    assertThat(vec.getValidationErrors().isEmpty(), is(expectedValidationResult));
}

private void validateXmlAgainstSchema(final MyValidationEventCollector vec, final String xmlFileName, final String xsdSchemaName, final Class<?> rootClass) {
    final ClassLoader loader = Thread.currentThread().getContextClassLoader();
    try (InputStream xmlFileIs = loader.getResourceAsStream(xmlFileName);
         InputStream schemaAsStream = loader.getResourceAsStream(xsdSchemaName)) {
        final JAXBContext jContext = JAXBContext.newInstance(rootClass);
        // Unmarshal the data from InputStream
        final Unmarshaller unmarshaller = jContext.createUnmarshaller();

        final SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        unmarshaller.setSchema(sf.newSchema(new StreamSource(schemaAsStream)));

        unmarshaller.setEventHandler(vec);

        // rootClass is the class of the root object in the XML file you want to validate
        unmarshaller.unmarshal(new StreamSource(xmlFileIs), rootClass).getValue();

        for (String validationError : vec.getValidationErrors()) {
            logger.trace(validationError);
        }
    } catch (final Exception e) {
        logger.error("The validation of the XML file " + xmlFileName + " failed: ", e);
    }
}

class MyValidationEventCollector implements ValidationEventHandler {
    private final List<String> validationErrors;

    public MyValidationEventCollector() {
        validationErrors = new ArrayList<>();
    }

    public List<String> getValidationErrors() {
        return Collections.unmodifiableList(validationErrors);
    }

    @Override
    public boolean handleEvent(final ValidationEvent event) {
        String pattern = "line {0}, column {1}, error message {2}";
        String errorMessage = MessageFormat.format(pattern, event.getLocator().getLineNumber(), event.getLocator().getColumnNumber(),
                event.getMessage());
        // Schema violations are reported as ERROR, so collect everything except warnings
        if (event.getSeverity() != ValidationEvent.WARNING) {
            validationErrors.add(errorMessage);
        }
        return true; // keep going; the collected errors are handled later
    }
}
Catrinacatriona answered 27/11, 2017 at 15:25 Comment(0)

If you are generating XML files programmatically, you may want to look at the XMLBeans library. Using a command-line tool, XMLBeans will automatically generate and package up a set of Java objects based on an XSD. You can then use these objects to build an XML document based on this schema.

It has built-in support for schema validation, and can convert Java objects to an XML document and vice-versa.

Castor and JAXB are other Java libraries that serve a similar purpose to XMLBeans.

Bryophyte answered 28/1, 2009 at 18:6 Comment(0)

Using Woodstox, configure the StAX parser to validate against your schema and parse the XML.

If exceptions are caught, the XML is not valid; otherwise it is valid:

import java.io.InputStream;
import javax.xml.stream.XMLStreamException;
import com.ctc.wstx.stax.WstxInputFactory;
import org.codehaus.stax2.XMLStreamReader2;
import org.codehaus.stax2.validation.XMLValidationSchema;
import org.codehaus.stax2.validation.XMLValidationSchemaFactory;

// schemaInputStream and xmlInputStream are InputStreams for your .xsd and .xml files

// create the XSD schema from your schema file
XMLValidationSchemaFactory schemaFactory = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
XMLValidationSchema validationSchema = schemaFactory.createSchema(schemaInputStream);

// create the XML reader for your XML file
WstxInputFactory inputFactory = new WstxInputFactory();
XMLStreamReader2 xmlReader = (XMLStreamReader2) inputFactory.createXMLStreamReader(xmlInputStream);

try {
    // configure the reader to validate against the schema
    xmlReader.validateAgainst(validationSchema);

    // parse the XML
    while (xmlReader.hasNext()) {
        xmlReader.next();
    }

    // no exceptions, the XML is valid

} catch (XMLStreamException e) {

    // exceptions, the XML is not valid

} finally {
    xmlReader.close();
}

Note: If you need to validate multiple files, you should try to reuse your XMLInputFactory and XMLValidationSchema in order to maximize the performance.

Jurgen answered 21/9, 2019 at 13:18 Comment(0)

Are you looking for a tool or a library?

As far as libraries go, pretty much the de facto standard is Xerces2, which has both C++ and Java versions.

Be forewarned, though: it is a heavyweight solution. But then again, validating XML against XSD files is a rather heavyweight problem.

As for a tool to do this for you, XMLFox seems to be a decent freeware solution, but not having used it personally I can't say for sure.

Cassity answered 19/8, 2008 at 5:11 Comment(0)

Validate against online schemas

import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;

Source xmlFile = new StreamSource(Thread.currentThread().getContextClassLoader().getResourceAsStream("your.xml"));
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(Thread.currentThread().getContextClassLoader().getResource("your.xsd"));
Validator validator = schema.newValidator();
validator.validate(xmlFile);

Validate against local schemas

Offline XML Validation with Java

Seaward answered 4/10, 2018 at 11:36 Comment(0)

I had to validate an XML file against an XSD just once, so I tried XMLFox. I found it very confusing and weird. The help instructions didn't seem to match the interface.

I ended up using LiquidXML Studio 2008 (v6), which was much easier to use and more immediately familiar (the UI is very similar to Visual Basic 2008 Express, which I use frequently). The drawback: the validation capability is not in the free version, so I had to use the 30-day trial.

Retha answered 1/10, 2008 at 17:35 Comment(3)
The question is about Java, but this answer is not. :-( - Katherinkatherina
To be fair, the word "java" never appears in the question, just the tags. I'd ding the question for that, not the reply. - Dale
Thanks James and Mark, help me sharpen up! - Valaree
