How can I get more information on an invalid DOM element through the Validator?
Asked Answered
C

2

6

I am validating an in-memory DOM object using the javax.xml.validation.Validator class against an XSD schema. I am getting a SAXParseException being thrown during the validation whenever there is some data corruption in the information I populate my DOM from.

An example error:

org.xml.SAXParseException: cvc-datatype-valid.1.2.1: '???"??[?????G?>???p~tn??~0?1]' is not a valid valud for 'hexBinary'.

What I am hoping is that there is a way to find the location of this error in my in-memory DOM and print out the offending element and its parent element. My current code is:

public void writeDocumentToFile(Document document) throws XMLWriteException {
  try {
    // Validate the document against the schema
    Validator validator = getSchema(xmlSchema).newValidator();
    validator.validate(new DOMSource(document));

    // Serialisation logic here.

  } catch(SAXException e) {
    throw new XMLWriteException(e); // This is being thrown
  } // Some other exceptions caught here.
}

private Schema getSchema(URL schema) throws SAXException {
  SchemaFactory schemaFactory = 
    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

  // Some logic here to specify a ResourceResolver

  return schemaFactory.newSchema(schema);
}

I have looked into the Validator#setErrorHandler(ErrorHandler handler) method but the ErrorHandler interface only gives me exposure to a SAXParseException which only exposes the line number and column number of the error. Because I am using an in-memory DOM this returns -1 for both line and column number.

Is there a better way to do this? I don't really want to have to manually validate the Strings before I add them to the DOM if the libraries provide me the function I'm looking for.

I'm using JDK 6 update 26 and JDK 6 update 7 depending on where this code is running.

EDIT: With this code added -

validator.setErrorHandler(new ErrorHandler() {
  @Override
  public void warning(SAXParseException exception) throws SAXException {
    printException(exception);
    throw exception;
  }

  @Override
  public void error(SAXParseException exception) throws SAXException {
    printException(exception);
    throw exception;
  }

  @Override
  public void fatalError(SAXParseException exception) throws SAXException {
    printException(exception);
    throw exception;
  }

  private void printException(SAXParseException exception) {
    System.out.println("exception.getPublicId() = " + exception.getPublicId());
    System.out.println("exception.getSystemId() = " + exception.getSystemId());
    System.out.println("exception.getColumnNumber() = " + exception.getColumnNumber());
    System.out.println("exception.getLineNumber() = " + exception.getLineNumber());
  }
});

I get the output:

exception.getPublicId() = null
exception.getSystemId() = null
exception.getColumnNumber() = -1
exception.getLineNumber() = -1
Compliant answered 10/11, 2011 at 9:33 Comment(0)
L
5

If you are using Xerces (the Sun JDK default), you can get the element that failed validation through the http://apache.org/xml/properties/dom/current-element-node property:

...
catch (SAXParseException e)
{
    Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");

    System.out.println("Validation error: " + e.getMessage());
    System.out.println("Element: " + curElement);
}   

Example:

String xml = "<root xmlns=\"http://www.myschema.org\">\n" +
             "<text>This is text</text>\n" +
             "<number>32</number>\n" +
             "<number>abc</number>\n" +
             "</root>";

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
Schema schema = getSchema(getClass().getResource("myschema.xsd"));

Validator validator = schema.newValidator();
try
{
    validator.validate(new DOMSource(doc));
}
catch (SAXParseException e)
{
    Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");

    System.out.println("Validation error: " + e.getMessage());
    System.out.println(curElement.getLocalName() + ": " + curElement.getTextContent());

    //Use curElement.getParentNode() or whatever you need here
}         

If you need to get line/column numbers from the DOM, this answer has a solution to that problem.

Limacine answered 17/11, 2011 at 4:37 Comment(4)
Hrmm, thanks for the detailed answer. I am getting org.xml.sax.SAXNotRecognizedException: Property 'http://apache.org/xml/properties/dom/current-element-node' is not recognized. under JDK 1.6 update 7. I assume accessing this property doesn't require internet access?Compliant
@Bringer128 I'm running under Java 1.6.0u25 and it works. It may have been a Xerces property that was added recently. It might be possible if you have to run under an old JDK 1.6 to bundle a later version of Xerces with your app and use DocumentBuilderFactory/Validator implementations from that instead of the JDK default.Limacine
You're right, it works under Java 1.6.0u26 too. This appears to be what I'm looking for, thanks!Compliant
Is it possible to do this when a DTD file is being used to validate the XML?Remittee
E
0

SaxParseException exposes the SystemId and PublicId. Does that not give you enough information?

Exeter answered 16/11, 2011 at 16:24 Comment(1)
I've updated my question. The validator appears to be returning null for those two values, so they're not of much use.Compliant

© 2022 - 2024 — McMap. All rights reserved.