XML (.xsd) feed validation against a schema

I have an XML file and an XML schema. I want to validate the file against the schema and check whether it conforms. I am using Python, but I am open to any language if Python has no suitable library.

What are my best options here? My main concern is how quickly I can get this up and running.

Canonicity answered 23/7, 2013 at 19:58 Comment(0)

Definitely lxml.

Define an XMLParser with a predefined schema, load the file with fromstring(), and catch the exception raised when validation fails:

from lxml import etree

def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'r') as f:
            etree.fromstring(f.read(), xmlparser) 
        return True
    # A schema-aware parser raises XMLSyntaxError for malformed XML
    # and for schema violations alike.
    except etree.XMLSyntaxError:
        return False

schema_file = 'schema.xsd'
with open(schema_file, 'r') as f:
    schema_root = etree.XML(f.read())

schema = etree.XMLSchema(schema_root)
xmlparser = etree.XMLParser(schema=schema)

filenames = ['input1.xml', 'input2.xml', 'input3.xml']
for filename in filenames:
    if validate(xmlparser, filename):
        print("%s validates" % filename)
    else:
        print("%s doesn't validate" % filename)

Note about encoding

If the schema file begins with an XML declaration that specifies an encoding (e.g. <?xml version="1.0" encoding="UTF-8"?>), the code above fails under Python 3, where reading in text mode yields a Unicode string:

Traceback (most recent call last):
  File "<input>", line 2, in <module>
    schema_root = etree.XML(f.read())
  File "src/lxml/etree.pyx", line 3192, in lxml.etree.XML
  File "src/lxml/parser.pxi", line 1872, in lxml.etree._parseMemoryDocument
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

A solution is to open the files in binary mode, open(..., 'rb'), so that lxml receives bytes:

[...]
def validate(xmlparser, xmlfilename):
    try:
        with open(xmlfilename, 'rb') as f:
[...]
with open(schema_file, 'rb') as f:
[...]
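Put together, a bytes-only version might look like the sketch below. It is only an illustration under a few assumptions: the helper name load_schema is made up here, the schema is loaded with etree.parse() (which reads the file itself, so no text-mode decoding happens), and the exception caught is etree.XMLSyntaxError, which is what the lxml documentation shows a schema-aware parser raising on invalid input.

```python
from lxml import etree


def load_schema(xsd_path):
    """Build an XMLSchema from a file; etree.parse() reads the bytes itself."""
    return etree.XMLSchema(etree.parse(xsd_path))


def validate(xmlparser, xml_path):
    """Return True if the file is well-formed and valid, False otherwise."""
    try:
        # Binary mode: lxml gets bytes, so an encoding declaration is fine.
        with open(xml_path, 'rb') as f:
            etree.fromstring(f.read(), xmlparser)
        return True
    except etree.XMLSyntaxError:
        return False


# Hypothetical usage (file names are placeholders):
#   parser = etree.XMLParser(schema=load_schema('schema.xsd'))
#   print(validate(parser, 'input1.xml'))
```

A missing file still raises an ordinary OSError here rather than returning False, which keeps "file not found" distinguishable from "does not validate".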
Reba answered 23/7, 2013 at 20:3 Comment(15)
It does work, yes. Is there a brief tutorial on it? I passed the schema and feed file, and it took both and processed them. How would I know whether it validated or not?Canonicity
It's simple. etree.fromstring will throw an exception if the xml file doesn't validate.Reba
Wow, that was quick. Now I want to read multiple XML feeds and validate them against the schema. So I could just loop over them with fromstring? 1. On an exception, would it stop processing and ignore the other feeds? I want to process all the feed files and then, if possible, report where each one failed to validate. 2. Also, a feed might have many records; is there any way to run all of them and split them into those that pass and those that fail validation?Canonicity
I've updated the code assuming the schema for all xmls is the same - though I think you've got the idea anyway. Please, check.Reba
Works like a charm. Will play with it more ! Thanks.Canonicity
Just one more general question: does this validate only the structure, or also the permissible values in fields? Also, if I get an error, is there a way to get a more specific message showing exactly where it failed?Canonicity
It should validate permissible values too. And yes, lxml tells you exactly where the error is: just print the traceback.Reba
I have two files on my schema, one referencing the other. How should I proceed?Contiguous
Super! It worked as is and serves the purpose completely. Thanks a lot @RebaBartram
You probably want to restrict the list of caught exceptions. This will return False if the file does not exist - which might be difficult to debug.Flora
@Flora updated, hope you can test it and confirm it is working as expected. Thanks.Reba
@Reba It works great for Python 2.7. I'm trying to validate the same way in Python 3.4, without success. Is there a way to do XSD validation with the xml.etree.ElementTree package?Irrawaddy
@SatishJonnala consider making a separate question if you have difficulties with python3.4 specific solution. Throw me a link here. Thanks.Reba
@Reba #31273930Irrawaddy
Also, this may hang or take additional time if it retrieves schemas from the internet. Consider using a catalog: blog.frankel.ch/use-local-resources-when-validating-xml and xmlsoft.org/catalog.html Cabe
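For the questions in the comments about processing every feed and seeing where each one failed, one possible approach is to parse first and validate afterwards, then read the schema's error log instead of relying on a traceback. The sketch below is an assumption-laden illustration, not the answerer's code: the function names collect_errors and validate_all are invented here, and it relies on lxml's XMLSchema.validate() and error_log, whose entries expose line and message.

```python
from lxml import etree


def collect_errors(schema, xml_path):
    """Return a list of (line, message) tuples; empty if the file validates."""
    try:
        doc = etree.parse(xml_path)
    except etree.XMLSyntaxError as exc:
        # The file is not even well-formed XML.
        return [(exc.lineno, str(exc))]
    if schema.validate(doc):
        return []
    # error_log holds one entry per violation found in the last validation.
    return [(entry.line, entry.message) for entry in schema.error_log]


def validate_all(schema, xml_paths):
    """Check every file and keep going after failures."""
    return {path: collect_errors(schema, path) for path in xml_paths}


# Hypothetical usage (file names are placeholders):
#   schema = etree.XMLSchema(etree.parse('schema.xsd'))
#   for path, errors in validate_all(schema, ['input1.xml']).items():
#       print(path, 'validates' if not errors else errors)
```

Because nothing is raised for an invalid file, the loop naturally continues to the next feed, and files can be split into passing and failing groups by checking whether their error list is empty.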

The Python snippet is good, but an alternative is to use xmllint:

xmllint --schema sample.xsd --noout sample.xml
Stromboli answered 10/1, 2017 at 15:1 Comment(2)
Just found this googling the same issue--I like this over installing another XML library (I'm using the built-in xml.etree module to generate the XML).Disturbing
It takes forever for me to download the schema from OASIS. If it hangs or takes an extra long time, consider using a catalog: blog.frankel.ch/use-local-resources-when-validating-xml and xmlsoft.org/catalog.html Cabe
import xmlschema


def get_validation_errors(xml_file, xsd_file):
    """Print and return every schema validation error found in xml_file."""
    schema = xmlschema.XMLSchema(xsd_file)
    errors = []
    # iter_errors() yields each error lazily instead of stopping at the first.
    for validation_error in schema.iter_errors(xml_file):
        err = str(validation_error)
        errors.append(err)
        print(err)
    return errors

Greywacke answered 23/6, 2021 at 7:36 Comment(0)
