The most performant way to validate XML against XSD

4

8

I receive a string variable containing XML and have an XSD file. I have to validate the XML in the string against the XSD file, and I know there is more than one way to do it (XmlDocument, XmlReader, ... ?).

After the validation I just have to store the XML, so I don't need it in an XDocument or XmlDocument.

What's the way to go if I want the fastest performance?

Cottage answered 8/9, 2010 at 10:59 Comment(0)
W
14

Others have already mentioned the XmlReader class for doing the validation, and I won't elaborate further on that.

Your question does not specify much context. Will you be doing this validation repeatedly for several XML documents, or just once? I'm assuming a scenario where you are validating a lot of XML documents (from a third-party system?) and storing them for future use.

My contribution to the performance hunt would be to use a compiled XmlSchemaSet, which would be thread safe, so several threads can reuse it without needing to parse the XSD document again.

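// Parse the XSD once (stream is assumed to be the opened XSD file) and compile it.
var xmlSchema = XmlSchema.Read(stream, null);
var xmlSchemaSet = new XmlSchemaSet();
xmlSchemaSet.Add(xmlSchema);
xmlSchemaSet.Compile();

// Cache the compiled set under a name so later validations can reuse it.
CachedSchemas.Add(name, xmlSchemaSet);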
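The snippet above leaves `stream`, `name` and `CachedSchemas` undefined. Here is a minimal sketch of how the pieces might fit together, assuming a `ConcurrentDictionary` as the cache and an inline XSD string standing in for the file. The names `NoteXsd`, `GetCompiledSchemas` and `IsValid` are made up for illustration. Sharing the compiled set across threads follows the claim made in this answer and is not something this sketch proves.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical inline schema standing in for the XSD file from the question.
const string NoteXsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='note'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='body' type='xs:string'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

// Assumed cache shape: schema name -> compiled schema set.
var cachedSchemas = new ConcurrentDictionary<string, XmlSchemaSet>();

// Parse and compile the XSD on the first request for a name, then reuse it.
// (GetOrAdd may run the factory more than once under contention; only one result is kept.)
XmlSchemaSet GetCompiledSchemas(string name, string xsdText) =>
    cachedSchemas.GetOrAdd(name, _ =>
    {
        using var xsdReader = new StringReader(xsdText);
        var set = new XmlSchemaSet();
        set.Add(XmlSchema.Read(xsdReader, null));
        set.Compile();
        return set;
    });

// Validate one XML string against a compiled set; true when no errors were reported.
// Malformed XML still throws XmlException.
bool IsValid(string xml, XmlSchemaSet schemas)
{
    var valid = true;
    var settings = new XmlReaderSettings
    {
        ValidationType = ValidationType.Schema,
        Schemas = schemas
    };
    settings.ValidationEventHandler += (_, e) =>
    {
        if (e.Severity == XmlSeverityType.Error) valid = false;
    };
    using var reader = XmlReader.Create(new StringReader(xml), settings);
    while (reader.Read()) { } // reading to the end drives the validation
    return valid;
}

var noteSchemas = GetCompiledSchemas("note", NoteXsd);
Console.WriteLine(IsValid("<note><body>hi</body></note>", noteSchemas));
Console.WriteLine(IsValid("<note><wrong/></note>", noteSchemas));
```

The XML string is only streamed through the reader and never materialised as an XmlDocument or XDocument, which matches the "validate, then just store the string" requirement in the question.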
Wellestablished answered 8/9, 2010 at 12:48 Comment(6)
Yes, I validate and store a lot of XML documents from a third-party system for later use. The XSD is always the same, so your hint about compiling the schema set is much appreciated, thanks! - Cottage
What is CachedSchemas in this example? - Varipapa
Just an IDictionary<String, XmlSchemaSet> for caching the results. - Wellestablished
Why do you think XmlSchemaSet is thread safe? blogs.msdn.com/b/xmlteam/archive/2009/04/27/… - Polychrome
@RichB, that example works just as I described: initialize an XmlSchemaSet, compile it, and then use it from several threads. But no, there is no support for what I am saying in any documentation I can find. - Wellestablished
You might want to look into XmlReaderSettings.IgnoreComments, IgnoreWhitespace and IgnoreProcessingInstructions. My tests only covered XML files without comments, but if yours are heavily commented, these settings might help (still to be verified). - Thermoelectrometer
P
3

I would go for the XmlReader with XmlReaderSettings, because it does not need to load the complete XML into memory. That makes it more efficient for big XML files.

Pokey answered 8/9, 2010 at 11:9 Comment(0)
L
2

I think the fastest way is to use an XmlReader that validates the document as it is being read. This allows you to validate the document in only one pass: http://msdn.microsoft.com/en-us/library/hdf992b8.aspx
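As a rough sketch of that single pass: when no ValidationEventHandler is attached, the validating reader throws on the first schema error, which gives a fail-fast check. The schema text and the names `ItemXsd` and `TryValidate` below are assumptions for illustration; in practice the XSD would be loaded from your file.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical schema; the real one would come from the XSD file.
const string ItemXsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='item'>
    <xs:complexType>
      <xs:sequence>
        <xs:element name='qty' type='xs:int'/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

var itemSchemas = new XmlSchemaSet();
// A null target namespace means "use the one declared in the schema itself".
itemSchemas.Add(null, XmlReader.Create(new StringReader(ItemXsd)));
itemSchemas.Compile();

// Reads the document once; stops at the first schema violation.
// Malformed XML is not handled here and surfaces as XmlException.
bool TryValidate(string xml, XmlSchemaSet schemas, out string error)
{
    var settings = new XmlReaderSettings
    {
        ValidationType = ValidationType.Schema,
        Schemas = schemas
    };
    try
    {
        using var reader = XmlReader.Create(new StringReader(xml), settings);
        while (reader.Read()) { } // validation happens as nodes are read
        error = "";
        return true;
    }
    catch (XmlSchemaException ex)
    {
        error = $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        return false;
    }
}

if (!TryValidate("<item><qty>five</qty></item>", itemSchemas, out var message))
    Console.WriteLine(message);
```

Whether stopping at the first error is actually faster overall depends on how often the incoming documents are invalid, so it is worth measuring against a handler-based version.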

Locution answered 8/9, 2010 at 11:11 Comment(0)
B
0

Use an XmlReader configured to perform validation, with the source being a TextReader.

If you don't want to rely on declarations in the input document, you can specify the XSD the XmlReader should use yourself, through the XmlReaderSettings.Schemas property.

A start (just assumes XSD-instance declarations in the input document) would be:

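var settings = new XmlReaderSettings {
   ConformanceLevel = ConformanceLevel.Document,
   ValidationType = ValidationType.Schema,
   // ReportValidationWarnings is required for the handler below to receive warnings
   ValidationFlags = XmlSchemaValidationFlags.ProcessSchemaLocation |
                     XmlSchemaValidationFlags.ProcessInlineSchema |
                     XmlSchemaValidationFlags.ReportValidationWarnings,
};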

int warnings = 0;
int errors = 0;
settings.ValidationEventHandler += (obj, ea) => {
   // Count warnings and errors separately
   if (ea.Severity == XmlSeverityType.Warning) {
      ++warnings;
   } else {
      ++errors;
   }
};

XmlReader xvr = XmlReader.Create(new StringReader(inputDocInString), settings);

try {
   while (xvr.Read()) {
      // do nothing
   }

   if (0 != errors) {
      Console.WriteLine("\nFailed to load XML, {0} error(s) and {1} warning(s).", errors, warnings);
   } else if (0 != warnings) {
      Console.WriteLine("\nLoaded XML with {0} warning(s).", warnings);
   } else {
      System.Console.WriteLine("Loaded XML OK");
   }

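   Console.WriteLine("\nSchemas loaded during validation:");
   // XmlReader has no Schemas property; the set lives on the settings object.
   // ListSchemas is a helper that is not shown here.
   ListSchemas(settings.Schemas, 1);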

} catch (System.Xml.Schema.XmlSchemaException e) {
   System.Console.Error.WriteLine("Failed to read XML: {0}", e.Message);
} catch (System.Xml.XmlException e) {
   System.Console.Error.WriteLine("XML Error: {0}", e.Message);
} catch (System.IO.IOException e) {
   System.Console.Error.WriteLine("IO error: {0}", e.Message);
}
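The `ListSchemas` helper called above is not included in the answer. A hypothetical stand-in, assuming the second argument is an indentation depth, could walk the schema set and follow includes and imports. `DescribeSchemas` and `Describe` are invented names, and the output format is a guess.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Collects one line per schema, indented by depth, following includes and imports.
List<string> DescribeSchemas(XmlSchemaSet set, int depth)
{
    var lines = new List<string>();
    var seen = new HashSet<XmlSchema>(); // guards against circular imports
    foreach (XmlSchema schema in set.Schemas())
        Describe(schema, depth, lines, seen);
    return lines;
}

void Describe(XmlSchema schema, int depth, List<string> lines, HashSet<XmlSchema> seen)
{
    if (!seen.Add(schema)) return; // already listed
    lines.Add($"{new string(' ', depth * 2)}namespace '{schema.TargetNamespace}' from '{schema.SourceUri}'");
    foreach (XmlSchemaObject item in schema.Includes)
    {
        // Includes holds xs:include, xs:import and xs:redefine entries.
        if (item is XmlSchemaExternal external && external.Schema != null)
            Describe(external.Schema, depth + 1, lines, seen);
    }
}

// Assumed shape of the missing helper: print what DescribeSchemas collected.
void ListSchemas(XmlSchemaSet set, int depth)
{
    foreach (var line in DescribeSchemas(set, depth))
        Console.WriteLine(line);
}

// Small demonstration with an inline schema.
var demoSet = new XmlSchemaSet();
demoSet.Add(null, XmlReader.Create(new StringReader(
    "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema' targetNamespace='urn:demo'/>")));
demoSet.Compile();
ListSchemas(demoSet, 1);
```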
Briscoe answered 8/9, 2010 at 11:21 Comment(0)
