How do I differentiate types of XML files before deserializing?

I am loading MusicXML files into my program. The problem is that there are two “dialects”, timewise and partwise, which have different root nodes (and a different structure):

<?xml version="1.0" encoding='UTF-8' standalone='no' ?>
<!DOCTYPE score-partwise PUBLIC "-//Recordare//DTD MusicXML 2.0 Partwise//EN" "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise version="2.0">
    <work>...</work>
    ...
</score-partwise>

and

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-timewise PUBLIC "-//Recordare//DTD MusicXML 2.0 Timewise//EN" "http://www.musicxml.org/dtds/timewise.dtd">
<score-timewise version="2.0">
   <work>...</work>
   ...
</score-timewise>

My code for deserializing the partwise score so far is:

using (var fileStream = new FileStream(openFileDialog.FileName, FileMode.Open))
{
    var xmlSerializer = new XmlSerializer(typeof(ScorePartwise));
    var result = (ScorePartwise)xmlSerializer.Deserialize(fileStream);
}

What would be the best way to differentiate between the two dialects?

Fame asked 14/5, 2014 at 19:19 Comment(4)
How big are the XML files? - Mot
That really depends on the piece; an average motet by Palestrina with four voices has about 12000 lines / 300 KB. A whole symphony will definitely have more than that. - Fame
Okay, I would load the 3rd line of the file into a string and then use String.IndexOf() to search for either partwise or timewise; then you know which type of file you are dealing with and can choose the correct serializer. - Mot
I don't know of many (any) score-timewise files out in the wild, so most systems simply assume score-partwise. It may seem like a cop-out answer, but I think it's almost always what people do. - Strachey

Here's one way to do it: use an XDocument to parse the file, read the root element's name to determine the type, and then deserialize the document with the matching serializer.

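var xdoc = XDocument.Load(filePath);
// The root element's local name tells the two dialects apart
string rootName = xdoc.Root.Name.LocalName;
Type type;
if (rootName == "score-partwise")
    type = typeof(ScorePartwise);
else if (rootName == "score-timewise")
    type = typeof(ScoreTimewise);
else
    throw new InvalidDataException("Unexpected root element: " + rootName);
var xmlSerializer = new XmlSerializer(type);
// Deserialize from the already-loaded document instead of re-reading the file
var result = xmlSerializer.Deserialize(xdoc.CreateReader());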
Fresher answered 14/5, 2014 at 19:34 Comment(2)
Loading the whole XML document just to check the first line will be somewhat slow, considering the files are at least 12000 lines long. - Mot
You're about to read the whole file by deserializing it anyway. Reading it, checking the root element, and then sending the in-memory document to the deserializer can't be too bad (assuming memory usage isn't a problem; I'm assuming the file is in the tens of MB or less, which should be fine). - Fresher
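If loading the whole document up front is a concern, a middle ground between the comments above is to let an XmlReader parse only the prolog and stop at the root element. This is a sketch under assumptions, not part of the answer: MusicXmlSniffer and GetRootName are hypothetical names, and the caller is assumed to pass a seekable stream such as a FileStream so it can rewind before deserializing.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: peeks at the root element without building a DOM.
public static class MusicXmlSniffer
{
    // Returns the local name of the root element, e.g. "score-partwise" or
    // "score-timewise". Parsing stops at the root's start tag, although the
    // reader may buffer a few KB of the stream.
    public static string GetRootName(Stream stream)
    {
        var settings = new XmlReaderSettings
        {
            // MusicXML files carry a <!DOCTYPE>; the default (Prohibit) would throw.
            // Ignore skips it without fetching the external DTD.
            DtdProcessing = DtdProcessing.Ignore,
            // Leave the stream open (also the default) so the caller can rewind it.
            CloseInput = false
        };
        using (var reader = XmlReader.Create(stream, settings))
        {
            // Skips the XML declaration, DOCTYPE, comments and whitespace,
            // stopping on the first element, which is the root.
            reader.MoveToContent();
            return reader.LocalName;
        }
    }
}
```

Usage: open the file, call GetRootName(fileStream), set fileStream.Position = 0, and pass the same stream to the XmlSerializer for ScorePartwise or ScoreTimewise depending on whether the result is "score-partwise" or "score-timewise".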

I would create both serializers:

var partwiseSerializer = new XmlSerializer(typeof(ScorePartwise));
var timewiseSerializer = new XmlSerializer(typeof(ScoreTimewise));

Assuming there are only these two, I would call the CanDeserialize method on one of them:

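// MusicXML files contain a <!DOCTYPE>, which XmlReader rejects with the default settings
var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
using (var fileStream = new FileStream(openFileDialog.FileName, FileMode.Open))
{
  using (var xmlReader = XmlReader.Create(fileStream, settings))
  {
    object result;
    // CanDeserialize checks whether the root element matches ScorePartwise
    if (partwiseSerializer.CanDeserialize(xmlReader))
    {
       result = partwiseSerializer.Deserialize(xmlReader);
    }
    else
    {
       result = timewiseSerializer.Deserialize(xmlReader);
    }
  }
}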

Obviously this is just an idea of how to do it. With more options, or depending on your application design, I would use a more sophisticated way to call CanDeserialize, but that method is the key in my opinion:

http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.candeserialize.aspx

The XmlReader class can be found here:

http://msdn.microsoft.com/en-us/library/System.Xml.XmlReader(v=vs.110).aspx
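With more than two dialects, the "more sophisticated way" to call CanDeserialize could be as simple as trying each serializer in order. This is a minimal sketch under stated assumptions: ScoreLoader is a hypothetical helper, and the ScorePartwise/ScoreTimewise classes below are stubs standing in for the asker's real types (only their XmlRoot names matter to CanDeserialize).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-ins for the asker's real classes.
[XmlRoot("score-partwise")]
public class ScorePartwise
{
    [XmlAttribute("version")] public string Version;
}

[XmlRoot("score-timewise")]
public class ScoreTimewise
{
    [XmlAttribute("version")] public string Version;
}

public static class ScoreLoader
{
    // Returns the object produced by the first serializer whose root element
    // matches the document; throws if none of them does.
    public static object Deserialize(Stream stream, IEnumerable<XmlSerializer> candidates)
    {
        // MusicXML files carry a <!DOCTYPE>, which XmlReader rejects by default.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        using (var reader = XmlReader.Create(stream, settings))
        {
            foreach (var serializer in candidates)
            {
                // CanDeserialize only inspects the root element (moving past the
                // prolog if needed), so trying several serializers in turn does
                // not consume the document.
                if (serializer.CanDeserialize(reader))
                {
                    return serializer.Deserialize(reader);
                }
            }
        }
        throw new InvalidDataException("No serializer matches the document's root element.");
    }
}
```

The caller then branches on the runtime type of the result, e.g. if (result is ScorePartwise partwise) { ... }. The order of the candidates only matters if two serializers could accept the same root element.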

Spadework answered 14/5, 2014 at 19:34 Comment(0)

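If you're concerned about resource usage, you can inspect the raw XML string instead of parsing it. The following checks whether a document's root element is <Error>. Note that it only recognizes the two exact declarations in Constants followed by a single newline, so the MusicXML headers in the question (single quotes, a standalone attribute and a DOCTYPE line) would need a different check: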

internal const string NodeStart = "<Error ";

public static bool IsErrorDocument(string xml)
{
    // Start at 1 to skip the single newline character assumed to follow the declaration
    int headerLen = 1;
    if (xml.StartsWith(Constants.XMLHEADER_UTF8))
    {
        headerLen += Constants.XMLHEADER_UTF8.Length;
    }
    else if (xml.StartsWith(Constants.XMLHEADER_UTF16))
    {
        headerLen += Constants.XMLHEADER_UTF16.Length;
    }
    else
    {
        // Unrecognized or missing declaration: not treated as an error document
        return false;
    }
    if (xml.Length < headerLen + NodeStart.Length)
    {
        return false;
    }
    // Compare only the characters where the root element's start tag should begin
    return xml.Substring(headerLen, NodeStart.Length) == NodeStart;
}

internal class Constants
{
    public const string XMLHEADER_UTF16 = "<?xml version=\"1.0\" encoding=\"utf-16\"?>";
    public const string XMLHEADER_UTF8 = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
}
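Applied to the question's files, the same raw-text idea might look like the sketch below. It is an adaptation, not this answer's code: PrefixSniffer, DetectRoot and the 4096-character limit are hypothetical choices. A plain text search can be fooled by a comment that mentions a root tag, and it returns null if the root starts beyond the limit.

```csharp
using System;
using System.IO;

// Hypothetical helper: looks for the MusicXML root tags in the first few KB of text.
public static class PrefixSniffer
{
    // Reads at most maxChars characters and reports which root tag appears first,
    // or null if neither is found.
    public static string DetectRoot(TextReader text, int maxChars = 4096)
    {
        var buffer = new char[maxChars];
        int read = text.ReadBlock(buffer, 0, maxChars);
        string head = new string(buffer, 0, read);

        // Search for the opening tags ("<score-..."), not the bare names, so the
        // "<!DOCTYPE score-..." line does not produce a match.
        int partwise = head.IndexOf("<score-partwise", StringComparison.Ordinal);
        int timewise = head.IndexOf("<score-timewise", StringComparison.Ordinal);

        if (partwise < 0 && timewise < 0) return null;
        if (timewise < 0 || (partwise >= 0 && partwise < timewise)) return "score-partwise";
        return "score-timewise";
    }
}
```

For a file, pass File.OpenText(path), then open the file again for the XmlSerializer. Unlike the XmlReader-based approaches, this never parses the XML, which is what keeps it cheap but also what makes it fragile.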
Acoustician answered 8/11, 2019 at 17:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.