Parsing Mac XML PList into something readable
Asked Answered
B

1

9

I am trying to pull data from XML PList (Apple System Profiler) files, and read it into a memory database, and finally I want to turn it into something human-readable.

The problem is that the format seems to be very difficult to read in a consistent manner. I have gone over a few solutions already, but I haven't found a solution yet that I found satisfying. I always end up having to hard code a lot of values, and end up having to many if-else/switch statements.

The format looks like this.

<plist>
    <key>_system</key>
    <array>
     <dict>
      <key>_cpu_type</key>
      <string>Intel Core Duo</string>
     </dict>
    </array>
</plist>

Example file here.

After I have read (or during reading), I make use of an internal dictionary that I use to determine what type of information it is. For example, if the key is cpu_type I save the information accordingly.


A few examples that I have tried (simplified) to pull the information.

 XmlTextReader reader = new
 XmlTextReader("C:\\test.spx");

 reader.XmlResolver = null;

 reader.ReadStartElement("plist");

 String key = String.Empty; String str
 = String.Empty;

 Int32 Index = 0;

 while (reader.Read()) {

     if (reader.LocalName == "key")
     {
         Index++;
         key = reader.ReadString();
     }
     else if (reader.LocalName == "string")
     {
         str = reader.ReadString();

         if (key != String.Empty)
         {
             dct.Add(Index, new KeyPair(key, str));
             key = String.Empty;
         }
     } 
}

Or something like this.

foreach (var d in xdoc.Root.Elements("plist"))
   dict.Add(d.Element("key").Value,> d.Element("string").Value);

I found a framework, that I may be able to modify here.


Some more useful information

Mac OS X Information on the system profiler here.

Apple script used to parse the XML-files here.


Any advise or insight into this would be deeply appreciated.

Babin answered 1/7, 2011 at 0:0 Comment(3)
What do you mean by "something readable"? What are you trying to accomplish? Just stuffing everything into a flat dictionary won't be of much use because the keys aren't unique within the file; their position within the tree actually conveys information, too.Recently
Is it a key/value list from the level array/dict/array/dict that you are looking for? Or do you want every combination. It seems to me that there are several key/value patterns in the xml and that some of the have a further key/value to represent their value. Could you please provide information as to what exactly you are looking for in the XMLPerson
I am handling this by saving data with information on the section it was parsed from. For example cpu_type would be found in the dataType/section SPHardwareDataType. I am trying to make the file readable on Windows PCs, so the end-goal is to have all available information converted to something that can be read by the user.Babin
S
9

my first thought for this is just to use XSLT (XSL transformations). i don't know exactly what format you are looking for based on your answer in the above comments, but i think i got the gist at least. unless there's something special you need that i didn't think of, i believe XSLT is powerful enough to do everything you need, and no need for a bunch of complicated looping constructs.

if you're not familiar, there's a lot of good information on XSLT on w3schools (probably start at the intro: http://www.w3schools.com/xsl/xsl_intro.asp) and also wikipedia has a decent writeup on it (http://en.wikipedia.org/wiki/XSLT).

it always takes me a while to get the rules working the way i want; it's a different way of thinking about this kind of transform and took me some getting used to. it's necessary to have a decent understanding of XPATH as well. i am constantly having to refer to both the XSLT spec (http://www.w3.org/TR/xslt) and the XPATH spec (http://www.w3.org/TR/xpath/) since i've only had a small amount of experience with it, probably once you have worked with it a while it goes more smoothly.

anyway, i have an app i wrote previously for playing around with these translations. it's a C# application with three textboxes: one for the XSLT, one for the source, and one for the output. i spent a few (okay, many) hours trying to get a first cut of an XSLT that would process your sample data, to get an idea of how hard it would be and what the structure of the transform would be. i think i finally figured out pretty much what was needed, but since i don't know exactly what format you need, i stopped there.

here's a link to the sample transformed output: http://pastebin.com/SMFxUdDK.

following is all the code to actually do the transform, included in a form that you can use to develop as you go. it's not fancy but it has worked well for me. the "heavy lifting" is all done in the "btnTransform_Click()" handler, plus i have implemented an XmlStringWriter to make it easy to output things the way i want. the main bit of the work here is just in coming up with the XSLT directives, the actual transform is fairly well handled for you in the .NET XslCompiledTransform class. however, i figured i had spent enough time figuring out all the little details on it when i wrote it that it was worth giving a working example...

be aware i changed a couple of occurrences of a namespace here on-the-fly, and also added some light comments to the XSLT, so if there are issues let me know and i will correct them.

so, with no further adieu: ;)

the XSLT file:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
                        version="1.0"
                        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
                        exclude-result-prefixes="msxsl"
                        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                        xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                        xmlns:fn="http://www.w3.org/2005/xpath-functions"
>
  <!-- this just says to output XML as opposed to HTML or raw text -->
  <xsl:output method="xml" indent="yes" xsi:type="xsl:output" />

  <!-- this matches the root element and then creates a root element -->
  <!-- with more templates applied as children -->
  <xsl:template match="/" priority="9" >
    <xsl:element name="root" xmlns="http://www.tempuri.org/plist">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- wasn't sure how you would want the dict and arrays handled -->
  <!-- for a final cut, so i just make them into parent nodes of -->
  <!-- the data underneath them, and then apply the templates -->

  <xsl:template match="dict" priority="3" >
    <xsl:element name="dictionary" xmlns="http://www.tempuri.org/plist">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="array" priority="5" >
    <xsl:element name="list" xmlns="http://www.tempuri.org/plist">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- actually, figuring the following step out is what hung me up; the -->
  <!-- issue here is that i'm taking the text out of the string/integer/date -->
  <!-- nodes and putting them into elements named after the 'key' nodes -->

  <!-- because of this, you actually have to have the template match the -->
  <!-- nodes you will be consuming and then just using the conditional -->
  <!-- to only process the 'key' nodes.  also, there were a couple of -->
  <!-- stray characters in the source XML; i think it was an encoding -->
  <!-- issue, so i just stripped them out with the "translate" call when -->
  <!-- creating the keyName variable. since those were the only two -->
  <!-- and because they looked to be strays, i did not worry about it -->
  <!-- further.  the only reason it is an issue is because i was -->
  <!-- creating elements out of the contents of the keys, and key names -->
  <!-- are restricted in what characters they can use. -->

  <xsl:template match="key|string|integer|date" priority="1" >
    <xsl:if test="local-name(self::node())='key'">
      <xsl:variable name="keyName" select="translate(child::text(),' €™','---')" />
      <xsl:element name="{$keyName}" xmlns="http://www.tempuri.org/plist" >
        <!-- removed on-the-fly; i had put this in while testing
          <xsl:if test="local-name(following-sibling::node())='string'">
        -->
          <xsl:value-of select="following-sibling::node()" />
        <!--
          </xsl:if>
        -->
      </xsl:element>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

a little helper class i made (XmlStringWriter.cs) :

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using System.Xml;

namespace XSLTTest.Xml
{
    public class XmlStringWriter :
        XmlWriter
    {
        public static XmlStringWriter Create(XmlWriterSettings Settings)
        {
            return new XmlStringWriter(Settings);
        }

        public static XmlStringWriter Create()
        {
            return XmlStringWriter.Create(XmlStringWriter.XmlWriterSettings_display);
        }

        public static XmlWriterSettings XmlWriterSettings_display
        {
            get
            {
                XmlWriterSettings XWS = new XmlWriterSettings();
                XWS.OmitXmlDeclaration = false; // make a choice?
                XWS.NewLineHandling = NewLineHandling.Replace;
                XWS.NewLineOnAttributes = false;
                XWS.Indent = true;
                XWS.IndentChars = "\t";
                XWS.NewLineChars = Environment.NewLine;
                //XWS.ConformanceLevel = ConformanceLevel.Fragment;
                XWS.CloseOutput = false;

                return XWS;
            }
        }

        public override string ToString()
        {
            return myXMLStringBuilder.ToString();
        }

        //public static implicit operator XmlWriter(XmlStringWriter Me)
        //{
        //   return Me.myXMLWriter;
        //}

        //--------------

        protected StringBuilder myXMLStringBuilder = null;
        protected XmlWriter myXMLWriter = null;

        protected XmlStringWriter(XmlWriterSettings Settings)
        {
            myXMLStringBuilder = new StringBuilder();
            myXMLWriter = XmlWriter.Create(myXMLStringBuilder, Settings);
        }

        public override void Close()
        {
            myXMLWriter.Close();
        }

        public override void Flush()
        {
            myXMLWriter.Flush();
        }

        public override string LookupPrefix(string ns)
        {
            return myXMLWriter.LookupPrefix(ns);
        }

        public override void WriteBase64(byte[] buffer, int index, int count)
        {
            myXMLWriter.WriteBase64(buffer, index, count);
        }

        public override void WriteCData(string text)
        {
            myXMLWriter.WriteCData(text);
        }

        public override void WriteCharEntity(char ch)
        {
            myXMLWriter.WriteCharEntity(ch);
        }

        public override void WriteChars(char[] buffer, int index, int count)
        {
            myXMLWriter.WriteChars(buffer, index, count);
        }

        public override void WriteComment(string text)
        {
            myXMLWriter.WriteComment(text);
        }

        public override void WriteDocType(string name, string pubid, string sysid, string subset)
        {
            myXMLWriter.WriteDocType(name, pubid, sysid, subset);
        }

        public override void WriteEndAttribute()
        {
            myXMLWriter.WriteEndAttribute();
        }

        public override void WriteEndDocument()
        {
            myXMLWriter.WriteEndDocument();
        }

        public override void WriteEndElement()
        {
            myXMLWriter.WriteEndElement();
        }

        public override void WriteEntityRef(string name)
        {
            myXMLWriter.WriteEntityRef(name);
        }

        public override void WriteFullEndElement()
        {
            myXMLWriter.WriteFullEndElement();
        }

        public override void WriteProcessingInstruction(string name, string text)
        {
            myXMLWriter.WriteProcessingInstruction(name, text);
        }

        public override void WriteRaw(string data)
        {
            myXMLWriter.WriteRaw(data);
        }

        public override void WriteRaw(char[] buffer, int index, int count)
        {
            myXMLWriter.WriteRaw(buffer, index, count);
        }

        public override void WriteStartAttribute(string prefix, string localName, string ns)
        {
            myXMLWriter.WriteStartAttribute(prefix, localName, ns);
        }

        public override void WriteStartDocument(bool standalone)
        {
            myXMLWriter.WriteStartDocument(standalone);
        }

        public override void WriteStartDocument()
        {
            myXMLWriter.WriteStartDocument();
        }

        public override void WriteStartElement(string prefix, string localName, string ns)
        {
            myXMLWriter.WriteStartElement(prefix, localName, ns);
        }

        public override WriteState WriteState
        {
            get 
            {
                return myXMLWriter.WriteState;
            }
        }

        public override void WriteString(string text)
        {
            myXMLWriter.WriteString(text);
        }

        public override void WriteSurrogateCharEntity(char lowChar, char highChar)
        {
            myXMLWriter.WriteSurrogateCharEntity(lowChar, highChar);
        }

        public override void WriteWhitespace(string ws)
        {
            myXMLWriter.WriteWhitespace(ws);
        }
    }
}

the windows forms designer class (frmXSLTTest.Designer.cs)

namespace XSLTTest
{
    partial class frmXSLTTest
    {
        /// <summary>
        /// Required designer variable.
        /// </summary>
        private System.ComponentModel.IContainer components = null;

        /// <summary>
        /// Clean up any resources being used.
        /// </summary>
        /// <param name="disposing">true if managed resources should be disposed; otherwise, false.</param>
        protected override void Dispose(bool disposing)
        {
            if (disposing && (components != null))
            {
                components.Dispose();
            }
            base.Dispose(disposing);
        }

        #region Windows Form Designer generated code

        /// <summary>
        /// Required method for Designer support - do not modify
        /// the contents of this method with the code editor.
        /// </summary>
        private void InitializeComponent()
        {
            this.splitContainer1 = new System.Windows.Forms.SplitContainer();
            this.btnTransform = new System.Windows.Forms.Button();
            this.groupBox1 = new System.Windows.Forms.GroupBox();
            this.txtStylesheet = new System.Windows.Forms.TextBox();
            this.splitContainer2 = new System.Windows.Forms.SplitContainer();
            this.groupBox2 = new System.Windows.Forms.GroupBox();
            this.txtInputXML = new System.Windows.Forms.TextBox();
            this.groupBox3 = new System.Windows.Forms.GroupBox();
            this.txtOutputXML = new System.Windows.Forms.TextBox();
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer1)).BeginInit();
            this.splitContainer1.Panel1.SuspendLayout();
            this.splitContainer1.Panel2.SuspendLayout();
            this.splitContainer1.SuspendLayout();
            this.groupBox1.SuspendLayout();
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer2)).BeginInit();
            this.splitContainer2.Panel1.SuspendLayout();
            this.splitContainer2.Panel2.SuspendLayout();
            this.splitContainer2.SuspendLayout();
            this.groupBox2.SuspendLayout();
            this.groupBox3.SuspendLayout();
            this.SuspendLayout();
            // 
            // splitContainer1
            // 
            this.splitContainer1.Dock = System.Windows.Forms.DockStyle.Fill;
            this.splitContainer1.Location = new System.Drawing.Point(0, 0);
            this.splitContainer1.Name = "splitContainer1";
            this.splitContainer1.Orientation = System.Windows.Forms.Orientation.Horizontal;
            // 
            // splitContainer1.Panel1
            // 
            this.splitContainer1.Panel1.Controls.Add(this.btnTransform);
            this.splitContainer1.Panel1.Controls.Add(this.groupBox1);
            // 
            // splitContainer1.Panel2
            // 
            this.splitContainer1.Panel2.Controls.Add(this.splitContainer2);
            this.splitContainer1.Size = new System.Drawing.Size(788, 363);
            this.splitContainer1.SplitterDistance = 194;
            this.splitContainer1.TabIndex = 0;
            // 
            // btnTransform
            // 
            this.btnTransform.Anchor = ((System.Windows.Forms.AnchorStyles)((System.Windows.Forms.AnchorStyles.Bottom | System.Windows.Forms.AnchorStyles.Left)));
            this.btnTransform.Location = new System.Drawing.Point(6, 167);
            this.btnTransform.Name = "btnTransform";
            this.btnTransform.Size = new System.Drawing.Size(75, 23);
            this.btnTransform.TabIndex = 1;
            this.btnTransform.Text = "Transform";
            this.btnTransform.UseVisualStyleBackColor = true;
            this.btnTransform.Click += new System.EventHandler(this.btnTransform_Click);
            // 
            // groupBox1
            // 
            this.groupBox1.Anchor = ((System.Windows.Forms.AnchorStyles)((((System.Windows.Forms.AnchorStyles.Top | System.Windows.Forms.AnchorStyles.Bottom) 
            | System.Windows.Forms.AnchorStyles.Left) 
            | System.Windows.Forms.AnchorStyles.Right)));
            this.groupBox1.Controls.Add(this.txtStylesheet);
            this.groupBox1.Location = new System.Drawing.Point(3, 3);
            this.groupBox1.Name = "groupBox1";
            this.groupBox1.Size = new System.Drawing.Size(782, 161);
            this.groupBox1.TabIndex = 0;
            this.groupBox1.TabStop = false;
            this.groupBox1.Text = "Stylesheet";
            // 
            // txtStylesheet
            // 
            this.txtStylesheet.Dock = System.Windows.Forms.DockStyle.Fill;
            this.txtStylesheet.Font = new System.Drawing.Font("Lucida Console", 7F, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
            this.txtStylesheet.Location = new System.Drawing.Point(3, 16);
            this.txtStylesheet.MaxLength = 1000000;
            this.txtStylesheet.Multiline = true;
            this.txtStylesheet.Name = "txtStylesheet";
            this.txtStylesheet.ScrollBars = System.Windows.Forms.ScrollBars.Both;
            this.txtStylesheet.Size = new System.Drawing.Size(776, 142);
            this.txtStylesheet.TabIndex = 0;
            // 
            // splitContainer2
            // 
            this.splitContainer2.Dock = System.Windows.Forms.DockStyle.Fill;
            this.splitContainer2.Location = new System.Drawing.Point(0, 0);
            this.splitContainer2.Name = "splitContainer2";
            // 
            // splitContainer2.Panel1
            // 
            this.splitContainer2.Panel1.Controls.Add(this.groupBox2);
            // 
            // splitContainer2.Panel2
            // 
            this.splitContainer2.Panel2.Controls.Add(this.groupBox3);
            this.splitContainer2.Size = new System.Drawing.Size(788, 165);
            this.splitContainer2.SplitterDistance = 395;
            this.splitContainer2.TabIndex = 0;
            // 
            // groupBox2
            // 
            this.groupBox2.Controls.Add(this.txtInputXML);
            this.groupBox2.Dock = System.Windows.Forms.DockStyle.Fill;
            this.groupBox2.Location = new System.Drawing.Point(0, 0);
            this.groupBox2.Name = "groupBox2";
            this.groupBox2.Size = new System.Drawing.Size(395, 165);
            this.groupBox2.TabIndex = 1;
            this.groupBox2.TabStop = false;
            this.groupBox2.Text = "Input XML";
            // 
            // txtInputXML
            // 
            this.txtInputXML.Dock = System.Windows.Forms.DockStyle.Fill;
            this.txtInputXML.Font = new System.Drawing.Font("Lucida Console", 7F, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
            this.txtInputXML.Location = new System.Drawing.Point(3, 16);
            this.txtInputXML.MaxLength = 1000000;
            this.txtInputXML.Multiline = true;
            this.txtInputXML.Name = "txtInputXML";
            this.txtInputXML.ScrollBars = System.Windows.Forms.ScrollBars.Both;
            this.txtInputXML.Size = new System.Drawing.Size(389, 146);
            this.txtInputXML.TabIndex = 1;
            // 
            // groupBox3
            // 
            this.groupBox3.Controls.Add(this.txtOutputXML);
            this.groupBox3.Dock = System.Windows.Forms.DockStyle.Fill;
            this.groupBox3.Location = new System.Drawing.Point(0, 0);
            this.groupBox3.Name = "groupBox3";
            this.groupBox3.Size = new System.Drawing.Size(389, 165);
            this.groupBox3.TabIndex = 1;
            this.groupBox3.TabStop = false;
            this.groupBox3.Text = "Output XML";
            // 
            // txtOutputXML
            // 
            this.txtOutputXML.Dock = System.Windows.Forms.DockStyle.Fill;
            this.txtOutputXML.Font = new System.Drawing.Font("Lucida Console", 7F, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, ((byte)(0)));
            this.txtOutputXML.Location = new System.Drawing.Point(3, 16);
            this.txtOutputXML.MaxLength = 1000000;
            this.txtOutputXML.Multiline = true;
            this.txtOutputXML.Name = "txtOutputXML";
            this.txtOutputXML.ScrollBars = System.Windows.Forms.ScrollBars.Both;
            this.txtOutputXML.Size = new System.Drawing.Size(383, 146);
            this.txtOutputXML.TabIndex = 1;
            // 
            // frmXSLTTest
            // 
            this.AutoScaleDimensions = new System.Drawing.SizeF(6F, 13F);
            this.AutoScaleMode = System.Windows.Forms.AutoScaleMode.Font;
            this.ClientSize = new System.Drawing.Size(788, 363);
            this.Controls.Add(this.splitContainer1);
            this.Name = "frmXSLTTest";
            this.Text = "frmXSLTTest";
            this.splitContainer1.Panel1.ResumeLayout(false);
            this.splitContainer1.Panel2.ResumeLayout(false);
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer1)).EndInit();
            this.splitContainer1.ResumeLayout(false);
            this.groupBox1.ResumeLayout(false);
            this.groupBox1.PerformLayout();
            this.splitContainer2.Panel1.ResumeLayout(false);
            this.splitContainer2.Panel2.ResumeLayout(false);
            ((System.ComponentModel.ISupportInitialize)(this.splitContainer2)).EndInit();
            this.splitContainer2.ResumeLayout(false);
            this.groupBox2.ResumeLayout(false);
            this.groupBox2.PerformLayout();
            this.groupBox3.ResumeLayout(false);
            this.groupBox3.PerformLayout();
            this.ResumeLayout(false);

        }

        #endregion

        private System.Windows.Forms.SplitContainer splitContainer1;
        private System.Windows.Forms.Button btnTransform;
        private System.Windows.Forms.GroupBox groupBox1;
        private System.Windows.Forms.TextBox txtStylesheet;
        private System.Windows.Forms.SplitContainer splitContainer2;
        private System.Windows.Forms.GroupBox groupBox2;
        private System.Windows.Forms.GroupBox groupBox3;
        private System.Windows.Forms.TextBox txtInputXML;
        private System.Windows.Forms.TextBox txtOutputXML;
    }
}

the form class (frmXSLTTest.cs):

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;

using System.Xml;
using System.Xml.Xsl;

using XSLTTest.Xml;

namespace XSLTTest
{
    public partial class frmXSLTTest : Form
    {
        public frmXSLTTest()
        {
            InitializeComponent();
        }

        private void btnTransform_Click(object sender, EventArgs e)
        {
            try
            {
                // temporary to copy from clipboard when pressing 
                // the button instead of using the text in the textbox
                //txtStylesheet.Text = Clipboard.GetText();

                XmlDocument Stylesheet = new XmlDocument();
                Stylesheet.InnerXml = txtStylesheet.Text;

                XslCompiledTransform XCT = new XslCompiledTransform(true);
                XCT.Load(Stylesheet);

                XmlDocument InputDocument = new XmlDocument();
                InputDocument.InnerXml = txtInputXML.Text;

                XmlStringWriter OutputWriter = XmlStringWriter.Create();

                XCT.Transform(InputDocument, OutputWriter);

                txtOutputXML.Text = OutputWriter.ToString();
            }

            catch (Exception Ex)
            {
                txtOutputXML.Text = Ex.Message;
            }
        }
    }
}
Scrutiny answered 9/7, 2011 at 11:42 Comment(2)
oh, in case it wasn't clear from the answer it's possible to output raw text or HTML if you don't want an XML file; also, there are a few extra namespaces in the XSLT stylesheet definition that weren't used i think, but i'm not sure which ones at the moment so i'm leaving them.Scrutiny
This solution looks much cleaner and thanks for the comprehensive introduction. :)Babin

© 2022 - 2024 — McMap. All rights reserved.