You could use the OMML2MML.XSL
file (located under %ProgramFiles%\Microsoft Office\Office15
)
to transform Microsoft Office MathML (equations) included in a word document into MathML.
The code below shows how to transform the equations in a word document into MathML
using the following steps:
- Open the word document using OpenXML SDK (version 2.5).
- Create a XslCompiledTransform and load the OMML2MML.XSL file.
- Transform the word document by calling the Transform() method
on the created XslCompiledTransform instance.
- Output the result of the transform (e.g. print on console or write to file).
I've tested the code below with a simple word document containing two equations, text and pictures.
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;
public string GetWordDocumentAsMathML(string docFilePath, string officeVersion = "14")
{
string officeML = string.Empty;
using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
{
string wordDocXml = doc.MainDocumentPart.Document.OuterXml;
XslCompiledTransform xslTransform = new XslCompiledTransform();
// The OMML2MML.xsl file is located under
// %ProgramFiles%\Microsoft Office\Office15\
xslTransform.Load(@"c:\Program Files\Microsoft Office\Office" + officeVersion + @"\OMML2MML.XSL");
using (TextReader tr = new StringReader(wordDocXml))
{
// Load the xml of your main document part.
using (XmlReader reader = XmlReader.Create(tr))
{
using (MemoryStream ms = new MemoryStream())
{
XmlWriterSettings settings = xslTransform.OutputSettings.Clone();
// Configure xml writer to omit xml declaration.
settings.ConformanceLevel = ConformanceLevel.Fragment;
settings.OmitXmlDeclaration = true;
XmlWriter xw = XmlWriter.Create(ms, settings);
// Transform our OfficeMathML to MathML.
xslTransform.Transform(reader, xw);
ms.Seek(0, SeekOrigin.Begin);
using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
{
officeML = sr.ReadToEnd();
// Console.Out.WriteLine(officeML);
}
}
}
}
}
return officeML;
}
To convert only one single equation (and not the whole word document) just query for the desired Office Math Paragraph (m:oMathPara) and use the OuterXML
property of this node.
The code below shows how to query for the first math paragraph:
string mathParagraphXml =
doc.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Math.Paragraph>().First().OuterXml;
Use the returned XML to feed the TextReader
.