What indicates an Office Open XML Cell contains a Date/Time value?
Asked Answered
O

7

47

I'm reading an .xlsx file using the Office Open XML SDK and am confused about reading Date/Time values. One of my spreadsheets has this markup (generated by Excel 2010)

<x:row r="2" spans="1:22" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <x:c r="A2" t="s">
    <x:v>56</x:v>
  </x:c>
  <x:c r="B2" t="s">
    <x:v>64</x:v>
  </x:c>
  .
  .
  .
  <x:c r="J2" s="9">
    <x:v>17145</x:v>
  </x:c>

Cell J2 has a date serial value in it and a style attribute s="9". However, the Office Open XML Specification says that 9 corresponds to a followed hyperlink. This is a screen shot from page 4,999 of ECMA-376, Second Edition, Part 1 - Fundamentals And Markup Language Reference.pdf.

alt text

The presetCellStyles.xml file included with the spec also refers to builtinId 9 as a followed hyperlink.

<followedHyperlink builtinId="9">

All of the styles in the spec are simply visual formatting styles, not number styles. Where are the number styles defined and how does one differentiate a style reference s="9" from indicating a cell formatting (visual) style vs a number style?

Obviously I'm looking in the wrong place to match styles on cells with their number formats. Where's the right place to find this information?

Ose answered 18/1, 2011 at 23:14 Comment(0)
R
62

The s attribute references a style xf entry in styles.xml. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).

The style.xml file has elements like:

<xf numFmtId="14" ... applyNumberFormat="1" />
<xf numFmtId="1" ... applyNumberFormat="1" />

These are the xf entries, which in turn give you a numFmtId that references the number format mask.

You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
    <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <numFmts count="3">
            <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" /> 
            <numFmt numFmtId="165" formatCode="0.000" /> 
            <numFmt numFmtId="166" formatCode="#,##0.000" /> 
        </numFmts>

The number format id may be here, or it may be one of the built-in formats. Number format codes (numFmtId) less than 164 are "built-in".

The list that I have is incomplete:

0 = 'General';
1 = '0';
2 = '0.00';
3 = '#,##0';
4 = '#,##0.00';

9 = '0%';
10 = '0.00%';
11 = '0.00E+00';
12 = '# ?/?';
13 = '# ??/??';
14 = 'mm-dd-yy';
15 = 'd-mmm-yy';
16 = 'd-mmm';
17 = 'mmm-yy';
18 = 'h:mm AM/PM';
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';

37 = '#,##0 ;(#,##0)';
38 = '#,##0 ;[Red](#,##0)';
39 = '#,##0.00;(#,##0.00)';
40 = '#,##0.00;[Red](#,##0.00)';

44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)';
45 = 'mm:ss';
46 = '[h]:mm:ss';
47 = 'mmss.0';
48 = '##0.0E+0';
49 = '@';

27 = '[$-404]e/m/d';
30 = 'm/d/yy';
36 = '[$-404]e/m/d';
50 = '[$-404]e/m/d';
57 = '[$-404]e/m/d';

59 = 't0';
60 = 't0.00';
61 = 't#,##0';
62 = 't#,##0.00';
67 = 't0%';
68 = 't0.00%';
69 = 't# ?/?';
70 = 't# ??/??';

The missing values are mainly related to east asian variant formats.

Raymond answered 18/1, 2011 at 23:31 Comment(9)
Thanks! Very detailed, exactly what I needed. Where did you get the incomplete built-in numFmtId list? Is the full list in the spec somewhere? Somewhere else?Ose
The full list of built-in number formats can be found in part 4 of the Ecma Office Open XML File Formats Standard documents (ecma-international.org/news/TC45_current_work/…) for OpenXML sections 3.8.30 and 3.8.31 (pages 2127 to 2143)Raymond
thanks again. I found the list in ECMA-376, Second Edition, Part 1 - Fundamentals And Markup Language Reference section 18.8.30 page 1964.Ose
Current version can be downloaded from: ecma-international.org/publications/standards/Ecma-376.htmRespective
See section 18.8.30 in current spec (part 1, page 1971).Respective
Still section 18.8.30 but page 1767, 4th edition, december 2012Kettledrum
Comment about NumFmtId="14", it's not exactly "mm-dd-yy" : this will be interpreted differently in Windows Excel 2010 software, based on the user's regional and language settings (at Windows level) : it will be the "Short date" format i.e. dd/mm/yyyy in French, or mm/dd/yyyy in English (verified by switching from English (United States) to French (France) and vice versa).Articulation
Microsoft link for codes: learn.microsoft.com/en-us/previous-versions/office/developer/…Treatment
Revised in section 18.8.30 page 1786 (5th Edition, part 1) the footer shows 1776.Bigmouth
R
8

Thought I'd add my solution that I've put together to determine if the double value FromOADate is really a date or not. Reason being is I have a zip code in my excel file as well. The numberingFormat will be null if it's text.

Alternatively you could use the numberingFormatId and check against a list of Ids that Excel uses for dates.

In my case I've explicitly determined the formatting of all fields for the client.

    /// <summary>
    /// Creates the datatable and parses the file into a datatable
    /// </summary>
    /// <param name="fileName">the file upload's filename</param>
    private void ReadAsDataTable(string fileName)
    {
        try
        {
            DataTable dt = new DataTable();
            using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(string.Format("{0}/{1}", UploadPath, fileName), false))
            {
                WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
                IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                string relationshipId = sheets.First().Id.Value;
                WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
                Worksheet workSheet = worksheetPart.Worksheet;
                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
                var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;

                // columns omitted for brevity

                // skip first row as this row is column header names
                foreach (Row row in rows.Skip(1))
                {
                    DataRow dataRow = dt.NewRow();

                    for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                    {
                        bool isDate = false;
                        var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;
                        var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);

                        if (cellFormat.NumberFormatId != null)
                        {
                            var numberFormatId = cellFormat.NumberFormatId.Value;
                            var numberingFormat = numberingFormats.Cast<NumberingFormat>()
                                .SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);

                            // Here's yer string! Example: $#,##0.00_);[Red]($#,##0.00)
                            if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("mm/dd/yy"))
                            {
                                string formatString = numberingFormat.FormatCode.Value;
                                isDate = true;
                            }
                        }

                        // replace '-' with empty string
                        string value = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i), isDate);
                        dataRow[i] = value.Equals("-") ? string.Empty : value;
                    }

                    dt.Rows.Add(dataRow);
                }
            }

            this.InsertMembers(dt);
            dt.Clear();
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }
    }

    /// <summary>
    /// Reads the cell's value
    /// </summary>
    /// <param name="document">current document</param>
    /// <param name="cell">the cell to read</param>
    /// <returns>cell's value</returns>
    private string GetCellValue(SpreadsheetDocument document, Cell cell, bool isDate)
    {
        string value = string.Empty;

        try
        {
            SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
            value = cell.CellValue.InnerXml;

            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
            {
                return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
            }
            else
            {
                // check if this is a date or zip.
                // integers will be passed into this else statement as well. 
                if (isDate)
                {
                    value = DateTime.FromOADate(double.Parse(value)).ToString();
                }

                return value;
            }
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }

        return value;
    }
Rumery answered 12/2, 2015 at 18:32 Comment(1)
I get NumberFormatId = 14, and there is no item in the list with NumberingFormat.Id == 14Abessive
T
7

The chosen answer is spot-on, but note that Excel defines some number format (numFmt) codes differently from the OpenXML spec. Per the Open XML SDK 2.5 Productivity Tool's documentation (on the "Implementer Notes" tab for the NumberingFormat class):

The standard defines built-in format ID 14: "mm-dd-yy"; 22: "m/d/yy h:mm"; 37: "#,##0 ;(#,##0)"; 38: "#,##0 ;[Red]"; 39: "#,##0.00;(#,##0.00)"; 40: "#,##0.00;[Red]"; 47: "mmss.0"; KOR fmt 55: "yyyy-mm-dd".

Excel defines built-in format ID
14: "m/d/yyyy"
22: "m/d/yyyy h:mm"
37: "#,##0_);(#,##0)"
38: "#,##0_);[Red]"
39: "#,##0.00_);(#,##0.00)"
40: "#,##0.00_);[Red]"
47: "mm:ss.0"
55: "yyyy/mm/dd"

Most are minor variations, but #14 is a doozy. I wasted a couple of hours troubleshooting why leading zeros weren't being added to single-digits months and days (e.g. 01/05/14 vs. 1/5/14).

Tahoe answered 22/4, 2014 at 7:55 Comment(0)
S
1

In styles.xml see if there is a numFmt node. I think that will hold a numFmtId of "9" which will relate to the date format that's used.

I don't know where that is in the ECMA, but if you search for numFmt, you might find it.

Sole answered 18/1, 2011 at 23:30 Comment(1)
s="9" refers to an xfId, not a numFmtIdRaymond
J
0

It was unclear to me how to reliably determine whether a cell has date/time value. After spending some time experimenting I had come up with the code (see post) that would look for both built-in and custom date/time formats.

Jat answered 25/10, 2013 at 6:56 Comment(0)
B
0

In case anyone else is having a hard time with this, here is what I've done:

1) Create a new excel file and put in a date time string in cell A1

2) Change formatting on the cell to whatever you want, then save file.

3) Run following powershell script to extract out the stylesheet from .xlxs

[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml")

$xlsx = (ls C:\PATH\TO\FILE.xlsx).FullName
$package = [DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]::Open($xlsx, $true)

[xml]$style = $package.WorkbookPart.WorkbookStylesPart.Stylesheet.OuterXml
Out-File -InputObject $style.OuterXml -FilePath "style.xml"

style.xml now contains the information that you can inject to DocumentFormat.OpenXml.Spreadsheet.Stylesheet(string outerXml), leading to

4) Use the extracted file to construct excel object model

var style = File.ReadAllText(@"c:\PATH\TO\EXTRACTED\Style.xml");
var stylesheetPart = WorkbookPart_REFERENCE.AddNewPart<WorkbookStylesPart>();
stylesheetPart.Stylesheet = new Stylesheet(style);
stylesheetPart.Stylesheet.Save();
But answered 20/4, 2015 at 11:48 Comment(0)
A
0

@RobScott reference to your code snippet I have found always null in style index of a particular Cell

              var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;

my requirement to read below mentioned excel and transfrom the row and column data to the json.

excel reference

StockInvoiceNo StockInvoiceOn Name Description
DC3320012989 23-01-2021 00:00:00:00 item1 description
DC3320012989 24-01-2021 00:00:00:00 item2 description
DC3320012989 25-01-2021 00:00:00:00 item3 description
Aldora answered 20/1, 2021 at 7:19 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.