OpenXML - Cell.DateType is null
Asked Answered
K

3

13

I can't determine when a Cell is a date.

Date DataType Cell

I noticed the DataType is null so I can't distinguish if it's a Number of a Date.

I am using the next code to extract the cells:

WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(worksheetId);
SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
Row[] rows = worksheetPart.Worksheet.Descendants<Row>().ToArray();
for (int i = 0; i < rows.Length; i++)
{
    List<Cell> cells = rows[i].Elements<Cell>().ToList();
    foreach (var cell in cells) 
    {
        if (cell.DataType != null && cell.DataType.Value == CellValues.Date)
            //this line is not hit for some reason
         }
    }
}

Am I missing something?

Kaitlynkaitlynn answered 16/4, 2016 at 23:0 Comment(0)
R
20

In short it is null because it is supposed to be for numeric and date types.

OpenXML documentation on msdn

The value of the DataType property is null for numeric and date types. It contains the value CellValues.SharedString for strings, and CellValues.Boolean for Boolean values.

There is a way though to distinguish between date and number cell formats using the NumberFormatId on the CellFormat. The trick is finding what id maps to what format. You can find out what format to use by creating new excel file and set a cell to the format in question (i.e. date):

enter image description here

Then extract excel file using 7zip and look inside xl/styles.xml file:

enter image description here

In the image above you can see this formatId 14 translates to short date. For a complete list of formats please refer to ECMA-376 documentation for Office Open XML formats (number format table should be buried somewhere inside part 4. They moved it to part 1 in section 18.8.30).

I created an enumeration for the most common formatIds:

private enum Formats
    {
        General = 0,
        Number = 1,
        Decimal = 2,
        Currency = 164,
        Accounting = 44,
        DateShort = 14,
        DateLong = 165,
        Time = 166,
        Percentage = 10,
        Fraction = 12,
        Scientific = 11,
        Text = 49
    }

You could then create a helper function that will get you the formatted value the way you would like:

private static string GetFormattedCellValue(WorkbookPart workbookPart, Cell cell)
    {
        if (cell == null)
        {
            return null;
        }

        string value = "";
        if (cell.DataType == null) // number & dates
        {
            int styleIndex = (int)cell.StyleIndex.Value;
            CellFormat cellFormat = (CellFormat)workbookPart.WorkbookStylesPart.Stylesheet.CellFormats.ElementAt(styleIndex);
            uint formatId = cellFormat.NumberFormatId.Value;
            
            if (formatId == (uint)Formats.DateShort || formatId == (uint)Formats.DateLong)
            {
                double oaDate;
                if (double.TryParse(cell.InnerText, out oaDate))
                {
                    value = DateTime.FromOADate(oaDate).ToShortDateString();
                }
            }
            else
            {
                value = cell.InnerText;
            }
        }
        else // Shared string or boolean
        {
            switch (cell.DataType.Value)
            {
                case CellValues.SharedString:
                    SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(cell.CellValue.InnerText));
                    value = ssi.Text.Text;
                    break;
                case CellValues.Boolean:
                    value = cell.CellValue.InnerText == "0" ? "false" : "true";
                    break;
                default:
                    value = cell.CellValue.InnerText;
                    break;
            }
        }

        return value;
    }
Rorry answered 27/3, 2017 at 15:13 Comment(2)
Do you know if different encoding types can be handled as well in open xml then how we can implement it?Truthful
regarding the styleIndex = (int)cell.StyleIndex.Value; part, StyleIndex can also be null, which causes an error when trying to access the Value property. I just check first to see if StyleIndex is null, and if so then leave formatId as 0Tamper
P
2

The previous answer states that the formats are documented in Part 4 but apparently they have been moved to Part 1. I found them by searching for #,##0.00. They are currently in 18.8.30 numFmt (Number Format) for the number formats whose formatCode value is implied rather than explicitly saved in the file. See 18.8.31 numFmts (Number Formats) for number formats in this workbook, consisting of a sequence of numFmt records, where each numFmt record defines a particular number format, indicating how to format and render the numeric value of a cell.

The following are the implied formatCode attribute for all languages.

ID formatCode
0 General
1 0
2 0.00
3 #,##0
4 #,##0.00
9 0%
10 0.00%
11 0.00E+00
12 # ?/?
13 # ??/??
14 mm-dd-yy
15 d-mmm-yy
16 d-mmm
17 mmm-yy
18 h:mm AM/PM
19 h:mm:ss AM/PM
20 h:mm
21 h:mm:ss
22 m/d/yy h:mm
37 #,##0 ;(#,##0)
38 #,##0 ;[Red](#,##0)
39 #,##0.00;(#,##0.00)
40 #,##0.00;[Red](#,##0.00)
45 mm:ss
46 [h]:mm:ss
47 mmss.0
48 ##0.0E+0
49 @

There are many other implied format codes that are language-dependent, such as:

28 [$-404]e"年"m"月"d"日" m"月"d"日"
Philan answered 28/1, 2020 at 20:40 Comment(0)
S
0

Have you tried cell.HasValue insteed? Because int and Datetime is not always nullable, it depends of how the code is written.

Suitcase answered 3/2, 2017 at 12:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.