Reading a date from xlsx using open xml sdk
Asked Answered
A

8

44

I have a date in format "4/5/2011" (month/day/year) in a xlsx file in one of the cells. Im trying to parse the file and load those data in some classes.

So far the part where I parse the cell looks like this:

string cellValue = cell.InnerText;
if (cell.DataType != null)
{
    switch (cell.DataType.Value)
    {
        case CellValues.SharedString:
            // get string from shared string table
            cellValue = this.GetStringFromSharedStringTable(int.Parse(cellValue));
            break;
    }
}

I hoped that date would be a cell.DataType. The truth is when parsing the cell with the date "4/5/2011", the value of cell.DataType is null and the value of the cell is "40638" and it is not an index to the shared string table. (I have tried that before and it ended up with an exception.)

Agincourt answered 1/11, 2012 at 12:13 Comment(0)
J
49

Open XML stores dates as the number of days from 1 Jan 1900. Well, skipping the incorrect 29 Feb 1900 as a valid day. You should be able to find out algorithms to help you calculate the correct value. I believe some developers use DateTime.FromOADate() as a helper.

Also, the Cell class has the DataType property as Number by default. So if it's null, it's a number, which includes dates in our case.

You only go to the shared strings table when the date stored is before the epoch (1 Jan 1900 in this case). And then in that case, the CellValue of the Cell class holds the index to the shared string table.

Jaddo answered 1/11, 2012 at 13:23 Comment(4)
@Santhos, do you have to do any adjustment besides FromOADate? @VincentTan mentions it only as a "helper method".Leilani
I have one problem: how to distinguish between inline numbers and numbers which supposed to be dates? In both cases for me the cell.DataType is null, and I have a string representation of a number in the Text.Leilani
I now have the experience with both EPPlus library and the Open SDK and for all future viewer I would like to state that using EPPlus is a very good choice (better than using Open SDK) unless you want to do something that is not supported by EPPlusAgincourt
To distinguish between numbers and dates, see comment #13177332Grievous
O
18

It appears that the cell.DataType is not set for dates.

The way to do it is to see if the cell has a StyleIndex, which is an index into an array of cell formats in the document.

You then use the cellFormat.NumberFormatId to see if this is a date data type.

Here is some code:

    public class ExcelCellWithType
    {
        public string Value { get; set; }
        public UInt32Value ExcelCellFormat { get; set; }
        public bool IsDateTimeType { get; set; }
    }  

    public class ExcelDocumentData
    {
        public ExcelXmlStatus Status { get; set; }
        public IList<Sheet> Sheets { get; set; }
        public IList<ExcelSheetData> SheetData { get; set; }

        public ExcelDocumentData()
        {
            Status = new ExcelXmlStatus();
            Sheets = new List<Sheet>();
            SheetData = new List<ExcelSheetData>();
        }
    } 

    ...

    public ExcelDocumentData ReadSpreadSheetDocument(SpreadsheetDocument mySpreadsheet, ExcelDocumentData data)
    {
        var workbookPart = mySpreadsheet.WorkbookPart;

        data.Sheets = workbookPart.Workbook.Descendants<Sheet>().ToList();

        foreach (var sheet in data.Sheets)
        {
            var sheetData = new ExcelSheetData { SheetName = sheet.Name };
            var workSheet = ((WorksheetPart)workbookPart.GetPartById(sheet.Id)).Worksheet;

            sheetData.ColumnConfigurations = workSheet.Descendants<Columns>().FirstOrDefault();
            var rows = workSheet.Elements<SheetData>().First().Elements<Row>().ToList();
            if (rows.Count > 1)
            {
                foreach (var row in rows)
                {
                    var dataRow = new List<ExcelCellWithType>();

                    var cellEnumerator = GetExcelCellEnumerator(row);
                    while (cellEnumerator.MoveNext())
                    {
                        var cell = cellEnumerator.Current;
                        var cellWithType = ReadExcelCell(cell, workbookPart);
                        dataRow.Add(cellWithType);
                    }

                    sheetData.DataRows.Add(dataRow);
                }
            }
            data.SheetData.Add(sheetData);
        }

        return data;
    }

    ...

    private ExcelCellWithType ReadExcelCell(Cell cell, WorkbookPart workbookPart)
    {
        var cellValue = cell.CellValue;
        var text = (cellValue == null) ? cell.InnerText : cellValue.Text;
        if (cell.DataType?.Value == CellValues.SharedString)
        {
            text = workbookPart.SharedStringTablePart.SharedStringTable
                .Elements<SharedStringItem>().ElementAt(
                    Convert.ToInt32(cell.CellValue.Text)).InnerText;
        }

        var cellText = (text ?? string.Empty).Trim();

        var cellWithType = new ExcelCellWithType();

        if (cell.StyleIndex != null)
        {
            var cellFormat = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats.ChildElements[
                int.Parse(cell.StyleIndex.InnerText)] as CellFormat;

            if (cellFormat != null)
            {
                cellWithType.ExcelCellFormat = cellFormat.NumberFormatId;

                var dateFormat = GetDateTimeFormat(cellFormat.NumberFormatId);
                if (!string.IsNullOrEmpty(dateFormat))
                {
                    cellWithType.IsDateTimeType = true;

                    if (!string.IsNullOrEmpty(cellText))
                    {
                       if (double.TryParse(cellText, out var cellDouble))
                        {
                            var theDate = DateTime.FromOADate(cellDouble);
                            cellText = theDate.ToString(dateFormat);
                        }
                    }
                }
            }
        }

        cellWithType.Value = cellText;

        return cellWithType;
    }

    //// https://msdn.microsoft.com/en-GB/library/documentformat.openxml.spreadsheet.numberingformat(v=office.14).aspx
    private readonly Dictionary<uint, string> DateFormatDictionary = new Dictionary<uint, string>()
    {
        [14] = "dd/MM/yyyy",
        [15] = "d-MMM-yy",
        [16] = "d-MMM",
        [17] = "MMM-yy",
        [18] = "h:mm AM/PM",
        [19] = "h:mm:ss AM/PM",
        [20] = "h:mm",
        [21] = "h:mm:ss",
        [22] = "M/d/yy h:mm",
        [30] = "M/d/yy",
        [34] = "yyyy-MM-dd",
        [45] = "mm:ss",
        [46] = "[h]:mm:ss",
        [47] = "mmss.0",
        [51] = "MM-dd",
        [52] = "yyyy-MM-dd",
        [53] = "yyyy-MM-dd",
        [55] = "yyyy-MM-dd",
        [56] = "yyyy-MM-dd",
        [58] = "MM-dd",
        [165] = "M/d/yy",
        [166] = "dd MMMM yyyy",
        [167] = "dd/MM/yyyy",
        [168] = "dd/MM/yy",
        [169] = "d.M.yy",
        [170] = "yyyy-MM-dd",
        [171] = "dd MMMM yyyy",
        [172] = "d MMMM yyyy",
        [173] = "M/d",
        [174] = "M/d/yy",
        [175] = "MM/dd/yy",
        [176] = "d-MMM",
        [177] = "d-MMM-yy",
        [178] = "dd-MMM-yy",
        [179] = "MMM-yy",
        [180] = "MMMM-yy",
        [181] = "MMMM d, yyyy",
        [182] = "M/d/yy hh:mm t",
        [183] = "M/d/y HH:mm",
        [184] = "MMM",
        [185] = "MMM-dd",
        [186] = "M/d/yyyy",
        [187] = "d-MMM-yyyy"
    };

    private string GetDateTimeFormat(UInt32Value numberFormatId)
    {
        return DateFormatDictionary.ContainsKey(numberFormatId) ? DateFormatDictionary[numberFormatId] : string.Empty;
    }
Ontology answered 29/3, 2019 at 21:33 Comment(2)
Thinking about it, it is probably sufficient just to identify the date format numbers for your culture (the above example has US and UK date numbers that I could find by example from Excel 365). It might be possible to get the formats from the document which I guess is not this question.Ontology
IMO this answer is more correct because if you are reading a random XLSX file, you have to go through this process to even figure out if the cell contains a date, not just a number.Ethicize
T
14

you can use DateTime.FromOADate(41690)

Telepathist answered 1/3, 2014 at 1:54 Comment(0)
D
4

Adding my 2 pence worth. I am processing a template, so I know a given cell is meant to be a DateTime. So I end up in this method with a string parameter excelDateTime containing the cell value, which typically will be a OADate number like "42540.041666666664".

public static bool TryParseExcelDateTime(string excelDateTimeAsString, out DateTime dateTime)
{
    double oaDateAsDouble;
    if (!double.TryParse(excelDateTimeAsString, out oaDateAsDouble)) //this line is Culture dependent!
        return false;
    //[...]
    dateTime = DateTime.FromOADate(oaDateAsDouble);

My problem is that the end user is in Germany, and because this is a website, we've set the Thread.CurrentThread.CurrentCulture and Thread.CurrentThread.CurrentUICulture to "DE-de". And when you call double.TryParse, it uses the culture to parse the number. So this line: double.TryParse("42540.041666666664", out oaDate) does indeed work, but it returns 42540041666666664 as in Germany the dot is a group separator. DateTime.FromOADate then fails because the number is out of range (minOaDate = -657435.0, maxOaDate = +2958465.99999999).

This make me think that:

  1. regardless of the locale on a user's machine, the OpenXML document contains numbers formatted in a default locale (US? invariant? in any case, with the dot as a decimal separator). I've searched, but not found the spec for this.
  2. when doing double.TryParse on a potential OADate string, we should do it with double.TryParse(excelDateTimeAsString, NumberStyles.Any, CultureInfo.InvariantCulture, out oaDateAsDouble)). I'm using CultureInfo.InvariantCulture, but it should be whatever point 1 is, which I don't know for sure.
Doom answered 23/11, 2016 at 14:31 Comment(1)
I had the exact same problem. I solved it with this: Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;Marlenemarler
I
3

We need to adopt different strategies to parse different types of columns using OpenXML.

To parse string & boolean values - we can use DataType property of cell, like below -

        switch (cell.DataType.Value)
        {
            case CellValues.SharedString:
                // Fetch value from SharedStrings array
                break;
            case CellValues.Boolean:
                text = cell.InnerText;
                switch (text)
                {
                    case "0": text = "false"; break;
                    default: text = "true"; break;
                }
                break;
        }

To parse date/time/datetime values (having either any built-in or any custom format applied) - DataType property is returned as null so this can be put like below -

    if (cell.DataType == null)
        DateTime.FromOADate(double.Parse(cell.InnerText))

The above value returned will be in default format based on the locale settings on your machine. However, if you need to get the value in actual format as present in your excel & you are not sure of the format then you can access StyleIndex property associated with such cells.

This StyleIndex property will give you an index of the style applied on the cell, which can be found in styles.xml file (below tag) -

    <cellXfs count="3">
        <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>          
        <xf numFmtId="168" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
        <xf numFmtId="169" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
    </cellXfs>

In the above case, the StyleIndex value can be either 0, 1 or 2 - as there are 3 styles applied. Styles with numFmtId in (0, 163) corresponds to built-in formats provided by Excel & numFmtId >= 164 corresponds to custom formats.

From the StyleIndex value obtained above, you will get the numFmtId - which is mapped to a particular <numFmt> tag present under <numFmts> section (in styles.xml file) to get the actual date format applied on the cell.

    <numFmts count="2">
       <numFmt numFmtId="168" formatCode="[$£-809]#,##0.00"/>
       <numFmt numFmtId="169" formatCode="dd\-mmm\-yyyy\ hh:mm:ss"/>
    </numFmts>

The date format applied on the cell can be fetched using OpenXML API as well -

      CellFormat cellFmt = cellFormats.ChildElements[int.Parse(cell.StyleIndex.InnerText)] as CellFormat;
      string format = numberingFormats.Elements<NumberingFormat>()
                .Where(i => i.NumberFormatId.Value == cellFmt .NumberFormatId.Value)
                .First().FormatCode;
Intemperate answered 13/1, 2022 at 6:46 Comment(0)
S
2

I had same issue - switched to EPPlus http://epplus.codeplex.com/

Note that it has LGPL license. So if you need your code base to be safe from GPL issue, simply use the library as is and your original code base license is safe.

Subnormal answered 1/11, 2012 at 12:18 Comment(7)
I would rather stick to open xml sdk, but if I dont find a solution I will try this one. ThanksAgincourt
you have a solution in openxml sdk but it is way too convoluted to get it going in few statements. Here's a pointer - The styles documentpart tells you the numberformat. iterate within them, choose the matching elementSubnormal
I dont understand. I would expect that number format is something like "number: #.###" or "date: m/d/y". That number I got from the cell must mean something. Do you know what it means?Agincourt
No - You actually have to get the stylesdocument part. Then locate the cell's formatting entry. It will have numberformat="1" or such numbers. 14 to 18 refer to date format. You'll then have to lookup the date format and then load the data #8930299Subnormal
I have recently tried this library and it is really very easy to use and for some solution probably far better than the original open xml sdkAgincourt
This does not answer the question.Dehnel
I'm actually here learning more about using Open XML SDK because I'm switching away from EPPlus because they quit fixing bugs in the open source version and are in the process of switching away from being open source.Ethicize
U
0

I do this after I retrieve any inline string:

    private static object Convert(this DocumentFormat.OpenXml.Spreadsheet.CellValues value, string content)
    {
        switch (value)
        {
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Boolean:
                if (content.Length < 2)
                {
                    return content?.ToUpperInvariant() == "T" || content == "1";
                }
                return System.Convert.ToBoolean(content);
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Date:
                if (double.TryParse(content, out double result))
                {
                    return System.DateTime.FromOADate(result);
                }
                return null;
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Number:
                return System.Convert.ToDecimal(content);
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.Error:
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.String:
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.InlineString:
            case DocumentFormat.OpenXml.Spreadsheet.CellValues.SharedString:
            default:
                return content;
        }
    }
Uxmal answered 23/7, 2020 at 17:38 Comment(0)
B
-1

Each cell has 2 properties r (CellReference) and s(StyleIndex)

StyleIndex for numbers is 2 and for date is 3

Date it is in ODate and you can convert to string format

value = DateTime.FromOADate(double.Parse(value)).ToShortDateString();

Broadway answered 28/12, 2016 at 17:18 Comment(1)
I have a Date field with StyleIndex = 11Eulogistic

© 2022 - 2024 — McMap. All rights reserved.