OpenXML SDK returning a number for CellValue instead of cells text
Asked Answered
F

5

59

I am using the Open XML SDK to open an Excel xlsx file and I try to read the cellvalue on position A1 in each sheet. I use the following code:

using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(openFileDialog1.FileName, false))
{
    var sheets = spreadsheetDocument.WorkbookPart.Workbook.Descendants<Sheet>();

    foreach (Sheet sheet in sheets)
    {
        WorksheetPart worksheetPart = (WorksheetPart)spreadsheetDocument.WorkbookPart.GetPartById(sheet.Id);
        Worksheet worksheet = worksheetPart.Worksheet;

        Cell cell = GetCell(worksheet, "A", 1);

        Console.Writeline(cell.CellValue.Text);
     }
}

private static Cell GetCell(Worksheet worksheet, string columnName, uint rowIndex)
{
     Row row = GetRow(worksheet, rowIndex);

     if (row == null)
         return null;

     return row.Elements<Cell>().Where(c => string.Compare
               (c.CellReference.Value, columnName +
               rowIndex, true) == 0).First();
}

// Given a worksheet and a row index, return the row.
private static Row GetRow(Worksheet worksheet, uint rowIndex)
{
    return worksheet.GetFirstChild<SheetData>().
          Elements<Row>().Where(r => r.RowIndex == rowIndex).First();
} 

The text in the first worksheet on position A1 is simply 'test' however, in my console I see the value '0' as cell.CellValue.Text

Does anyone have an idea to get the correct value of the cell?

Fanjet answered 25/2, 2011 at 8:49 Comment(0)
T
85

All strings in an Excel worksheet are stored in a array like structure called the SharedStringTable. The goal of this table is to centralize all strings in an index based array and then if that string is used multiple times in the document to just reference the index in this array. That being said, the 0 you received when you got the text value of the A1 cell is the index into the SharedStringTable. To get the real value you can use this helper function:

public static SharedStringItem GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
    return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id);
}

Then in your code call it like this to get the real value:

Cell cell = GetCell(worksheet, "A", 1);

string cellValue = string.Empty;

if (cell.DataType != null)
{
    if (cell.DataType == CellValues.SharedString)
    {
       int id = -1;

       if (Int32.TryParse(cell.InnerText, out id))
       {
           SharedStringItem item = GetSharedStringItemById(workbookPart, id);

           if (item.Text != null)
           {
               cellValue = item.Text.Text;
           }
           else if (item.InnerText != null)
           {
               cellValue = item.InnerText;
           }
           else if (item.InnerXml != null)
           {
               cellValue = item.InnerXml;
           }
       }
    }
}
Tippet answered 26/2, 2011 at 3:58 Comment(8)
This is correct but does not address all of the issues required. Before looking up a cell value in the SST you need to actually determine if he cell value represents an SST index or is actually a value.Incertitude
@Samuel Neff - By default Excel will put all basic strings into the SST and in this question he is only concerned with getting this basic string value. No need to over complicate a basic scenario. If he is dealing with formulas or other pieces of data then obviously the code above will need to change to incorporate your comment.Tippet
tend to agree with amurra on this one - the OP is just asking for the basic value. the fact through these comments that he now knows he may need to account for other stuff makes the answer sufficient for the question asked. other things like formulas could be asked in a different question.Systematics
I'm adding this comment because the actual solution for determining whether a cell value represents an SST index was never posted for some reason (extremely annoying): if (cell.DataType != null && cell.DataType == CellValues.SharedString)Vauntcourier
Samuel and genki are correct. Should there be a date or other value that is not shared string, this code will choke trying to get SI from id where id is already a value, not actual si identificationPalladous
The code was updated to check for a shared string data type before accessing the SharedStringTable so it shouldn't choke anymore.Tippet
Excellent example/solution @amurra, but the whole Open XML SDK kind of makes you not want to bother. I guess the 'not SharedString' part of it is to just return the Text or InnerText? I don't get the item.Text.Text bit.Era
I agree with you that there is a lot that you have to do in order to work with the Open XML SDK. Without the SDK you would have to manipulate XML directly, so at least you can work with strongly typed objects. I think the double Text.Text is due to there being a Text element that then has a Text property so you need to use Text.Text in order to access it. It's been awhile since I've used this code so that might be off a little.Tippet
L
16

Amurra's answer seems to go ninety percent of the way, but it may need some nuance.

1) The function "GetSharedStringItemById" returns a SharedStringItem, not a string, such that the calling code example will not work. To get the actual value as a string, I believe you need to ask for the SharedStringItem's InnerText property, as follows:

public static string GetSharedStringItemById(WorkbookPart workbookPart, int id)
{
    return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(id).InnerText;
}

2) The function also (correctly) asks for an int as part of its signature, but the example code call supplies a string, cell.CellValue.Text. It's trivial to convert the string to an int, but needs to be done, as the code as written might to be confusing.

Lucretialucretius answered 17/11, 2011 at 18:44 Comment(0)
C
12

Found this very usefull snippet quite a while ago, so can not meantion the author.

private static string GetCellValue(string fileName, string sheetName, string addressName)
    {
        string value = null;

        using(SpreadsheetDocument document =  SpreadsheetDocument.Open(fileName, false))
        {
            WorkbookPart wbPart = document.WorkbookPart;

            // Find the sheet with the supplied name, and then use that Sheet
            // object to retrieve a reference to the appropriate worksheet.
            Sheet theSheet = wbPart.Workbook.Descendants<Sheet>().
              Where(s => s.Name == sheetName).FirstOrDefault();

            if(theSheet == null)
            {
                throw new ArgumentException("sheetName");
            }

            // Retrieve a reference to the worksheet part, and then use its 
            // Worksheet property to get a reference to the cell whose 
            // address matches the address you supplied:
            WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(theSheet.Id));
            Cell theCell = wsPart.Worksheet.Descendants<Cell>().
              Where(c => c.CellReference == addressName).FirstOrDefault();

            // If the cell does not exist, return an empty string:
            if(theCell != null)
            {
                value = theCell.InnerText;

                // If the cell represents a numeric value, you are done. 
                // For dates, this code returns the serialized value that 
                // represents the date. The code handles strings and Booleans
                // individually. For shared strings, the code looks up the 
                // corresponding value in the shared string table. For Booleans, 
                // the code converts the value into the words TRUE or FALSE.
                if(theCell.DataType != null)
                {
                    switch(theCell.DataType.Value)
                    {
                        case CellValues.SharedString:
                            // For shared strings, look up the value in the shared 
                            // strings table.
                            var stringTable = wbPart.
                              GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                            // If the shared string table is missing, something is 
                            // wrong. Return the index that you found in the cell.
                            // Otherwise, look up the correct text in the table.
                            if(stringTable != null)
                            {
                                value = stringTable.SharedStringTable.
                                  ElementAt(int.Parse(value)).InnerText;
                            }
                            break;

                        case CellValues.Boolean:
                            switch(value)
                            {
                                case "0":
                                    value = "FALSE";
                                    break;
                                default:
                                    value = "TRUE";
                                    break;
                            }
                            break;
                    }
                }
            }
        }
        return value;
    }
Cretic answered 24/8, 2012 at 11:29 Comment(3)
It's from msdn: msdn.microsoft.com/en-us/library/office/…Educatee
This code is so slow it's impractical to load anything more than 5x5 table. It takes like ~200ms to add one row!Palladous
This might well be because it's opening the file every time. :)Whirlabout
C
4

I found this post of reading entire excel data as a data table very helpful. It also uses open-xml sdk.

using System;
using System.Data;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static DataTable ReadAsDataTable(string fileName)
{
    DataTable dataTable = new DataTable();
    using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
    {
        WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
        IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
        string relationshipId = sheets.First().Id.Value;
        WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
        Worksheet workSheet = worksheetPart.Worksheet;
        SheetData sheetData = workSheet.GetFirstChild<SheetData>();
        IEnumerable<Row> rows = sheetData.Descendants<Row>();

        foreach (Cell cell in rows.ElementAt(0))
        {
            dataTable.Columns.Add(GetCellValue(spreadSheetDocument, cell));
        }

        foreach (Row row in rows)
        {
            DataRow dataRow = dataTable.NewRow();
            for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
            {
                dataRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
            }

            dataTable.Rows.Add(dataRow);
        }

    }
    dataTable.Rows.RemoveAt(0);

    return dataTable;
}

private static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
    SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
    string value = cell.CellValue.InnerXml;

    if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
    {
        return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
    }
    else
    {
        return value;
    }
}

Note: There is an issue that while reading the excel it ignores the empty cells in each row. So this code is best fit when you are sure of that every cell in each row will have some data. If you want an appropriate handling for the same then you can do the following:

Change the code of for loop :

dataRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

to

Cell cell = row.Descendants<Cell>().ElementAt(i);
int actualCellIndex = CellReferenceToIndex(cell);
dataRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);

and add below method which is used in the above modified code snippet:

private static int CellReferenceToIndex(Cell cell)
{
    int index = 0;
    string reference = cell.CellReference.ToString().ToUpper();
    foreach (char ch in reference)
    {
        if (Char.IsLetter(ch))
        {
            int value = (int)ch - (int)'A';
            index = (index == 0) ? value : ((index + 1) * 26) + value;
        }
        else
            return index;
    }
    return index;
}

I got this fix from this answer.

Candancecandela answered 17/1, 2018 at 7:41 Comment(0)
W
2

Another option: Export your data to an html table and utilize stylesheets to specify the read only cells. See this page for more: http://www.c-sharpcorner.com/UploadFile/kaushikborah28/79Nick08302007171404PM/79Nick.aspx

Wheelhouse answered 12/5, 2011 at 15:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.