From Excel to DataTable in C# with Open XML

9

48

I'm using Visual Studio 2008 and I need to create a DataTable from an Excel sheet using the Open XML SDK 2.0. The DataTable columns should be named after the first row of the sheet, and the remaining rows should fill the table.

Does anyone have example code or a link that can help me do this?

Expertize answered 23/7, 2010 at 18:9 Comment(3)
Any particular reason you need to use OpenXML SDK 2.0? - Forgetful
DOK: MS Office cannot be installed on the server that runs the program (security policy). - Expertize
The old ways I had been trying also wouldn't work with Office 365, although they worked on my computer with an older version of Office. This way works with 365. - Pastille
75

I think this should do what you're asking. The GetCellValue function is there to resolve shared strings, which I assume you have in your column headers. I'm not sure this is perfect, but I hope it helps.

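using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static void Main(string[] args)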
{
    DataTable dt = new DataTable();

    using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"..\..\example.xlsx", false))
    {

        WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
        IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
        string relationshipId = sheets.First().Id.Value;
        WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
        Worksheet workSheet = worksheetPart.Worksheet;
        SheetData sheetData = workSheet.GetFirstChild<SheetData>();
        IEnumerable<Row> rows = sheetData.Descendants<Row>();

        foreach (Cell cell in rows.ElementAt(0))
        {
            dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
        }

        foreach (Row row in rows) //this will also include your header row...
        {
            DataRow tempRow = dt.NewRow();

            for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
            {
                // Use index i, not i - 1: ElementAt(-1) throws on the first cell.
                tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
            }

            dt.Rows.Add(tempRow);
        }

    }
    dt.Rows.RemoveAt(0); //...so i'm taking it out here.

}


public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
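    // A cell without a stored value has no CellValue element, so guard against null.
    if (cell.CellValue == null)
    {
        return "";
    }

    SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
    string value = cell.CellValue.InnerXml;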

    if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
    {
        return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
    }
    else
    {
        return value;
    }
}
Cytaster answered 28/7, 2010 at 19:17 Comment(6)
Why do you use rows.ElementAt(0)? I get only 10 columns, but my Excel file has at least 30! - Mallette
The source is from "Read Excel as DataTable using OpenXML and C#". - Lumberyard
It seems this skips the empty cells, so you may end up having A1, D1 and G1 next to each other in tempRow. Also, in some cases CellValue is null and I get an exception in GetCellValue's second line. - Ferne
@RahulNikate, your link takes you to this very page! - Ferne
The following line will throw an exception if the cell value is null: string value = cell.CellValue.InnerXml; Add a line before it that checks for null: if (cell.CellValue == null) { return ""; } - Naresh
Hi @Cytaster, this line will throw an error: tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i-1)); When i = 0, the index becomes -1. - Pianette
19

The code above works once one line is corrected. In the inner loop, read the cell with ElementAt(i):

tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

not with ElementAt(i-1):

tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i-1));

With (i-1), the first iteration (i = 0) requests index -1 and throws an exception:

Specified argument was out of the range of valid values. Parameter name: index
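
The failure can be reproduced without the Open XML SDK. The following is only a minimal sketch: the names `ElementAtDemo` and `ThrowsOutOfRange` are made up for illustration, and a plain `List<string>` stands in for the cells of a row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ElementAtDemo
{
    // Returns true when reading cells.ElementAt(index) throws
    // ArgumentOutOfRangeException, as it does for index -1.
    public static bool ThrowsOutOfRange(List<string> cells, int index)
    {
        try
        {
            cells.ElementAt(index);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

For a two-cell row, `ElementAtDemo.ThrowsOutOfRange(cells, -1)` returns true while indexes 0 and 1 are read normally, which is why the loop has to pass `i` itself.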
Berserk answered 1/6, 2015 at 10:5 Comment(0)
6

The solution above works only for spreadsheets without empty cells, because empty cells are not stored in the row and the remaining values shift left.

To handle empty cells, replace the line in the inner loop that assigns tempRow[i] from GetCellValue(...) with something like this:

Cell cell = row.Descendants<Cell>().ElementAt(i);
int index = CellReferenceToIndex(cell);
tempRow[index] = GetCellValue(spreadSheetDocument, cell);

And add this method:

private static int CellReferenceToIndex(Cell cell)
{
    int index = -1;
    string reference = cell.CellReference.ToString().ToUpper();
    foreach (char ch in reference)
    {
        if (Char.IsLetter(ch))
        {
            int value = (int)ch - (int)'A';
            index = (index + 1) * 26 + value;
        }
        else
            return index;
    }
    return index;
}
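
As a quick check of the arithmetic in CellReferenceToIndex, the sketch below applies the same conversion to plain strings, so it can be run without a workbook. `ColumnReference.ToZeroBasedIndex` is a hypothetical name, and the sketch assumes references made of ASCII letters followed by a row number.

```csharp
using System;

public static class ColumnReference
{
    // Converts the column letters of a cell reference to a zero-based index,
    // using the same (index + 1) * 26 + value step as CellReferenceToIndex.
    public static int ToZeroBasedIndex(string cellReference)
    {
        int index = -1;
        foreach (char ch in cellReference.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z')
                break; // the row number starts here
            index = (index + 1) * 26 + (ch - 'A');
        }
        return index;
    }
}
```

With this, "A1" maps to 0, "Z9" to 25, "AA1" to 26, "AB3" to 27 and "XFD1" to 16383.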
Svelte answered 1/12, 2017 at 19:53 Comment(1)
(index == 0) ? value : ((index + 1) * 26) + value => will return incorrect results for multi-character column names starting with A, such as AA, AB or AAC, because the leading A is transformed into 0 instead of acting as a 26 multiplier. - Shoop
4

This is my complete solution, which also takes empty cells into account.

public static class ExcelHelper
        {
            // Gets the value of the cell in the given column, or "" when the cell is missing.
            // Cells cannot be looked up by position, because empty cells are not stored in the row.
            private static string GetCellValue(WorkbookPart wbPart, List<Cell> theCells, string cellColumnReference)
            {
                Cell theCell = null;
                string value = "";
                foreach (Cell cell in theCells)
                {
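                    // Compare the column letters exactly; a prefix test would also match "AA1" for column "A".
                    if (new string(cell.CellReference.Value.TakeWhile(c => char.IsLetter(c)).ToArray()) == cellColumnReference)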
                    {
                        theCell = cell;
                        break;
                    }
                }
                if (theCell != null)
                {
                    value = theCell.InnerText;
                    // If the cell represents an integer number, you are done. 
                    // For dates, this code returns the serialized value that represents the date. The code handles strings and 
                    // Booleans individually. For shared strings, the code looks up the corresponding value in the shared string table. For Booleans, the code converts the value into the words TRUE or FALSE.
                    if (theCell.DataType != null)
                    {
                        switch (theCell.DataType.Value)
                        {
                            case CellValues.SharedString:
                                // For shared strings, look up the value in the shared strings table.
                                var stringTable = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
                                // If the shared string table is missing, something is wrong. Return the index that is in the cell. Otherwise, look up the correct text in the table.
                                if (stringTable != null)
                                {
                                    value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
                                }
                                break;
                            case CellValues.Boolean:
                                switch (value)
                                {
                                    case "0":
                                        value = "FALSE";
                                        break;
                                    default:
                                        value = "TRUE";
                                        break;
                                }
                                break;
                        }
                    }
                }
                return value;
            }

            private static string GetCellValue(WorkbookPart wbPart, List<Cell> theCells, int index)
            {
                return GetCellValue(wbPart, theCells, GetExcelColumnName(index));
            }

            private static string GetExcelColumnName(int columnNumber)
            {
                int dividend = columnNumber;
                string columnName = String.Empty;
                int modulo;
                while (dividend > 0)
                {
                    modulo = (dividend - 1) % 26;
                    columnName = Convert.ToChar(65 + modulo).ToString() + columnName;
                    dividend = (int)((dividend - modulo) / 26);
                }
                return columnName;
            }

            //Only xlsx files
            public static DataTable GetDataTableFromExcelFile(string filePath, string sheetName = "")
            {
                DataTable dt = new DataTable();
                try
                {
                    using (SpreadsheetDocument document = SpreadsheetDocument.Open(filePath, false))
                    {
                        WorkbookPart wbPart = document.WorkbookPart;
                        IEnumerable<Sheet> sheets = document.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                        string sheetId = sheetName != "" ? sheets.Where(q => q.Name == sheetName).First().Id.Value : sheets.First().Id.Value;
                        WorksheetPart wsPart = (WorksheetPart)wbPart.GetPartById(sheetId);
                        SheetData sheetdata = wsPart.Worksheet.Elements<SheetData>().FirstOrDefault();
                        int totalHeaderCount = sheetdata.Descendants<Row>().ElementAt(0).Descendants<Cell>().Count();
                        //Get the header                    
                        for (int i = 1; i <= totalHeaderCount; i++)
                        {
                            dt.Columns.Add(GetCellValue(wbPart, sheetdata.Descendants<Row>().ElementAt(0).Elements<Cell>().ToList(), i));
                        }
                        foreach (Row r in sheetdata.Descendants<Row>())
                        {
                            if (r.RowIndex > 1)
                            {
                                DataRow tempRow = dt.NewRow();

                                //Always get from the header count, because the index of the row changes where empty cell is not counted
                                for (int i = 1; i <= totalHeaderCount; i++)
                                {
                                    tempRow[i - 1] = GetCellValue(wbPart, r.Elements<Cell>().ToList(), i);
                                }
                                dt.Rows.Add(tempRow);
                            }
                        }                    
                    }
                }
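                catch (Exception ex)
                {
                    // Do not swallow the failure: an empty table would hide a missing or unreadable file.
                    throw new InvalidOperationException("Could not read the workbook '" + filePath + "'.", ex);
                }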
                return dt;
            }
        }
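
One point to watch when looking cells up by column name: a prefix comparison such as StartsWith("A") is also true for a reference like "AA1". The sketch below shows an exact comparison on the letter part only. It assumes references consist of letters followed by digits, and `CellColumn.ColumnOf` and `CellColumn.IsInColumn` are hypothetical helpers, not part of the Open XML SDK.

```csharp
using System;
using System.Linq;

public static class CellColumn
{
    // Returns the column letters of a cell reference, e.g. "AA" for "AA12".
    public static string ColumnOf(string cellReference)
    {
        return new string(cellReference.TakeWhile(c => char.IsLetter(c)).ToArray());
    }

    // True only when the reference belongs to exactly the given column.
    public static bool IsInColumn(string cellReference, string columnName)
    {
        return string.Equals(ColumnOf(cellReference), columnName, StringComparison.OrdinalIgnoreCase);
    }
}
```

Here `CellColumn.IsInColumn("AA1", "A")` is false, whereas `"AA1".StartsWith("A")` is true.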
Pedi answered 9/1, 2019 at 8:45 Comment(1)
This is very similar to the code in this MSDN article, which also has a lot of useful tips, e.g. on the shared strings table. - Rotunda
4

First, add ExcelUtility.cs to your project:

ExcelUtility.cs

using System.Data;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

namespace Core_Excel.Utilities
{
    static class ExcelUtility
    {
        public static DataTable Read(string path)
        {
            var dt = new DataTable();

            using (var ssDoc = SpreadsheetDocument.Open(path, false))
            {
                var sheets = ssDoc.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                var relationshipId = sheets.First().Id.Value;
                var worksheetPart = (WorksheetPart) ssDoc.WorkbookPart.GetPartById(relationshipId);
                var workSheet = worksheetPart.Worksheet;
                var sheetData = workSheet.GetFirstChild<SheetData>();
                var rows = sheetData.Descendants<Row>().ToList();

                foreach (var row in rows) //this will also include your header row...
                {
                    var tempRow = dt.NewRow();

                    var colCount = row.Descendants<Cell>().Count();
                    foreach (var cell in row.Descendants<Cell>())
                    {
                        var index = GetIndex(cell.CellReference);

                        // Add Columns
                        for (var i = dt.Columns.Count; i <= index; i++)
                            dt.Columns.Add();

                        tempRow[index] = GetCellValue(ssDoc, cell);
                    }

                    dt.Rows.Add(tempRow);
                }
            }

            return dt;
        }

        private static string GetCellValue(SpreadsheetDocument document, Cell cell)
        {
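            // A cell without a stored value has no CellValue element.
            if (cell.CellValue == null)
                return string.Empty;

            var stringTablePart = document.WorkbookPart.SharedStringTablePart;
            var value = cell.CellValue.InnerXml;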

            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                return stringTablePart.SharedStringTable.ChildElements[int.Parse(value)].InnerText;

            return value;
        }

        public static int GetIndex(string name)
        {
            if (string.IsNullOrWhiteSpace(name))
                return -1;

            int index = 0;
            foreach (var ch in name)
            {
                if (char.IsLetter(ch))
                {
                    int value = ch - 'A' + 1;
                    index = value + index * 26;
                }
                else
                    break;
            }

            return index - 1;
        }
    }
}

Usage :

var path = "D:\\Documents\\test.xlsx";
var dt = ExcelUtility.Read(path);

then enjoy it!
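
The key idea in Read is that each cell is placed by its column reference and the DataTable grows columns on demand, so gaps in a row stay empty instead of shifting values left. Below is a small sketch of that idea on plain (reference, text) pairs, with no workbook involved. `SparseRowDemo` and `AddRow` are illustrative names, the pairs stand in for real Cell objects, and the row is created after the columns exist, which is a simplification of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class SparseRowDemo
{
    // Zero-based column index of a reference such as "D1" (same arithmetic as GetIndex).
    public static int GetIndex(string reference)
    {
        int index = 0;
        foreach (char ch in reference)
        {
            if (!char.IsLetter(ch))
                break;
            index = index * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
        }
        return index - 1;
    }

    // Adds one row to dt, placing each value in the column named by its reference.
    public static void AddRow(DataTable dt, IEnumerable<KeyValuePair<string, string>> cells)
    {
        var placed = new List<KeyValuePair<int, string>>();
        foreach (var cell in cells)
        {
            int index = GetIndex(cell.Key);
            // Grow the table until the target column exists.
            while (dt.Columns.Count <= index)
                dt.Columns.Add();
            placed.Add(new KeyValuePair<int, string>(index, cell.Value));
        }

        DataRow row = dt.NewRow(); // created after the columns, so it has all of them
        foreach (var item in placed)
            row[item.Key] = item.Value;
        dt.Rows.Add(row);
    }
}
```

Adding a row with cells A1 and D1 to an empty table yields four columns, with columns B and C of that row left as DBNull.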

Fluviomarine answered 29/2, 2020 at 20:25 Comment(0)
2
 Public Shared Function ExcelToDataTable(filename As String) As DataTable
        Try

            Dim dt As New DataTable()

            Using doc As SpreadsheetDocument = SpreadsheetDocument.Open(filename, False)

                Dim workbookPart As WorkbookPart = doc.WorkbookPart
                Dim sheets As IEnumerable(Of Sheet) = doc.WorkbookPart.Workbook.GetFirstChild(Of Sheets)().Elements(Of Sheet)()
                Dim relationshipId As String = sheets.First().Id.Value
                Dim worksheetPart As WorksheetPart = DirectCast(doc.WorkbookPart.GetPartById(relationshipId), WorksheetPart)
                Dim workSheet As Worksheet = worksheetPart.Worksheet
                Dim sheetData As SheetData = workSheet.GetFirstChild(Of SheetData)()
                Dim rows As IEnumerable(Of Row) = sheetData.Descendants(Of Row)()

                For Each cell As Cell In rows.ElementAt(0)
                    dt.Columns.Add(GetCellValue(doc, cell))
                Next

                For Each row As Row In rows
                    'this will also include your header row...
                    Dim tempRow As DataRow = dt.NewRow()

                    For i As Integer = 0 To row.Descendants(Of Cell)().Count() - 1
                        tempRow(i) = GetCellValue(doc, row.Descendants(Of Cell)().ElementAt(i))
                    Next

                    dt.Rows.Add(tempRow)
                Next
            End Using

            dt.Rows.RemoveAt(0)

            Return dt

        Catch ex As Exception
            Throw ' rethrow without resetting the stack trace
        End Try
    End Function


    Public Shared Function GetCellValue(document As SpreadsheetDocument, cell As Cell) As String
        Try

            If IsNothing(cell.CellValue) Then
                Return ""
            End If

            Dim value As String = cell.CellValue.InnerXml

            If cell.DataType IsNot Nothing AndAlso cell.DataType.Value = CellValues.SharedString Then
                Dim stringTablePart As SharedStringTablePart = document.WorkbookPart.SharedStringTablePart
                Return stringTablePart.SharedStringTable.ChildElements(Int32.Parse(value)).InnerText
            Else
                Return value
            End If

        Catch ex As Exception
            Return ""
        End Try
    End Function
Aldous answered 23/1, 2013 at 13:14 Comment(0)
0

I know this thread started a long time ago. However, none of the solutions above really worked for me, because of the empty-cell issue and others.

I found a very good MIT-licensed solution on GitHub: https://github.com/ExcelDataReader/ExcelDataReader It worked for me in both C# and VB.NET applications. A sample call from VB.NET (the sample code for C# is on GitHub):

        Using stream As FileStream = New FileStream(DataPath & "\" & fName.Name, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)

            Using reader As IExcelDataReader = ExcelReaderFactory.CreateReader(stream)
                ds = reader.AsDataSet(New ExcelDataSetConfiguration() With {
                    .UseColumnDataType = False,
                    .ConfigureDataTable = Function(tableReader) New ExcelDataTableConfiguration() With {
                        .UseHeaderRow = True
                    }
                })
            End Using

        End Using

The result was a dataset with one table for each sheet in the workbook.

Also, I prefer to compile the C# DLL myself rather than use a prebuilt DLL, so that I can control what I deliver to customers.

Filamentous answered 24/12, 2020 at 16:22 Comment(0)
0

To fit my requirements, I modified parts of the 'ExcelUtility' Read() method from the best answer by D.L.MAN.

I also added the saveDataTablesToExcel() and ExportDataSet() methods to save multiple DataTables to an xlsx file.

The following is the full code of the new 'ExcelUtility' class and its usage.

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;

    namespace myNamespace
    {
        static class ExcelUtility
        {
            // SS Note: when isHeaderOnTopRow is true, the first row of each sheet supplies the column names.
            public static DataTable[] Read(string path, bool isHeaderOnTopRow = false)
            {
                try
                {
                    using (var ssDoc = SpreadsheetDocument.Open(path, false))
                    {
                        var sheets = ssDoc.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                        DataTable[] dtArray = new DataTable[sheets.ToList().Count];
                        int counti = 0;
                        foreach (Sheet sheet in sheets)
                        {
                            var dt = new DataTable();

                            var relationshipId = sheet.Id.Value;
                            var worksheetPart = (WorksheetPart)ssDoc.WorkbookPart.GetPartById(relationshipId);
                            var workSheet = worksheetPart.Worksheet;
                            var sheetData = workSheet.GetFirstChild<SheetData>();
                            var rows = sheetData.Descendants<Row>().ToList();

                            int rowIndex = 0;
                            foreach (var row in rows) //this will also include your header row...
                            {
                                var tempRow = dt.NewRow();

                                var colCount = row.Descendants<Cell>().Count();
                                int colIndex = 0;
                                foreach (var cell in row.Descendants<Cell>())
                                {
                                    var index = GetIndex(cell.CellReference);
                                    // SS Note: ADDED next line as we were getting cell.CellReference (or index) as -1 in our provided xlsx file.
                                    index = (index < 0 ? colIndex++ : index);

                                    // Add Columns
                                    for (var i = dt.Columns.Count; i <= index; i++)
                                        dt.Columns.Add();

                                    if (isHeaderOnTopRow && rowIndex == 0)
                                    {
                                        string heading = GetCellValue(ssDoc, cell);
                                        heading = (heading.Length > 0 ? heading : $"Column{index + 1}");
                                        dt.Columns[index].ColumnName = heading;
                                    }
                                    else
                                    {
                                        tempRow[index] = GetCellValue(ssDoc, cell);
                                    }
                                }
                                if (rowIndex > 0 || isHeaderOnTopRow == false)
                                {
                                    dt.Rows.Add(tempRow);
                                }
                                rowIndex++;
                            }
                            dtArray[counti++] = dt;
                        }
                        return dtArray;
                    }
                }
                catch (Exception e)
                {
                    Console.WriteLine(e);
                }
                return null;
            }

            private static string GetCellValue(SpreadsheetDocument document, Cell cell)
            {
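                // A cell without a stored value has no CellValue element.
                if (cell.CellValue == null)
                    return string.Empty;

                var stringTablePart = document.WorkbookPart.SharedStringTablePart;
                var value = cell.CellValue.InnerXml;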

                if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                    return stringTablePart.SharedStringTable.ChildElements[int.Parse(value)].InnerText;

                return value;
            }

            public static int GetIndex(string name)
            {
                if (string.IsNullOrWhiteSpace(name))
                    return -1;

                int index = 0;
                foreach (var ch in name)
                {
                    if (char.IsLetter(ch))
                    {
                        int value = ch - 'A' + 1;
                        index = value + index * 26;
                    }
                    else
                        break;
                }

                return index - 1;
            }

            public static void ExportDataSet(DataSet ds, string destination)
            {
                try
                {
                    using (var workbook = SpreadsheetDocument.Create(destination, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook))
                    {
                        var workbookPart = workbook.AddWorkbookPart();

                        workbook.WorkbookPart.Workbook = new DocumentFormat.OpenXml.Spreadsheet.Workbook();

                        workbook.WorkbookPart.Workbook.Sheets = new DocumentFormat.OpenXml.Spreadsheet.Sheets();

                        foreach (System.Data.DataTable table in ds.Tables)
                        {

                            var sheetPart = workbook.WorkbookPart.AddNewPart<WorksheetPart>();
                            var sheetData = new DocumentFormat.OpenXml.Spreadsheet.SheetData();
                            sheetPart.Worksheet = new DocumentFormat.OpenXml.Spreadsheet.Worksheet(sheetData);

                            DocumentFormat.OpenXml.Spreadsheet.Sheets sheets = workbook.WorkbookPart.Workbook.GetFirstChild<DocumentFormat.OpenXml.Spreadsheet.Sheets>();
                            string relationshipId = workbook.WorkbookPart.GetIdOfPart(sheetPart);

                            uint sheetId = 1;
                            if (sheets.Elements<DocumentFormat.OpenXml.Spreadsheet.Sheet>().Count() > 0)
                            {
                                sheetId =
                                    sheets.Elements<DocumentFormat.OpenXml.Spreadsheet.Sheet>().Select(s => s.SheetId.Value).Max() + 1;
                            }

                            DocumentFormat.OpenXml.Spreadsheet.Sheet sheet = new DocumentFormat.OpenXml.Spreadsheet.Sheet() { Id = relationshipId, SheetId = sheetId, Name = table.TableName };
                            sheets.Append(sheet);

                            DocumentFormat.OpenXml.Spreadsheet.Row headerRow = new DocumentFormat.OpenXml.Spreadsheet.Row();

                            List<String> columns = new List<string>();
                            foreach (System.Data.DataColumn column in table.Columns)
                            {
                                columns.Add(column.ColumnName);

                                DocumentFormat.OpenXml.Spreadsheet.Cell cell = new DocumentFormat.OpenXml.Spreadsheet.Cell();
                                cell.DataType = DocumentFormat.OpenXml.Spreadsheet.CellValues.String;
                                cell.CellValue = new DocumentFormat.OpenXml.Spreadsheet.CellValue(column.ColumnName);
                                headerRow.AppendChild(cell);
                            }


                            sheetData.AppendChild(headerRow);

                            foreach (System.Data.DataRow dsrow in table.Rows)
                            {
                                DocumentFormat.OpenXml.Spreadsheet.Row newRow = new DocumentFormat.OpenXml.Spreadsheet.Row();
                                foreach (String col in columns)
                                {
                                    DocumentFormat.OpenXml.Spreadsheet.Cell cell = new DocumentFormat.OpenXml.Spreadsheet.Cell();
                                    cell.DataType = DocumentFormat.OpenXml.Spreadsheet.CellValues.String;
                                    cell.CellValue = new DocumentFormat.OpenXml.Spreadsheet.CellValue(dsrow[col].ToString()); //
                                    newRow.AppendChild(cell);
                                }

                                sheetData.AppendChild(newRow);
                            }

                        }
                    }
                } 
                catch (Exception e)
                {
                    Console.WriteLine(e);
                }
            }
            
            public static void saveDataTablesToExcel(DataTable[] dataTables, string saveToFilePath)
            {
                // Create a DataSet
                DataSet dataSet = new DataSet("Tables");
                // We can add multiple DataTable to DataSet
                foreach (DataTable dt in dataTables)
                {
                    dataSet.Tables.Add(dt);
                }

                ExportDataSet(dataSet, saveToFilePath);
            }

        }
    }
    

Usage :

    // save three datatables in xlsx file
    DataTable[] dataTables = new DataTable[3];
    dataTables[0] = firstDataTable;
    dataTables[1] = secondDataTable;
    dataTables[2] = thirdDataTable;
    string fileName = "saved.xlsx";
    ExcelUtility.saveDataTablesToExcel(dataTables, $"{ExcelFileSaveFolder}{fileName}");

    // retrieve data from first sheet and set it to 'returnTable'
    DataTable returnTable = null;
    var path = $"{ExcelFileSaveFolder}{fileName}";
    DataTable[] getDataTables = ExcelUtility.Read(path, true);
    if (getDataTables != null && getDataTables.Length > 0)
        returnTable = getDataTables[0];
Bhagavadgita answered 27/5, 2023 at 12:55 Comment(0)
-1

If a cell value is null or empty, the values are read incorrectly.

The code works correctly when every column is filled with data, but that may not be the case for all rows.

Cripps answered 1/3, 2017 at 13:47 Comment(0)
