How do I iterate through rows in an excel table using epplus?
Asked Answered
D

6

66

I am new to epplus, and i'm trying to read some values from an excel table.

This is what I have so far:

var fileInfo = new FileInfo(filename);
using(var excelPackage = new OfficeOpenXml.ExcelPackage(fileInfo))
{
    foreach (var sheet in excelPackage.Workbook.Worksheets)
    {
        foreach (ExcelTable table in sheet.Tables)
        {
             foreach(var row in table.Rows)  // <-- !!
             { ... }
        }
    }
}

However, now I am stumped, as the ExcelTable only has a Columns property, but not a Rows property as I had expected. I cannot find a Rows property on any object in the library.

How do I iterate through a table, reading Row for Row?

Deadandalive answered 12/2, 2014 at 23:18 Comment(2)
I ended up using the Microsoft.Office.Interop.Excel classes. They support reading the Table objects. You just need to know that tables in excel sheets are called ListObjects in the API.Taejon
I'm moving away from Interop. Had several issues where it couldn't handle files if they were corrupted. I've put up an answer for this with a link that I found.Zealot
Z
126

While searching for help on the same problem, I stumbled across this link. It certainly worked for me! Definitely better than using Interop objects. :)

I adapted it slightly though:

var package = new ExcelPackage(new FileInfo("sample.xlsx"));

ExcelWorksheet workSheet = package.Workbook.Worksheets[0];
var start = workSheet.Dimension.Start;
var end = workSheet.Dimension.End;
for (int row = start.Row; row <= end.Row; row++)
{ // Row by row...
    for (int col = start.Column; col <= end.Column; col++)
    { // ... Cell by cell...
        object cellValue = workSheet.Cells[row, col].Text; // This got me the actual value I needed.
    }
}
Zealot answered 10/3, 2014 at 21:9 Comment(5)
You have put the counters backwards, you should do like: workSheet.Cells[j, i] since the syntax is [row, column] but in your case you did [column, row].Hygienic
I found that the Worksheets are not zero-based. I had to use package.Workbook.Worksheets[1] to get the first worksheet.Gory
@Gory Yeah, worksheets, cells, everything is indexed from 0. For example I want to get my second column, I have to use col == 2.Dolli
Worksheets are 1 based for .NET 4 and up, and 0 based for .Net Core 2.0 (netstandard2.0) github.com/JanKallman/EPPlus/blob/master/EPPlus/…Superimposed
This describes how to read each cell in a worksheet, but doesn't answer the question of how to loop through each row of a Table.Janessa
L
20

Here's a way to get the complete row as ExcelRange which then can be iterated or used for LINQ:

for (var rowNum = 1; rowNum <= sheet.Dimension.End.Row; rowNum++)
{
    var row = sheet.Cells[string.Format("{0}:{0}", rowNum)];
    // just an example, you want to know if all cells of this row are empty
    bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
    if (allEmpty) continue; // skip this row
    // ...
}
Langsyne answered 19/12, 2014 at 11:45 Comment(3)
Would you mind to add code how to obtain the text of the cell please.Caporal
Here is how to get the text of the cell on row 5 cell 1 var text = worksheet.Cells[5, 1].Value?.ToString() ?? "";Earwitness
Did that and end up with range of 16384 columns even though I had 5 of them in my workbook.Egret
O
12

You can access the .Worksheet property of a table and index its cells. I wrote an extension method for this purpose, which generates a series of dictionaries mapping column name to cell value:

public static IEnumerable<IDictionary<string, object>> GetRows(this ExcelTable table)
{
    var addr = table.Address;
    var cells = table.WorkSheet.Cells;

    var firstCol = addr.Start.Column;

    var firstRow = addr.Start.Row;
    if (table.ShowHeader)
        firstRow++;
    var lastRow = addr.End.Row;

    for (int r = firstRow; r <= lastRow; r++)
    {
        yield return Enumerable.Range(0, table.Columns.Count)
            .ToDictionary(x => table.Columns[x].Name, x => cells[r, firstCol + x].Value);
    }
}
Obeded answered 28/10, 2014 at 15:42 Comment(2)
How to you get the ExcelTable object? I can't see it being used in the sample project from EPPlushPrivilege
@BjarkiHeiðar see the question; excelPackage.Workbook.Worksheets.SelectMany(s => s.Tables)Obeded
N
6

I had the same issue and I solved it using the ExcelTable to get the table boundary and the ExcelWorksheet to retrieve the data. So your code will look something like this:

var fileInfo = new FileInfo(filename);
using(var excelPackage = new OfficeOpenXml.ExcelPackage(fileInfo))
{
    foreach (var sheet in excelPackage.Workbook.Worksheets)
    {
        foreach (ExcelTable table in sheet.Tables)
        {
            ExcelCellAddress start = table.Address.Start;
            ExcelCellAddress end = table.Address.End;

            for (int row = start.Row; row <= end.Row; ++row)
            {
                ExcelRange range = sheet.Cells[row, start.Column, row, end.Column];
                ...
            }
        }
    }
}

You need to check for table header or other things, but that did the trick for me.

Nab answered 18/5, 2017 at 16:3 Comment(0)
T
3

I also was trying to figure out how to properly iterate through the objects and get at the data that i need with this API.

I collected info from various posts and the getting started page from the author and put it all together to assist myself and others.

The main issue is your entry point for iteration. Most solutions I've seen go after the Worksheet, whereas this question is specific on the Table, i was curious about both so i'm presenting my findings on both.

Worksheet Example:

using (var package = new ExcelPackage(new FileInfo(file)))
{
    //what i've seen used the most, entry point is the worksheet not the table w/i the worksheet(s)
    using (var worksheet = package.Workbook.Worksheets.FirstOrDefault())
    {
        if (worksheet != null)
        {
            for (int rowIndex = worksheet.Dimension.Start.Row; rowIndex <= worksheet.Dimension.End.Row; rowIndex++)
            {
                var row = worksheet.Row(rowIndex);
                //from comments here... https://github.com/JanKallman/EPPlus/wiki/Addressing-a-worksheet
                //#:# gets entire row, A:A gets entire column
                var rowCells = worksheet.Cells[$"{rowIndex}:{rowIndex}"];
                //returns System.Object[,]
                //type is string so it likely detects many cells and doesn't know how you want the many formatted together...
                var rowCellsText = rowCells.Text;
                var rowCellsTextMany = string.Join(", ", rowCells.Select(x => x.Text));
                var allEmptyColumnsInRow = rowCells.All(x => string.IsNullOrWhiteSpace(x.Text));
                var firstCellInRowWithText = rowCells.Where(x => !string.IsNullOrWhiteSpace(x.Text)).FirstOrDefault();
                var firstCellInRowWithTextText = firstCellInRowWithText?.Text;
                var firstCellFromRow = rowCells[rowIndex, worksheet.Dimension.Start.Column];
                var firstCellFromRowText = firstCellFromRow.Text;
                //throws exception...
                //var badRow = rowCells[worksheet.Dimension.Start.Row - 1, worksheet.Dimension.Start.Column - 1];

                //for me this happened on row1 + row2 beign merged together for the column headers
                //not sure why the row.merged property is false for both rows though
                if (allEmptyColumnsInRow)
                    continue;

                for (int columnIndex = worksheet.Dimension.Start.Column; columnIndex <= worksheet.Dimension.End.Column; columnIndex++)
                {
                    var column = worksheet.Column(columnIndex);
                    var currentRowColumn = worksheet.Cells[rowIndex, columnIndex];
                    var currentRowColumnText = currentRowColumn.Text;
                    var currentRowColumnAddress = currentRowColumn.Address;
                    //likely won't need to do this, but i wanted to show you can tangent off at any level w/ that info via another call
                    //similar to row, doing A:A or B:B here, address is A# so just get first char from address
                    var columnCells = worksheet.Cells[$"{currentRowColumnAddress[0]}:{currentRowColumnAddress[0]}"];
                    var columnCellsTextMany = string.Join(", ", columnCells.Select(x => x.Text));
                    var allEmptyRowsInColumn = columnCells.All(x => string.IsNullOrWhiteSpace(x.Text));
                    var firstCellInColumnWithText = columnCells.Where(x => !string.IsNullOrWhiteSpace(x.Text)).FirstOrDefault();
                    var firstCellInColumnWithTextText = firstCellInColumnWithText?.Text;
                }
            }
        }
    }
}

Now things can get a bit messed up here, for me at least i had no tables to start with. Under the same package using statement, if i were to first iterate over the worksheet cells and then touch anything with the Tables property it threw an exception. If i re-instantiate a package and use the same/similar code it doesn't blow up when seeing if we have any Tables or not.

Table Example:

//for some reason, if i don't instantiating another package and i work with the 'Tables' property in any way, the API throws a...
//Object reference not set to an instance of an object.
//at OfficeOpenXml.ExcelWorksheet.get_Tables()
//excetion... this is because i have data in my worksheet but not an actual 'table' (Excel => Insert => Table)
//a parital load of worksheet cell data + invoke to get non-existing tables must have a bug as below code does not
//throw an exception and detects null gracefully on firstordefault
using (var package = new ExcelPackage(new FileInfo(file)))
{
    //however, question was about a table, so lets also look at that... should be the same?
    //no IDisposable? :(
    //adding a table manually to my worksheet allows the 'same-ish' (child.Parent, aka table.WorkSheet) code to iterate
    var table = package.Workbook.Worksheets.SelectMany(x => x.Tables).FirstOrDefault();

    if (table != null)
    {
        for (int rowIndex = table.Address.Start.Row; rowIndex <= table.Address.End.Row; rowIndex++)
        {
            var row = table.WorkSheet.Row(rowIndex);

            var rowCells = table.WorkSheet.Cells[$"{rowIndex}:{rowIndex}"];
            var rowCellsManyText = string.Join(", ", rowCells.Select(x => x.Text));

            for (int columnIndex = table.Address.Start.Column; columnIndex <= table.Address.End.Column; columnIndex++)
            {
                var currentRowColumn = table.WorkSheet.Cells[rowIndex, columnIndex];
                var currentRowColumnText = currentRowColumn.Text;
            }
        }
    }
}

Essentially everything works and operates the same way, you just have to go after child.Parent, AKA table.WorkSheet to get at the same stuff. As others have mentioned, extension methods and possibly even wrapper class(es) could get you more granularity based on the specifics of your business needs but that was not the purpose of this question.

In regards to the indexing comments and responses, I'd advise sticking with the 'Row' and 'Column' properties, first, last, for, foreach etc. instead of hard-coding index vs non-indexed base attributes, i had no issue here at least w/ the new version.

Trudeau answered 10/9, 2018 at 21:47 Comment(1)
You are an angel! Your worksheet-method was exactly what I neededPhysicalism
J
2

Im not sure of epplus, but I thought I would make a quick suggestion of using LinqToExcel

var excel = new ExcelQueryFactory(excel);

var info = excel.Worksheet("Sheet1")
                .Select(z=> new
                     {
                      Name = row["Name"].Cast<string>(),
                      Age = row["Age"].Cast<int>(),
                     }).ToList();

you can get it from NuGet

Install-Package LinqToExcel
Janitajanith answered 12/2, 2014 at 23:54 Comment(3)
Does LinqToExcel support writing to excel as well?Taejon
Sorry, unfortunately it does not.Janitajanith
Due to your Answer, I moved from OpenXml to LinqToExcel. ThanksOutfit

© 2022 - 2024 — McMap. All rights reserved.