Using ExcelDataReader to read Excel data starting from a particular cell
Asked Answered
S

7

48

I am using ExcelDataReader to read data from my Excel workbook in C#.
But structure of my Excel sheet is such that data to be read can start from any particular cell and not necessarily A1.

Can any one Please suggest a way on how this can be achieved using ExcelDataReader?

Symptomatology answered 24/12, 2014 at 9:21 Comment(0)
E
81

If you are using ExcelDataReader 3+ you will find that there isn't any method for AsDataSet() for your reader object, You need to also install another package for ExcelDataReader.DataSet, then you can use the AsDataSet() method.
Also there is not a property for IsFirstRowAsColumnNames instead you need to set it inside of ExcelDataSetConfiguration.

Example:

using (var stream = File.Open(originalFileName, FileMode.Open, FileAccess.Read))
{
    IExcelDataReader reader;

    // Create Reader - old until 3.4+
    ////var file = new FileInfo(originalFileName);
    ////if (file.Extension.Equals(".xls"))
    ////    reader = ExcelDataReader.ExcelReaderFactory.CreateBinaryReader(stream);
    ////else if (file.Extension.Equals(".xlsx"))
    ////    reader = ExcelDataReader.ExcelReaderFactory.CreateOpenXmlReader(stream);
    ////else
    ////    throw new Exception("Invalid FileName");
    // Or in 3.4+ you can only call this:
    reader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream)

    //// reader.IsFirstRowAsColumnNames
    var conf = new ExcelDataSetConfiguration
    {
        ConfigureDataTable = _ => new ExcelDataTableConfiguration
        {
            UseHeaderRow = true 
        }
    };

    var dataSet = reader.AsDataSet(conf);

    // Now you can get data from each sheet by its index or its "name"
    var dataTable = dataSet.Tables[0];

    //...
}

You can find row number and column number of a cell reference like this:

var cellStr = "AB2"; // var cellStr = "A1";
var match = Regex.Match(cellStr, @"(?<col>[A-Z]+)(?<row>\d+)");
var colStr = match.Groups["col"].ToString();
var col = colStr.Select((t, i) => (colStr[i] - 64) * Math.Pow(26, colStr.Length - i - 1)).Sum();
var row = int.Parse(match.Groups["row"].ToString());

Now you can use some loops to read data from that cell like this:

for (var i = row; i < dataTable.Rows.Count; i++)
{
    for (var j = col; j < dataTable.Columns.Count; j++)
    {
        var data = dataTable.Rows[i][j];
    }
}

Update:

You can filter rows and columns of your Excel sheet at read time with this config:

var i = 0;
var conf = new ExcelDataSetConfiguration
{
    UseColumnDataType = true,
    ConfigureDataTable = _ => new ExcelDataTableConfiguration
    {
        FilterRow = rowReader => fromRow <= ++i - 1,
        FilterColumn = (rowReader, colIndex) => fromCol <= colIndex,
        UseHeaderRow = true
    }
};
Edwards answered 3/9, 2017 at 10:42 Comment(10)
Note that in new version (v3.4) is not necessary validate file extension, because of ExcelDataReader validate for us. In case file Extension is invalid, it will generate a exception. ReferenceAalto
@Aalto tnx, I update my answer to show your comment ;).Edwards
thanks so much for this great function. i have a date in XLSX format 10/08/2018 for some reason when i convert this to CSV format, it's inserting a time as well 10/08/2018 12:00:00AM i've attempted attacking this by setting UseColumnDataType = false however, that yielded the same resultAlley
Would love an exampleAlley
@l--''''''---------'''''''''''' I found your question and add an example as an answer, HTH ;).Edwards
In the above code - just in case you were wondering - the Sheets in a Workbook have index starting at zero - so: var dataTable = dataSet.Tables[0] references Sheet 1 in a workbook and var dataTable = dataSet.Tables[1] references Sheet2 in a Workbook etc etc ...Codpiece
@GrantShannon tnx, and I add a comment to code to notify that -HTH ;).Edwards
what is the underscore named? _ =>Theodoratheodore
@Theodoratheodore in C# when we need to pass a variable to an arrow function (lamda) and we know that we will never use it in that function, we usually use underscore as a variable name [An old naming convention] -HTH ;)Edwards
@Edwards Correct. It is variable name which is will be send to arrow function.. Writer does not care what is name of variable. so he put underscore instead of x or anything else name of it. #18301154Theodoratheodore
C
36

To be more clear, I will begin at the beginning.

I will rely on the sample code found in https://github.com/ExcelDataReader/ExcelDataReader, but with some modifications to avoid inconveniences.

The following code detects the file format, either xls or xlsx.

FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader;

//1. Reading Excel file
if (Path.GetExtension(filePath).ToUpper() == ".XLS")
{
    //1.1 Reading from a binary Excel file ('97-2003 format; *.xls)
    excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
}
else
{
    //1.2 Reading from a OpenXml Excel file (2007 format; *.xlsx)
    excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
}

//2. DataSet - The result of each spreadsheet will be created in the result.Tables
DataSet result = excelReader.AsDataSet();

//3. DataSet - Create column names from first row
excelReader.IsFirstRowAsColumnNames = false;

Now we can access the file contents in a more convenient way. I use DataTable for this. The following is an example to access a specific cell, and print its value in the console:

DataTable dt = result.Tables[0];
Console.WriteLine(dt.Rows[rowPosition][columnPosition]);

If you do not want to do a DataTable, you can do the same as follows:

Console.WriteLine(result.Tables[0].Rows[rowPosition][columnPosition]);

It is important not try to read beyond the limits of the table, for this you can see the number of rows and columns as follows:

Console.WriteLine(result.Tables[0].Rows.Count);
Console.WriteLine(result.Tables[0].Columns.Count);

Finally, when you're done, you should close the reader and free resources:

//5. Free resources (IExcelDataReader is IDisposable)
excelReader.Close();

I hope you find it useful.

(I understand that the question is old, but I make this contribution to enhance the knowledge base, because there is little material about particular implementations of this library).

Cuckoo answered 12/10, 2016 at 15:35 Comment(2)
is it possible to do this on HttpRequest body rather than a filestream? i have a payload that's coming in via the body of a POST requestAlley
@l--''''''---------'''''''''''' It's been a long time since I used the library, but I read the documentation now, and apparently it only supports FileStream. One idea that occurs to me, is to use GetResponseStream() and store your payload in a file, then perform the classic procedure.Cuckoo
G
10

For ExcelDataReader v3.6.0 and above. I struggled a bit to iterate over the Rows. So here's a little more to the above code. Hope it helps for few atleast.

using (var stream = System.IO.File.Open(copyPath, FileMode.Open, FileAccess.Read))
                    {

                        IExcelDataReader excelDataReader = ExcelDataReader.ExcelReaderFactory.CreateReader(stream);

                        var conf = new ExcelDataSetConfiguration()
                        {
                            ConfigureDataTable = a => new ExcelDataTableConfiguration
                            {
                                UseHeaderRow = true
                            }
                        };

                        DataSet dataSet = excelDataReader.AsDataSet(conf);
                        //DataTable dataTable = dataSet.Tables["Sheet1"];
                        DataRowCollection row = dataSet.Tables["Sheet1"].Rows;
                        //DataColumnCollection col = dataSet.Tables["Sheet1"].Columns;

                        List<object> rowDataList = null;
                        List<object> allRowsList = new List<object>();
                        foreach (DataRow item in row)
                        {
                            rowDataList = item.ItemArray.ToList(); //list of each rows
                            allRowsList.Add(rowDataList); //adding the above list of each row to another list
                        }

                    }
Genevagenevan answered 10/4, 2020 at 13:33 Comment(0)
C
6

One way to do it :

FileStream stream = File.Open(@"c:\working\test.xls", FileMode.Open, FileAccess.Read);

IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);

excelReader.IsFirstRowAsColumnNames = true;

DataSet result = excelReader.AsDataSet();

The result.Tables contains the sheets and the result.tables[0].Rows contains the cell rows.

Cooperman answered 24/12, 2014 at 10:45 Comment(1)
How exactly would you read an excel starting from a specific cell? Could you please explain little more in detail @Sievajet?Salba
H
6

I found this useful to read from a specific column and row:

FileStream stream = File.Open(@"C:\Users\Desktop\ExcelDataReader.xlsx", FileMode.Open, FileAccess.Read);
IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
DataSet result = excelReader.AsDataSet();
excelReader.IsFirstRowAsColumnNames = true;         
DataTable dt = result.Tables[0];
string text = dt.Rows[1][0].ToString();
Homy answered 5/7, 2017 at 12:56 Comment(0)
E
3

Very easy with ExcelReaderFactory 3.1 and up:

using (var openFileDialog1 = new OpenFileDialog { Filter = "Excel Workbook|*.xls;*.xlsx;*.xlsm", ValidateNames = true })
{
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        var fs = File.Open(openFileDialog1.FileName, FileMode.Open, FileAccess.Read);
        var reader = ExcelReaderFactory.CreateBinaryReader(fs);
        var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration
        {
            ConfigureDataTable = _ => new ExcelDataTableConfiguration
            {
                UseHeaderRow = true // Use first row is ColumnName here :D
            }
        });
        if (dataSet.Tables.Count > 0)
        {
            var dtData = dataSet.Tables[0];
            // Do Something
        }
    }
}
Embowed answered 29/6, 2018 at 11:5 Comment(0)
K
3
public static DataTable ConvertExcelToDataTable(string filePath, bool isXlsx = false)
{
    System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
    //open file and returns as Stream
        using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
        {
                using (var reader = ExcelReaderFactory.CreateReader(stream))
                {

                    var conf = new ExcelDataSetConfiguration
                    {
                        ConfigureDataTable = _ => new ExcelDataTableConfiguration
                        {
                            UseHeaderRow = true
                        }
                    };

                    var dataSet = reader.AsDataSet(conf);

                    // Now you can get data from each sheet by its index or its "name"
                    var dataTable = dataSet.Tables[0];

                    Console.WriteLine("Total no of rows  " + dataTable.Rows.Count);
                    Console.WriteLine("Total no of Columns  " + dataTable.Columns.Count);

                    return dataTable;

                }

        }
   
}
Kollwitz answered 23/10, 2020 at 12:35 Comment(1)
I don't know if this is correct or not, but since this is a new answer to a pretty old question with other highly voted answers, it would be useful to summarize what is different about your approach and why you wanted to share it.Digenesis

© 2022 - 2024 — McMap. All rights reserved.