How to remove empty cells and rows from file using ExcelDataReader?
Asked Answered
C

3

7

I'm using ExcelDataReader to read excel file but the file which being processed has 16k columns and 1200 records each. In between only 52 columns are filled other are empty so I want to remove empty columns with rows.

tried with :

var dataset = reader.AsDataSet(new ExcelDataSetConfiguration() {
    ConfigureDataTable = _ => new ExcelDataTableConfiguration() {
        FilterRow = rowReader => rowReader.Depth != 0
    }
});

help will be appreciated thank you

Cantlon answered 5/6, 2018 at 11:54 Comment(0)
R
8

AFAIK, There is not any simple way to filter empty columns or rows for it my best try is:

// use ExcelDataTableConfiguration.FilterRow to filter empty rows
FilterRow = rowReader =>
{
    var hasData = false;
    for (var i = 0; i < rowReader.FieldCount; i++)
    {
        if (rowReader[i] == null || string.IsNullOrEmpty(rowReader[i].ToString()))
        {
            continue;
        }

        hasData = true;
        break;
    }

    return hasData;
},
// use ExcelDataTableConfiguration.FilterColumn to filter empty columns
FilterColumn = (rowReader, colIndex) =>
{
    var hasData = false;
    rowReader.Reset();

    // this will skip first row as it is name of column
    rowReader.Read();

    while (rowReader.Read())
    {
        if (rowReader[colIndex] == null || 
            string.IsNullOrEmpty(rowReader[colIndex].ToString()))
        {
            continue;
        }

        hasData = true;
        break;
    }

    // below codes do a trick!
    rowReader.Reset();
    rowReader.Read();

    return hasData;
}

Using IExcelDataReader arguments inside FilterRow or FilterColumn is evil!
I think in main reader and each rowReader are referenced to one object! so you should always aware of how to use them, as you can see I add a trick to make reader to be ready after using it. - HTH ;)

Rhee answered 20/8, 2018 at 6:38 Comment(0)
U
0

With a little inspiration from ShA.t's answer, I was able to make header rows and columns read exactly where they start in the excel document. You only need to filter the empty columns, as long as you know where the rows start. Here's a sample code:

public class TrimmedTableSample 
{
    #region Properties
    int HeaderRowIndex { get; set; }
    #endregion

    #region Methods
    public void Read(string documentPath)
    {
        using (var stream = File.Open(documentPath, FileMode.Open, FileAccess.Read))
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            var dataSet = reader.AsDataSet(new ExcelDataSetConfiguration()
            {
                UseColumnDataType = true,
                ConfigureDataTable = (tableReader) => new ExcelDataTableConfiguration()
                {
                    EmptyColumnNamePrefix = "Column ",
                    UseHeaderRow = true,
                    ReadHeaderRow = (reader) =>
                    {
                        bool empty = true;
                        HeaderRowIndex = 0;

                        while (empty)
                        {
                            for (var i = 0; i < reader.FieldCount && empty; i++)
                                empty = string.IsNullOrWhiteSpace(reader.GetString(i));

                            if (empty)
                            {
                                empty = reader.Read(); // Only continue if more content is available
                                HeaderRowIndex++; // Keep track of the first row position.
                            }
                        }
                    },
                    FilterColumn = (reader, index) =>
                    {
                        bool empty = false;
                        string sheet = reader.Name;

                        // Start reading the table from the beginning
                        reader.Reset();

                        // EDIT: Head over the our current excel sheet
                        while (reader.Name != sheet)
                        if (!reader.NextResult())
                            break;


                        // Head to the first row with content
                        int rowIndex = 0;
                        while (rowIndex < HeaderRowIndex)
                        {
                            reader.Read();
                            rowIndex++;
                        }

                        while (reader.Read())
                        {
                            // Decide if the current column is empty
                            if (reader[index] == null || string.IsNullOrEmpty(reader[index].ToString()))
                                continue;

                            empty = true;
                            break;
                        }

                        // Start over again (This allows the reader to automatically read the rest of the content itself)
                        reader.Reset();

                        // EDIT: Head over the our current excel sheet
                        while (reader.Name != sheet)
                        if (!reader.NextResult())
                            break;

                        reader.Read();

                        // Head over to the first row with content
                        rowIndex = 0;
                        while (rowIndex < HeaderRowIndex)
                        {
                            reader.Read();
                            rowIndex++;
                        }

                        // Return info on whether this column should be ignored or not.
                        return empty;
                    }
                }
            });  
        }
    }
    #endregion
}

Good Luck!

Ulrika answered 15/2, 2020 at 21:11 Comment(0)
L
0

I use this solution that works by calling DataRow.Delete() method:

                    foreach (var row in rows)
                    {
                        var emptyColumnsCount = 0;
                        for (var i = 0; i < columns.Count; i++)
                        {
                            if (!row.IsNull(columns[i]))
                            {
                               // Do some logic to read row.
                            }
                            else
                            {
                                emptColumnsCount++;
                            }
                        }
                        if (emptyColumnsCount == columns.Count)
                            row.Delete();
                        else
                           // Do some logic to use row
                    }
Lidialidice answered 25/8, 2021 at 4:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.