Delete Empty Rows with Excel Interop

I have user-supplied Excel files that need to be converted to PDF. Using Excel Interop, I can do this fine with .ExportAsFixedFormat(). My problem comes up when a workbook has millions of rows: the result is a file with 50k+ pages. That would be fine if the workbook had content in all of those rows, but every time one of these files shows up, there are maybe 50 rows with content and the rest are blank. How can I remove the empty rows so I can export a decent-sized PDF?

  1. I've tried starting at the last row and, one by one, using CountA to check whether the row has content and, if it doesn't, deleting it. Not only does this take forever, it also seems to fail after about 100k rows with the following error:

    Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.

  2. I've tried using SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues) but that includes a row if any cell has formatting (like a bg color).

  3. I've tried using Worksheet.UsedRange and then deleting everything after that but UsedRange has the same problem as point two.
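Attempt 1 is likely slow largely because it makes at least one COM call per row. A minimal sketch of an alternative, assuming you read `UsedRange.Value2` into an `object[,]` in a single call (for a multi-cell range Excel returns a two-dimensional array with lower bounds of 1; a single cell comes back as a scalar), is to find the last row with a value in memory. Because only values are inspected, background colors and other formatting cannot make a row count as used. `BlankRowScan` and `LastRowWithValue` are hypothetical names, and treating empty strings as blank is an assumption.

```csharp
public static class BlankRowScan
{
    // Returns the last row (1-based, relative to the array) that holds at least
    // one non-blank value, or 0 when every row is blank. Only values are read,
    // so fill colors, borders and other formatting are ignored.
    // Assumes the array is shaped like Range.Value2 for a multi-cell range.
    public static int LastRowWithValue(object[,] values)
    {
        int firstRow = values.GetLowerBound(0);
        int firstCol = values.GetLowerBound(1);
        int lastCol = values.GetUpperBound(1);

        // Scan upward from the bottom and stop at the first row with content.
        for (int r = values.GetUpperBound(0); r >= firstRow; r--)
        {
            for (int c = firstCol; c <= lastCol; c++)
            {
                if (!IsBlank(values[r, c]))
                    return r - firstRow + 1;
            }
        }
        return 0;
    }

    // Value2 gives null for an empty cell; this sketch also treats "" as blank
    // (an assumption: CountA would count a formula that returns "").
    static bool IsBlank(object value) =>
        value == null || (value is string s && s.Length == 0);
}
```

The result is relative to the used range, so add `UsedRange.Row - 1` to get a sheet row number when the used range does not start at row 1. Everything below that row can then be removed with one whole-row `Delete()` instead of one call per row. For a very tall used range you may prefer to read `Value2` in chunks of rows to limit memory use.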


This is the code I've tried:
for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;
    currentRowIndex = rows.Count;
    bool contentFound = false;

    while (!contentFound && currentRowIndex > 0)
    {
        currentRow = rows[currentRowIndex];

        if (Application.WorksheetFunction.CountA(currentRow) == 0)
        {
            currentRow.Delete();
        }
        else
        {
            contentFound = true;
        }

        Marshal.FinalReleaseComObject(currentRow);
        currentRowIndex--;
    }

    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}

for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;

    lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
    int startRow = lastCell.Row;

    Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
    range.Delete();

    Marshal.FinalReleaseComObject(range);
    Marshal.FinalReleaseComObject(lastCell);
    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}

Do I have a problem with my code, is this an interop problem or maybe it's just a limitation on what Excel can do? Is there a better way to do what I'm attempting?
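About the second loop: the `Value` argument of `SpecialCells` only applies to `xlCellTypeConstants` and `xlCellTypeFormulas`, so with `xlCellTypeLastCell` you still get the last cell of the used range, formatting included. Also, `lastCell.get_Address(...)` names that one cell, so `range.Delete()` removes a single cell rather than the rows below it. Deleting everything below the last row with content in one call needs a whole-row reference such as `"51:1048576"`. A small sketch that builds one; `RowAddress` and `BelowLastContentRow` are hypothetical names, and how you find the last content row is left open.

```csharp
using System;

public static class RowAddress
{
    // .xlsx sheets have 1,048,576 rows; .xls sheets have 65,536. Passing
    // sheet.Rows.Count as totalRows avoids hard-coding either.
    public const int MaxRowsXlsx = 1048576;

    // Builds a whole-row A1 reference covering every row below lastContentRow,
    // e.g. 50 -> "51:1048576". Returns null when there are no rows below it.
    public static string BelowLastContentRow(int lastContentRow, int totalRows = MaxRowsXlsx)
    {
        if (lastContentRow < 0)
            throw new ArgumentOutOfRangeException(nameof(lastContentRow));
        if (totalRows < 1)
            throw new ArgumentOutOfRangeException(nameof(totalRows));

        int firstRow = lastContentRow + 1;
        return firstRow > totalRows ? null : firstRow + ":" + totalRows;
    }
}
```

With Interop, the address could then be used as `sheet.Range[address].Delete()`, a single COM call regardless of how many rows it covers.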

Manhunt answered 21/3, 2011 at 20:53 Comment(4)
I'd really like to look into this topic. Do you have a demo file to run tests with? - Idola
@PilgerstorferFranz Sorry, I don't. This project is long gone. - Manhunt
Did you find any solution? - Idola
I didn't. There ended up being a user-education snippet telling users to screen the workbooks before conversion. :( - Manhunt

I would suggest getting the count of rows that contain values using CountA (as you tried in point 1). Then copy those rows into a new sheet and export it from there. Copying a few rows to a new sheet and working on it is easier than trying to delete a huge number of rows from the source sheet.

To create a new sheet and copy rows, you can use the following code:

// Assumes: using excel = Microsoft.Office.Interop.Excel;
excel.Worksheet tempSheet = (excel.Worksheet)workbook.Worksheets.Add();
tempSheet.Name = sheetName;
workbook.Save();

// Create a new method for copying the rows.
// As rowCount, pass the number of rows you found using CountA. Rows 1..rowCount
// are copied in one call, so this assumes the content rows are contiguous from row 1.
public void CopyRows(excel.Workbook workbook, string sourceSheetName, string destSheetName, int rowCount)
{
    excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
    excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[destSheetName];

    // Whole-row reference such as "1:50" on the source; paste starting at A1.
    excel.Range source = sourceSheet.Range["1:" + rowCount];
    excel.Range dest = destSheet.Range["A1"];
    source.Copy(dest);

    workbook.Save();
}
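If the rows with content are not contiguous, copying rows 1 through N will miss some of them. A minimal sketch, assuming the values were read in one go from `UsedRange.Value2`, that works out which source rows hold data and which row each would occupy on the new sheet. `CompactionPlan` and `Build` are hypothetical names, and treating empty strings as blank is an assumption.

```csharp
using System.Collections.Generic;

public static class CompactionPlan
{
    // For every source row with at least one non-blank value, returns
    // (SourceRow, DestRow): where the row is now and where it would land if the
    // non-blank rows were copied, in order, to the top of a fresh sheet.
    // Row numbers are 1-based and relative to the array.
    public static List<(int SourceRow, int DestRow)> Build(object[,] values)
    {
        int r0 = values.GetLowerBound(0), rN = values.GetUpperBound(0);
        int c0 = values.GetLowerBound(1), cN = values.GetUpperBound(1);
        var plan = new List<(int SourceRow, int DestRow)>();

        for (int r = r0; r <= rN; r++)
        {
            for (int c = c0; c <= cN; c++)
            {
                if (!IsBlank(values[r, c]))
                {
                    // Destination rows are packed with no gaps: 1, 2, 3, ...
                    plan.Add((r - r0 + 1, plan.Count + 1));
                    break;
                }
            }
        }
        return plan;
    }

    static bool IsBlank(object value) =>
        value == null || (value is string s && s.Length == 0);
}
```

Each pair could then drive one row `Copy` call; with roughly 50 content rows that is about 50 COM calls rather than one per row of the original sheet.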
Hoarse answered 14/12, 2017 at 10:1 Comment(0)

Have you tried Sheet1.Range("A1").CurrentRegion.ExportAsFixedFormat() where Sheet1 is a valid sheet name and "A1" is a cell you can test to ensure it is located in the range you want to export?

The question remains: why does Excel think there is data in those "empty" cells? Formatting? A pre-existing print area that needs to be cleared? I've encountered situations like that before, and those are the only possibilities that come to mind at the moment.

Create answered 21/3, 2011 at 22:2 Comment(1)
Darn, this doesn't work either. I have the same issue as in my points two and three. It would be great if I could just tell users not to make ridiculous spreadsheets. :D - Manhunt

Try these steps:

  1. Copy Worksheet.UsedRange to a separate sheet (sheet2).
  2. Use Paste Special so that the formatting is retained.
  3. Try parsing sheet2 for unused rows.

If this doesn't help, try repeating step 2 with the formatting cleared, and then parse sheet2. You can always copy the format info later (if they are simple enough).
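For step 3, one way to parse for unused rows is to collect runs of fully blank rows and delete them bottom-up, so an earlier deletion never shifts the row numbers of a later one. A minimal sketch over an `object[,]` read from `Value2`; `BlankRowBlocks` and `Find` are hypothetical names, and treating empty strings as blank is an assumption.

```csharp
using System.Collections.Generic;

public static class BlankRowBlocks
{
    // Returns runs of fully blank rows as (First, Last) pairs of 1-based row
    // numbers, ordered bottom-up: deleting the runs in this order never shifts
    // the rows of a run that has not been deleted yet.
    // 'values' is assumed to be shaped like Range.Value2 (object[rows, cols]).
    public static List<(int First, int Last)> Find(object[,] values)
    {
        int r0 = values.GetLowerBound(0), rN = values.GetUpperBound(0);
        int c0 = values.GetLowerBound(1), cN = values.GetUpperBound(1);
        var runs = new List<(int First, int Last)>();
        int runEnd = 0; // bottom row of the run being built; 0 means no open run

        for (int r = rN; r >= r0; r--)
        {
            int row = r - r0 + 1;
            bool blank = true;
            for (int c = c0; c <= cN && blank; c++)
                blank = IsBlank(values[r, c]);

            if (blank)
            {
                if (runEnd == 0) runEnd = row; // start a new run
            }
            else if (runEnd != 0)
            {
                // This row has data, so the open run spans row+1..runEnd.
                runs.Add((row + 1, runEnd));
                runEnd = 0;
            }
        }
        if (runEnd != 0) runs.Add((1, runEnd)); // the run reaches the top row
        return runs;
    }

    static bool IsBlank(object value) =>
        value == null || (value is string s && s.Length == 0);
}
```

Each run maps to one whole-row delete, e.g. `sheet.Range["5:6"].Delete()`, so the number of COM calls grows with the number of gaps rather than the number of rows.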

Wheel answered 22/3, 2011 at 18:46 Comment(1)
I tried the first part of what you suggested. Same problem as points two and three. I didn't try copying without formatting and then re-applying the formatting, though. How would one do that? "if they are simple enough": does that mean copying the formatting won't always be a viable option? Since these are user-supplied sheets, I can't be sure what formatting they will have. - Manhunt

If you can first load the Excel file into a DataSet via an OleDbDataAdapter, it's relatively easy to remove blank rows on import. Try this OleDbDataAdapter Excel Q&A I posted on Stack Overflow.

Then export the DataSet to a new Excel file and convert that file to PDF. That may be a big "if", of course, depending on the Excel layout (or lack thereof).
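A minimal sketch of the "remove blank rows on import" step, assuming the sheet has already been read into a `DataTable` (for example with `OleDbDataAdapter.Fill`) and that a row counts as blank when every field is `DBNull`, null, or an empty string. `DataTableCleanup` and `RemoveBlankRows` are hypothetical names.

```csharp
using System;
using System.Data;

public static class DataTableCleanup
{
    // Removes rows whose every field is DBNull, null, or "" and returns how many
    // were removed. Iterates backwards so removing a row never skips the next one.
    public static int RemoveBlankRows(DataTable table)
    {
        int removed = 0;
        for (int i = table.Rows.Count - 1; i >= 0; i--)
        {
            bool blank = true;
            foreach (object field in table.Rows[i].ItemArray)
            {
                if (!(field == null || field is DBNull || (field is string s && s.Length == 0)))
                {
                    blank = false;
                    break;
                }
            }

            if (blank)
            {
                table.Rows.RemoveAt(i);
                removed++;
            }
        }
        return removed;
    }
}
```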

Rockery answered 5/5, 2011 at 16:19 Comment(2)
I'm not using a DataSet. I need to modify the actual Excel file, and it looks like ADO.NET doesn't support the delete operation. - Manhunt
Ahh, I should not have assumed you were using the OleDbDataAdapter and a DataSet. I'll modify my answer. - Rockery

I had to solve this problem today for what might be a subset of your possible cases.

If your spreadsheet meets the following conditions:

  1. All columns with data have header text in row 1.
  2. All rows with data are in sequence until the first BLANK row.

Then, the following code may help:

    // Assumes: using Excel = Microsoft.Office.Interop.Excel;
    private static string[,] cellData;      // [column, row], filled by LoadCellData
    private static Exception lastException; // last error seen while reading cells

    private static string[,] LoadCellData(Excel.Application excel, dynamic sheet)
    {
        int countCols = CountColsToFirstBlank(excel, sheet);
        int countRows = CountRowsToFirstBlank(excel, sheet);
        cellData = new string[countCols, countRows];

        for (int i = 0; i < countCols; i++)
        {
            for (int j = 0; j < countRows; j++)
            {
                try
                {
                    // Cells is indexed [row, column]; read from 'sheet', not the active sheet.
                    object value = sheet.Cells[j + 1, i + 1].Value;
                    if (value != null)
                    {
                        cellData[i, j] = value.ToString();
                    }
                }
                catch (Exception ex)
                {
                    lastException = ex;
                }
            }
        }

        return cellData;
    }

    private static int CountRowsToFirstBlank(Excel.Application excel, dynamic sheet)
    {
        int usedRows = sheet.UsedRange.Rows.Count; // read once, not on every iteration
        int count = 0;

        for (int j = 0; j < usedRows; j++)
        {
            if (IsBlankRow(excel, sheet, j + 1))
                break;

            count++;
        }
        return count;
    }

    private static int CountColsToFirstBlank(Excel.Application excel, dynamic sheet)
    {
        int usedCols = sheet.UsedRange.Columns.Count;
        int count = 0;

        for (int i = 0; i < usedCols; i++)
        {
            if (IsBlankCol(excel, sheet, i + 1))
                break;

            count++;
        }
        return count;
    }

    private static bool IsBlankCol(Excel.Application excel, dynamic sheet, int col)
    {
        int usedRows = sheet.UsedRange.Rows.Count;
        for (int i = 0; i < usedRows; i++)
        {
            if (null != sheet.Cells[i + 1, col].Value)
            {
                return false;
            }
        }

        return true;
    }

    private static bool IsBlankRow(Excel.Application excel, dynamic sheet, int row)
    {
        int usedCols = sheet.UsedRange.Columns.Count;
        for (int i = 0; i < usedCols; i++)
        {
            // Cells is indexed [row, column].
            if (null != sheet.Cells[row, i + 1].Value)
            {
                return false;
            }
        }

        return true;
    }
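The code above makes at least one COM call per cell it inspects. Under the same two conditions, both counts can instead be computed from a single `UsedRange.Value2` read. A minimal sketch, assuming the used range starts at A1 and, relying on condition 1, counting columns from the header row alone; `FirstBlankBounds` and `Count` are hypothetical names.

```csharp
public static class FirstBlankBounds
{
    // Under the two conditions above (every data column has a header in row 1,
    // and data rows are contiguous until the first blank row), returns how many
    // rows (header row included) and columns make up the data block.
    public static (int Rows, int Cols) Count(object[,] values)
    {
        int r0 = values.GetLowerBound(0), rN = values.GetUpperBound(0);
        int c0 = values.GetLowerBound(1), cN = values.GetUpperBound(1);

        // Columns: header cells in the first row, up to the first blank header.
        int cols = 0;
        while (c0 + cols <= cN && !IsBlank(values[r0, c0 + cols]))
            cols++;

        // Rows: stop at the first row that is blank across those columns.
        int rows = 0;
        for (int r = r0; r <= rN; r++)
        {
            bool blank = true;
            for (int c = c0; c < c0 + cols && blank; c++)
                blank = IsBlank(values[r, c]);
            if (blank) break;
            rows++;
        }
        return (rows, cols);
    }

    static bool IsBlank(object value) =>
        value == null || (value is string s && s.Length == 0);
}
```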
Bouillon answered 21/5, 2012 at 18:27 Comment(1)
I don't believe this is a workable solution for the issue, since (as noted in the question) empty cells that have formatting should not be deleted. Unless I'm mistaken, your snippet will delete those rows erroneously, as their values would be null while the formatting may be intended to be kept. - Indaba

Can you try the code below?

// This uses EPPlus (OfficeOpenXml), not Excel Interop; All() needs using System.Linq.
for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
{
    // Assume the first row is the header. Then use the column match-ups by name to determine the index.
    // This allows the order of the header.Keys to change without any effect.
    var row = workSheet.Cells[string.Format("{0}:{0}", rowIndex)];
    // Check whether every cell in the row is empty.
    bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
    if (allEmpty)
        continue; // skip this row

    if (rowIndex == workSheet.Dimension.Start.Row)
    {
        // here read the header
    }
    else
    {
        // some code to read the body
    }
}

Hope this helps; otherwise, let me know if you need a description of the code.

Update:

  • The loop below walks the worksheet row by row, from the first to the last row of its used dimension:

for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)

  • Inside it, LINQ checks whether every cell in the row is empty:

bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
    continue; // if true, skip this row
// otherwise, read the headers (assuming the worksheet has them),
// or read the row's data and do the necessary steps.

Hopefully this makes it clear now.
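The same skip-blank-rows logic, sketched on plain data so it runs without EPPlus: each row is an array of cell texts (standing in for `c.Text`), fully blank rows are skipped, and the first non-blank row is taken as the header. `RowReader` and `Read` are hypothetical names, and "first non-blank row is the header" is an assumption that differs slightly from the answer's "first row is the header".

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RowReader
{
    // Each row is the array of its cells' display texts.
    // Fully blank rows are skipped; the first non-blank row becomes the header
    // and every later non-blank row goes into the body.
    public static (string[] Header, List<string[]> Body) Read(IEnumerable<string[]> rows)
    {
        string[] header = null;
        var body = new List<string[]>();

        foreach (string[] row in rows)
        {
            if (row.All(string.IsNullOrWhiteSpace))
                continue; // skip this row

            if (header == null)
                header = row;
            else
                body.Add(row);
        }
        return (header, body);
    }
}
```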

Acrostic answered 19/3, 2018 at 9:50 Comment(0)

I had the same problem and managed to fix it using the CurrentRegion:

var lastcell = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell);
var filledcells = sheet.Cells.Range[sheet.Cells.Item[1, 1],
        sheet.Cells[lastcell.Row - 1, lastcell.Column]]
    .CurrentRegion;
filledcells.ExportAsFixedFormat(

and so on. CurrentRegion is documented as the range bounded by any combination of blank rows and blank columns, so apparently it also shrinks when the starting range contains many empty cells.
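To see why CurrentRegion can end up smaller than UsedRange, here is a simplified, hypothetical model of it for a region anchored at A1, working on an `object[,]` such as `UsedRange.Value2`: the rectangle grows down or right only while the neighboring row or column holds a value, so formatting alone never extends it and a fully blank row stops it. Excel's real CurrentRegion can also expand upward and leftward from the anchor; this sketch skips that, and `RegionSketch` is a hypothetical name.

```csharp
using System;

public static class RegionSketch
{
    // Simplified model of Range.CurrentRegion anchored at the array's top-left
    // cell: grow the rectangle down or right while the row just below it or the
    // column just right of it (including the diagonal corner) holds a value.
    public static (int Rows, int Cols) FromTopLeft(object[,] values)
    {
        int r0 = values.GetLowerBound(0), rN = values.GetUpperBound(0);
        int c0 = values.GetLowerBound(1), cN = values.GetUpperBound(1);
        int lastRow = r0, lastCol = c0;
        bool grew = true;

        while (grew)
        {
            grew = false;
            if (lastRow < rN && AnyValueInRow(values, lastRow + 1, c0, Math.Min(lastCol + 1, cN)))
            {
                lastRow++;
                grew = true;
            }
            if (lastCol < cN && AnyValueInColumn(values, lastCol + 1, r0, Math.Min(lastRow + 1, rN)))
            {
                lastCol++;
                grew = true;
            }
        }
        return (lastRow - r0 + 1, lastCol - c0 + 1);
    }

    static bool AnyValueInRow(object[,] v, int row, int fromCol, int toCol)
    {
        for (int c = fromCol; c <= toCol; c++)
            if (!IsBlank(v[row, c])) return true;
        return false;
    }

    static bool AnyValueInColumn(object[,] v, int col, int fromRow, int toRow)
    {
        for (int r = fromRow; r <= toRow; r++)
            if (!IsBlank(v[r, col])) return true;
        return false;
    }

    static bool IsBlank(object value) =>
        value == null || (value is string s && s.Length == 0);
}
```

The flip side of this behavior is that any data placed after a fully blank row or column falls outside the region and would not be exported.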

Pupillary answered 12/9, 2020 at 7:3 Comment(0)

Please try the following code:

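for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    // Caution: deletes every row whose column-A cell is blank (and every column
    // whose row-1 cell is blank), even if the rest of that row or column holds data.
    // SpecialCells throws a COMException (System.Runtime.InteropServices) when no blank cells are found.
    try { sheet.Range["A:A"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireRow.Delete(); }
    catch (COMException) { }
    try { sheet.Range["1:1"].SpecialCells(XlCellType.xlCellTypeBlanks).EntireColumn.Delete(); }
    catch (COMException) { }
    Marshal.FinalReleaseComObject(sheet);
}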
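The first `SpecialCells` call deletes every row whose column-A cell is blank, whether or not other columns in that row hold data. A hypothetical in-memory check over `UsedRange.Value2` values (it does not call Excel) that lists the rows this rule would remove despite having data; it treats only null as blank, since that is what `Value2` returns for an empty cell. `ColumnARuleCheck` is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class ColumnARuleCheck
{
    // Returns the 1-based rows whose first-column cell is empty (so a
    // "blank cell in column A -> delete entire row" rule would remove them)
    // but which still hold a value in some other column.
    public static List<int> RowsWithDataButBlankFirstColumn(object[,] values)
    {
        int r0 = values.GetLowerBound(0), rN = values.GetUpperBound(0);
        int c0 = values.GetLowerBound(1), cN = values.GetUpperBound(1);
        var atRisk = new List<int>();

        for (int r = r0; r <= rN; r++)
        {
            if (values[r, c0] != null)
                continue; // column A has a value, so the rule keeps this row

            for (int c = c0 + 1; c <= cN; c++)
            {
                if (values[r, c] != null)
                {
                    atRisk.Add(r - r0 + 1);
                    break;
                }
            }
        }
        return atRisk;
    }
}
```

Running a check like this before the delete shows whether the column-A rule is safe for a given workbook.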
Gregale answered 22/11, 2016 at 7:7 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.