Reading Limited Rows from Excel File Uploaded on IIS
Asked Answered
P

5

9

I have an asp.net mvc app hosted on IIS. I have a form from where users upload excel files containing 50k+ rows. I read the excel file with the following C# code.

    public DataTable GetExcelDataTable(string fileName)
    {
        string connectionString = Path.GetExtension(fileName) == "xls" ?
            string.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data source={0}; Extended Properties=Excel 8.0;", fileName) :
            string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0}; Extended Properties=Excel 12.0;", fileName);

        var conn = new OleDbConnection(connectionString);
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn))
        {
            var ds = new DataSet();

            adapter.Fill(ds);

            DataTable data = ds.Tables[0];

            conn.Close();
            conn.Dispose();
            adapter.Dispose();
            return data;
        }
    }

The problem is that it only reads up to 30k rows but never the entire excel file.

Interestingly enough, I can read (with the same code) all the rows if I run mvc app with visual studio but, again, never from IIS (IIS is also on my machine) hosted website.

Any ideas, why this happens?

Pipage answered 6/6, 2016 at 7:5 Comment(8)
any error you get after reading 30k records ?Samurai
no, non. I don't get any errors. does not matter if its iis or iisexpress (through vs)Pipage
It may help to see what your config files look like as your data may be exceeding the reader quotas.Takao
Yes, I have looked at it. But, I have the same configuration for IIS and IIS Express. It's working on IIS Express not on IIS.Pipage
Check this link The user faces the same problemQuarrier
Can you inspect the file itself? Set a breakpoint in this method, where string connectionString is populated, then navigate to the path indicated in the file name and confirm the number of rows in the file? I want to see if the issue is when the file is being saved, or when its being read (in this method).Parallax
How many columns does the excel file have? Maybe we're stumbling into a memory issue. I tried creating a web app that read an 18-column excel with 180k rows using VS2010 and MVC3, and my program slurped the excel file in one pass effortlessly.Balaklava
Does it happen to only one particular xls file? Would you please check the specified row?Loesceke
S
7

in this approach there is no need to install excel on target machine

    NPOI.SS.UserModel.IWorkbook hssfworkbook;

    bool InitializeWorkbook(string path)
    {
        try
        {
            if (path.ToLower().EndsWith(".xlsx"))
            {
                FileStream file1 = File.OpenRead(path);
                hssfworkbook = new XSSFWorkbook(file1);
            }
            else
            {
                //read the template via FileStream, it is suggested to use FileAccess.Read to prevent file lock.
                //book1.xls is an Excel-2007-generated file, so some new unknown BIFF records are added. 
                using (FileStream file = new FileStream(path, FileMode.Open, FileAccess.Read))
                {
                    hssfworkbook = new HSSFWorkbook(file);
                }
            }
            return true;
        }
        catch
        {
            return false;
        }
    }

In the following :

public DataTable GetExcelDataTable(NPOI.SS.UserModel.IWorkbook hssfworkbook, int rowCount)
    {
        NPOI.SS.UserModel.ISheet sheet = hssfworkbook.GetSheetAt(0);
        System.Collections.IEnumerator rows = sheet.GetRowEnumerator();

        DataTable dt = new DataTable();

        bool skipReadingHeaderRow = rows.MoveNext();
        if (skipReadingHeaderRow)
        {
            dynamic row;
            if (rows.Current is NPOI.HSSF.UserModel.HSSFRow)
                row = (NPOI.HSSF.UserModel.HSSFRow)rows.Current;
            else
                row = (NPOI.XSSF.UserModel.XSSFRow)rows.Current;
            for (int i = 0; i < row.LastCellNum; i++)
            {
                ICell cell = row.GetCell(i);
                if (cell != null)
                {
                    dt.Columns.Add(cell.ToString());
                }
                else
                {
                    dt.Columns.Add(string.Empty);
                }
            }
        }
        int cnt = 0;
        while (rows.MoveNext() && cnt < rowCount)
        {
            cnt++;
            dynamic row;
            if (rows.Current is NPOI.HSSF.UserModel.HSSFRow)
                row = (NPOI.HSSF.UserModel.HSSFRow)rows.Current;
            else
                row = (XSSFRow)rows.Current;

            DataRow dr = dt.NewRow();

            for (int i = 0; i < row.LastCellNum; i++)
            {
                ICell cell = row.GetCell(i);

                if (cell == null)
                {
                    dr[i - 1] = null;
                }
                else if (i > 0)
                {

                    dr[i - 1] = cell.ToString();
                }
            }
            dt.Rows.Add(dr);
        }

        return dt;
    }

or:

     public DataTable GetExcelDataTable(NPOI.SS.UserModel.IWorkbook hssfworkbook, int rowCount)
    {

        NPOI.SS.UserModel.ISheet sheet = hssfworkbook.GetSheetAt(0);
        System.Collections.IEnumerator rows = sheet.GetRowEnumerator();
        DataTable dt = new DataTable();
        bool skipReadingHeaderRow = rows.MoveNext();
        if (skipReadingHeaderRow)
        {
            dynamic row;
            if (rows.Current is NPOI.HSSF.UserModel.HSSFRow)
                row = (NPOI.HSSF.UserModel.HSSFRow)rows.Current;
            else
                row = (NPOI.XSSF.UserModel.XSSFRow)rows.Current;
            for (int i = 0; i < row.LastCellNum; i++)
            {
                ICell cell = row.GetCell(i);
                if (cell != null)
                {
                    dt.Columns.Add(cell.ToString());
                }
                else
                {
                    dt.Columns.Add(string.Empty);
                }
            }
        }
        int cnt = 0;
        while (rows.MoveNext() && cnt < rowCount)
        {
            cnt++;
            dynamic row;
            if (rows.Current is NPOI.HSSF.UserModel.HSSFRow)
                row = (HSSFRow)rows.Current;
            else
                row = (XSSFRow)rows.Current;
            DataRow dr = dt.NewRow();

            for (int i = 0; i < row.LastCellNum; i++)
            {
                ICell cell = row.GetCell(i);

                if (cell == null && i > 0)
                {
                    dr[i - 1] = null;
                }
                else if (i > 0)
                {
                    switch (cell.CellType)
                    {
                        case CellType.Blank:
                            dr[i - 1] = "[null]";
                            break;
                        case CellType.Boolean:
                            dr[i - 1] = cell.BooleanCellValue;
                            break;
                        case CellType.Numeric:
                            dr[i - 1] = cell.ToString();
                            break;
                        case CellType.String:
                            dr[i - 1] = cell.StringCellValue;
                            break;
                        case CellType.Error:
                            dr[i - 1] = cell.ErrorCellValue;
                            break;
                        case CellType.Formula:
                        default:
                            dr[i - 1] = "=" + cell.CellFormula;
                            break;
                    }

                }
            }
            dt.Rows.Add(dr);
        }

        return dt;
    }

or :

public DataTable GetExcelDataTable(NPOI.SS.UserModel.IWorkbook hssfworkbook, int segment, int rowCount)
    {
        NPOI.SS.UserModel.ISheet sheet = hssfworkbook.GetSheetAt(0);
        System.Collections.IEnumerator rows = sheet.GetRowEnumerator();

        DataTable dt = new DataTable();

        bool skipReadingHeaderRow = rows.MoveNext();
        if (skipReadingHeaderRow)
        {
            dynamic row;
            if (rows.Current is NPOI.HSSF.UserModel.HSSFRow)
                row = (NPOI.HSSF.UserModel.HSSFRow)rows.Current;
            else
                row = (NPOI.XSSF.UserModel.XSSFRow)rows.Current;
            for (int i = 0; i < row.LastCellNum; i++)
            {
                ICell cell = row.GetCell(i);
                if (cell != null)
                {
                    dt.Columns.Add(cell.ToString());
                }
                else
                {
                    dt.Columns.Add(string.Empty);
                }
            }
        }
        for (int i = 0; i < (segment - 1)*rowCount; i++)
        {
            if (!rows.MoveNext()) break;
        }

        int cnt = 0;
        while (rows.MoveNext() && cnt < rowCount)
        {
            cnt++;
            dynamic row;
            if (rows.Current is NPOI.HSSF.UserModel.HSSFRow)
                row = (NPOI.HSSF.UserModel.HSSFRow) rows.Current;
            else
                row = (NPOI.XSSF.UserModel.XSSFRow) rows.Current;

            DataRow dr = dt.NewRow();

            for (int i = 0; i < row.LastCellNum; i++)
            {
                ICell cell = row.GetCell(i);

                if (cell == null)
                {
                    dr[i - 1] = null;
                }
                else if (i > 0)
                {
                    switch (cell.CellType)
                    {
                        case CellType.Blank:
                            dr[i - 1] = "[null]";
                            break;
                        case CellType.Boolean:
                            dr[i - 1] = cell.BooleanCellValue;
                            break;
                        case CellType.Numeric:
                            dr[i - 1] = cell.ToString();
                            break;
                        case CellType.String:
                            dr[i - 1] = cell.StringCellValue;
                            break;
                        case CellType.Error:
                            dr[i - 1] = cell.ErrorCellValue;
                            break;
                        case CellType.Formula:
                        default:
                            dr[i - 1] = "=" + cell.CellFormula;
                            break;
                    }
                }
            }
            dt.Rows.Add(dr);
        }
        return dt;
    }
Salinasalinas answered 28/6, 2016 at 11:44 Comment(2)
Actually, if you want to use OleDB to read excel files you only need to install Microsoft JET data drivers which are free.Balaklava
@MaxLambertini There exists so many solutions for every problem, so it depends on us to choose one depending on different circumstancesSalinasalinas
F
5

Are you able to post some specs of your server? Is it VM and cloud based by any chance? In the past I have been successful using:

  1. Koogra : https://sourceforge.net/projects/koogra/

  2. NPOI : http://npoi.codeplex.com/

    To read .xls files, but if you can limit your files to .xlsx I would use ClosedXML. I have read plenty of huge files 50K+ with ClosedXML on a beefy VM in Azure without issue. I have a feeling that you are hitting a userspace wall on the server. If a user is hitting such a percentage they reach their usage quota and its ending the task.

Florindaflorine answered 23/6, 2016 at 13:36 Comment(2)
but, does the server specs matter? since it's working on VS executed IISExpress on localhost but not on just IIS (also localhost)Pipage
Sure, IIS on Windows Server 2003+ has a habit of block users when they start maxing out resources. My initial thought was that it was hitting the 30k mark and dropping the connection due to hitting some sort of threshold. Just a hunch really.Florindaflorine
A
4

This issue can be solved by reading the data in two parts like 25K + 25k = 50K. You just need to update select query as:

SELECT TOP 25000 * FROM [Sheet1$]
Albion answered 24/6, 2016 at 7:21 Comment(3)
i am not sure but something like rowid kind of thing you have to useLecroy
Really not a bad way of doing it, you could even start two threads one reading from the top, the other the bottom and finish it much faster. A simple for loop with the max amount of rows /2 would do it .Florindaflorine
@Pipage I believe select bottom 25000 or something similar should exist.Mulderig
T
3

I've created small sample on my side, making the use of SELECT TOP together with ORDER BY then you can get the results:

Check the code:

  public DataSet GetExcelDataTable(string fileName)
    {
        string connectionString = Path.GetExtension(fileName) == "xls" ?
            string.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data source={0}; Extended Properties=Excel 8.0;", fileName) :
            string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0}; Extended Properties=Excel 12.0;", fileName);

        var conn = new OleDbConnection(connectionString);

        DataTable data = new DataTable();
        DataTable data2 = new DataTable();
        var ds = new DataSet();

        using (var adapter = new OleDbDataAdapter("SELECT TOP 25000 Name, Surname FROM [Sheet1$] ORDER BY Name asc", conn))
        {
            adapter.Fill(data);
        }

        using (var adapter = new OleDbDataAdapter("SELECT TOP 25000 Name, Surname FROM [Sheet1$] ORDER BY Name desc", conn))
        {
            adapter.Fill(data2);
        }

        if (data.Rows.Count > 0)ds.Tables.Add(data);
        if (data2.Rows.Count > 0) ds.Tables.Add(data2);

        return ds;
    }
Tania answered 24/6, 2016 at 12:6 Comment(0)
B
3

You can try to fill your datatable by reading chunk of rows after chunk instead of doing one single read.

The beauty of this approach is that you're not limited to 50k record, but it adapts to the actual cardinality of your datatable.

This code works on my machine (Win10 X64, VS2010 Express):

    public DataTable GetExcelDataTable(string fileName)
    {
        string connectionString = Path.GetExtension(fileName) == "xls" ?
            string.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data source={0}; Extended Properties=Excel 8.0;", fileName) :
            string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0}; Extended Properties=Excel 12.0;", fileName);

        var conn = new OleDbConnection(connectionString);
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", conn))
        {
            var dt = new DataTable();

            int recordRead = 0;
            int recordCur = 0;     //starting point
            int recordStep = 6789; //records to read

            //here, we read **recordStep** records instead of reading 
            //all excel data
            do
            {
                recordRead = adapter.Fill( recordCur, recordStep, dt);
                recordCur += recordRead; //increment starting point accordingly
            } while (recordRead > 0);


            conn.Close();
            conn.Dispose();
            adapter.Dispose();
            return dt;
        }
    }
Balaklava answered 28/6, 2016 at 12:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.