Performance of OLEDB to read Excel
Asked Answered
S

1

6

Following code takes like 2500 milliseconds on an i7-*3.4 GHz windows-7 64-bit computer to read an excel sheet with 25000 lines and 5 columns. Each cell approximately include a string with 10 characters. Is it normal? How can I read it faster?

 Stopwatch sw1 = Stopwatch.StartNew();
 var connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0}; " +
                                             "Extended Properties=Excel 12.0;", filename);

 var adapter = new OleDbDataAdapter("SELECT * FROM [roots$]", connectionString);
 var ds = new DataSet();
 adapter.Fill(ds, "roots");
 sw1.Stop(); Console.WriteLine("Time taken for excel roots: {0} ms", sw1.Elapsed.TotalMilliseconds);
Sternmost answered 3/7, 2012 at 14:5 Comment(3)
dataset are "heavy" objects, better create your own class and fill a list of it using datareaderLiterally
I think most of the performance cost is connection time (try modifying the recordset size to see if the elapsed time increases dramatically)Caseose
Try to move the start of the StopWatch to after the connection has been made and see how much time that part takes. But as Boomer already pointed out, try using OleDbCommand and OleDbDataReader instead of OleDbDataAdapter and a DataSet and you might also gain quite some speed.Overfeed
S
6

I wish to present my findings as an answer because the behavior is always consistent.

I have copied your code and put inside a button click event, just changed a bit to be sure to dispose the adapter and the connection for every test made.

// test.xls contains 26664 rows by 5 columns. Average 10 char for column, file size is 2448kb
// OS Windows 7 Ultimate 64 bit. CPU Intel Core2 Quad Q9550 2.83ghz 
// 8gb ram and disk C is an 256gb SSD cruzer

    private void button1_Click(object sender, EventArgs e)
    {

        string filename = "c:\\tmp\\test.xls";
        Stopwatch sw1 = Stopwatch.StartNew(); 
        var connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0}; " + 
                                              "Extended Properties=Excel 12.0", filename);

        using(var adapter = new OleDbDataAdapter("SELECT * FROM [roots$]", connectionString))
        {
            var ds = new DataSet();
            adapter.Fill(ds, "roots");
            sw1.Stop();
            Console.WriteLine("Time taken for excel roots: {0} ms", sw1.Elapsed.TotalMilliseconds);
        }
    }

So, this is basically is your code. This code executes in 500ms. BUT.... if I keep the file test.xls open in Excel 2010, the execute time jumps to 8000ms.

I have also tried this code variation, but the end results are the same

    private void button1_Click(object sender, EventArgs e)
    {
        string filename = "c:\\tmp\\test.xls";
        Stopwatch sw1 = Stopwatch.StartNew(); 
        var connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0}; " + 
                                              "Extended Properties=Excel 12.0", filename);
        using(OleDbConnection cn = new OleDbConnection(connectionString))
        {
            cn.Open();
            using(var adapter = new OleDbDataAdapter("SELECT * FROM [roots$]", cn))
            {
                var ds = new DataSet();
                adapter.Fill(ds, "roots");
                sw1.Stop();
                Console.WriteLine("Time taken for excel roots: {0} ms", sw1.Elapsed.TotalMilliseconds);
            }
        }
    }

and, no, it's not the Open() of the OleDbConnection, is always the adapter.Fill()

Signally answered 5/7, 2012 at 20:33 Comment(2)
I never thought the reason was file's being opened. Thank you for the answer.Sternmost
I noticed similar behavior with a project I worked on that reads large amounts of data from Excel. I didn't do the kind of research to narrow down the cause, because the project didn't really require any kind of performance (it's an internal tool that still finishes processing in under a minute). I noticed for sure though that if I had the spreadsheet open, it seemed to take up to a minute to complete where it might have taken 10 seconds with the spreadsheet not open. I thought it was strange, but this post kind of confirms I'm not crazy.Tijuanatike

© 2022 - 2024 — McMap. All rights reserved.