Writing large number of records (bulk insert) to Access in .NET/C#
Asked Answered
T

8

55

What is the best way to perform bulk inserts into an MS Access database from .NET? Using ADO.NET, it is taking way over an hour to write out a large dataset.

Note that my original post, before I "refactored" it, had both the question and answer in the question part. I took Igor Turman's suggestion and re-wrote it in two parts - the question above and followed by my answer.

Tammitammie answered 15/8, 2011 at 19:55 Comment(6)
Very interesting, thanks for the post! Could you try to measure execution time using more precise Stopwatch() timer? Also I would suggest to remove Console.WriteLine, replace it by Debug.WriteLine. And one question why you measure time of TEMP table truncation? Interesting whether results would be the same.Dinh
Thanks for the suggestion of Stopwatch() and Debug. I learn something new each time I go onto this site. I measured the TEMP table truncation because in my "real-life" application I needed that as well. In every case, truncating the table (which had 100 000 rows) took less that 0.1 seconds. I had rerun this many times, and while the results varied, from say a low of 2.6 seconds for the fastest method to a high of 3 seconds, and for the slowest method from 84 to 89 seconds, the numbers I reported were typical.Tammitammie
What are you testing here? It seems to me that it's something one doesn't normally do, i.e., generate data on the fly and insert it row-by-row. Instead, you'd normally have a set of existing data that you want to insert, and you'd just use a SQL INSERT statement to do that. That DAO is faster with Jet/ACE data should be no surprise to anybody, as it was built around Jet from the beginning.Baldachin
@Marc. DAO's speed is not surprising to me. Long time ago I was dealing with processing thousands of records inside Access database from within VB6 program. The best performance I got was using DAO. Your findings are interesting and might be useful. But I think posting this as Question is not the right way to do. Stackoverflow allows sharing knowledge through Q/A. The way to do it is posting question and immediately posting the answer. This way, people will not assume you are looking for the answer and, at the same time, can suggest their options and/or vote for yoursUla
Hans: Tried the dbAppendOnly option. Ran it both ways 10 times, and for whatever reason there is variability in the results. Average time with the dbAppendOnly was 2.62 seconds, without it was 2.65 seconds. However, a T-Test showed no significant difference.Tammitammie
@MarcMeketon , Can you please have a look at this question and assisst with the code to perform the complex / bulk insert. There is funny behavior from Aceess - the same code is sometimes fast , slowa fter CompRepair , slow on different (even though smaller) data.Hunnicutt
T
83

I found that using DAO in a specific manner is roughly 30 times faster than using ADO.NET. I am sharing the code and results in this answer. As background, in the below, the test is to write out 100 000 records of a table with 20 columns.

A summary of the technique and times - from best to worse:

  1. 02.8 seconds: Use DAO, use DAO.Field's to refer to the table columns
  2. 02.8 seconds: Write out to a text file, use Automation to import the text into Access
  3. 11.0 seconds: Use DAO, use the column index to refer to the table columns.
  4. 17.0 seconds: Use DAO, refer to the column by name
  5. 79.0 seconds: Use ADO.NET, generate INSERT statements for each row
  6. 86.0 seconds: Use ADO.NET, use DataTable to an DataAdapter for "batch" insert

As background, occasionally I need to perform analysis of reasonably large amounts of data, and I find that Access is the best platform. The analysis involves many queries, and often a lot of VBA code.

For various reasons, I wanted to use C# instead of VBA. The typical way is to use OleDB to connect to Access. I used an OleDbDataReader to grab millions of records, and it worked quite well. But when outputting results to a table, it took a long, long time. Over an hour.

First, let's discuss the two typical ways to write records to Access from C#. Both ways involve OleDB and ADO.NET. The first is to generate INSERT statements one at time, and execute them, taking 79 seconds for the 100 000 records. The code is:

public static double TestADONET_Insert_TransferToAccess()
{
  StringBuilder names = new StringBuilder();
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    if (k > 0)
    {
      names.Append(",");
    }
    names.Append(fieldName);
  }

  DateTime start = DateTime.Now;
  using (OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB))
  {
    conn.Open();
    OleDbCommand cmd = new OleDbCommand();
    cmd.Connection = conn;

    cmd.CommandText = "DELETE FROM TEMP";
    int numRowsDeleted = cmd.ExecuteNonQuery();
    Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

    for (int i = 0; i < 100000; i++)
    {
      StringBuilder insertSQL = new StringBuilder("INSERT INTO TEMP (")
        .Append(names)
        .Append(") VALUES (");

      for (int k = 0; k < 19; k++)
      {
        insertSQL.Append(i + k).Append(",");
      }
      insertSQL.Append(i + 19).Append(")");
      cmd.CommandText = insertSQL.ToString();
      cmd.ExecuteNonQuery();
    }
    cmd.Dispose();
  }
  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

Note that I found no method in Access that allows a bulk insert.

I had then thought that maybe using a data table with a data adapter would be prove useful. Especially since I thought that I could do batch inserts using the UpdateBatchSize property of a data adapter. However, apparently only SQL Server and Oracle support that, and Access does not. And it took the longest time of 86 seconds. The code I used was:

public static double TestADONET_DataTable_TransferToAccess()
{
  StringBuilder names = new StringBuilder();
  StringBuilder values = new StringBuilder();
  DataTable dt = new DataTable("TEMP");
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    dt.Columns.Add(fieldName, typeof(int));
    if (k > 0)
    {
      names.Append(",");
      values.Append(",");
    }
    names.Append(fieldName);
    values.Append("@" + fieldName);
  }

  DateTime start = DateTime.Now;
  OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB);
  conn.Open();
  OleDbCommand cmd = new OleDbCommand();
  cmd.Connection = conn;

  cmd.CommandText = "DELETE FROM TEMP";
  int numRowsDeleted = cmd.ExecuteNonQuery();
  Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

  OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM TEMP", conn);

  da.InsertCommand = new OleDbCommand("INSERT INTO TEMP (" + names.ToString() + ") VALUES (" + values.ToString() + ")");
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    da.InsertCommand.Parameters.Add("@" + fieldName, OleDbType.Integer, 4, fieldName);
  }
  da.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
  da.InsertCommand.Connection = conn;
  //da.UpdateBatchSize = 0;

  for (int i = 0; i < 100000; i++)
  {
    DataRow dr = dt.NewRow();
    for (int k = 0; k < 20; k++)
    {
      dr["Field" + (k + 1).ToString()] = i + k;
    }
    dt.Rows.Add(dr);
  }
  da.Update(dt);
  conn.Close();

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

Then I tried non-standard ways. First, I wrote out to a text file, and then used Automation to import that in. This was fast - 2.8 seconds - and tied for first place. But I consider this fragile for a number of reasons: Outputing date fields is tricky. I had to format them specially (someDate.ToString("yyyy-MM-dd HH:mm")), and then set up a special "import specification" that codes in this format. The import specification also had to have the "quote" delimiter set right. In the example below, with only integer fields, there was no need for an import specification.

Text files are also fragile for "internationalization" where there is a use of comma's for decimal separators, different date formats, possible the use of unicode.

Notice that the first record contains the field names so that the column order isn't dependent on the table, and that we used Automation to do the actual import of the text file.

public static double TestTextTransferToAccess()
{
  StringBuilder names = new StringBuilder();
  for (int k = 0; k < 20; k++)
  {
    string fieldName = "Field" + (k + 1).ToString();
    if (k > 0)
    {
      names.Append(",");
    }
    names.Append(fieldName);
  }

  DateTime start = DateTime.Now;
  StreamWriter sw = new StreamWriter(Properties.Settings.Default.TEMPPathLocation);

  sw.WriteLine(names);
  for (int i = 0; i < 100000; i++)
  {
    for (int k = 0; k < 19; k++)
    {
      sw.Write(i + k);
      sw.Write(",");
    }
    sw.WriteLine(i + 19);
  }
  sw.Close();

  ACCESS.Application accApplication = new ACCESS.Application();
  string databaseName = Properties.Settings.Default.AccessDB
    .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

  accApplication.OpenCurrentDatabase(databaseName, false, "");
  accApplication.DoCmd.RunSQL("DELETE FROM TEMP");
  accApplication.DoCmd.TransferText(TransferType: ACCESS.AcTextTransferType.acImportDelim,
  TableName: "TEMP",
  FileName: Properties.Settings.Default.TEMPPathLocation,
  HasFieldNames: true);
  accApplication.CloseCurrentDatabase();
  accApplication.Quit();
  accApplication = null;

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

Finally, I tried DAO. Lots of sites out there give huge warnings about using DAO. However, it turns out that it is simply the best way to interact between Access and .NET, especially when you need to write out large number of records. Also, it gives access to all the properties of a table. I read somewhere that it's easiest to program transactions using DAO instead of ADO.NET.

Notice that there are several lines of code that are commented. They will be explained soon.

public static double TestDAOTransferToAccess()
{

  string databaseName = Properties.Settings.Default.AccessDB
    .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

  DateTime start = DateTime.Now;
  DAO.DBEngine dbEngine = new DAO.DBEngine();
  DAO.Database db = dbEngine.OpenDatabase(databaseName);

  db.Execute("DELETE FROM TEMP");

  DAO.Recordset rs = db.OpenRecordset("TEMP");

  DAO.Field[] myFields = new DAO.Field[20];
  for (int k = 0; k < 20; k++) myFields[k] = rs.Fields["Field" + (k + 1).ToString()];

  //dbEngine.BeginTrans();
  for (int i = 0; i < 100000; i++)
  {
    rs.AddNew();
    for (int k = 0; k < 20; k++)
    {
      //rs.Fields[k].Value = i + k;
      myFields[k].Value = i + k;
      //rs.Fields["Field" + (k + 1).ToString()].Value = i + k;
    }
    rs.Update();
    //if (0 == i % 5000)
    //{
      //dbEngine.CommitTrans();
      //dbEngine.BeginTrans();
    //}
  }
  //dbEngine.CommitTrans();
  rs.Close();
  db.Close();

  double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
  Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
  return elapsedTimeInSeconds;
}

In this code, we created DAO.Field variables for each column (myFields[k]) and then used them. It took 2.8 seconds. Alternatively, one could directly access those fields as found in the commented line rs.Fields["Field" + (k + 1).ToString()].Value = i + k; which increased the time to 17 seconds. Wrapping the code in a transaction (see the commented lines) dropped that to 14 seconds. Using an integer index rs.Fields[k].Value = i + k; droppped that to 11 seconds. Using the DAO.Field (myFields[k]) and a transaction actually took longer, increasing the time to 3.1 seconds.

Lastly, for completeness, all of this code was in a simple static class, and the using statements are:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ACCESS = Microsoft.Office.Interop.Access; // USED ONLY FOR THE TEXT FILE METHOD
using DAO = Microsoft.Office.Interop.Access.Dao; // USED ONLY FOR THE DAO METHOD
using System.Data; // USED ONLY FOR THE ADO.NET/DataTable METHOD
using System.Data.OleDb; // USED FOR BOTH ADO.NET METHODS
using System.IO;  // USED ONLY FOR THE TEXT FILE METHOD
Tammitammie answered 16/8, 2011 at 15:19 Comment(9)
.net really needs to add a "table direct" access method for db tables. bulk loading is far too painful, and shouldnt be.Oozy
Great answer, very helpful! For the DAO record set method, you can use the "Microsoft DAO 3.6 Object Library" directly (using DAO) instead of referencing the "Microsoft Office Library" (using DAO = Microsoft.Office.Interop.Access.Dao).Blazon
I know this is a very old post, but I wanted to say thank you. I was writing 400,000 records that took nearly 1.5 hours, and now it completes in minutes.Cysto
Thanks a lot, Marc, for this great comprehensive answer !! I am in the need to bulk transfer data from SQL to MS Access via C#, so your answer is of really great help. I didn't know that a direct access via DAO is so much faster then ADO! I had assumed that ADO.NET would translate everything to DAO anyway.Vilmavim
I've been unable to reproduce the speed of the DAO/field test on my local machines. Can you list the Access Database version and driver/DLL used?Sisneros
@Tom: I'm not sure - I have switched computers twice since then, and cannot locate my original test set. Most likely this was done with MS Access 2003.Tammitammie
In your last exemple, If you do rs.MoveNext() to the recordset, does myFields[i] now points to the next row as well?Menial
@Marc Meketon I try TestDAOTransferToAccess (). In row string databaseName = Properties.Settings.Default.AccessDB .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);. I get an error 'Settings" does not contain a definition for "AccessDB" and could not find the extension method "AccessDB", taking the type "Settings" as the first argument (it is possible that the using directive or the assembly link is missing) How to fix the error?Leo
I'm unable to see the huge speed increase on my machine. I'm comparing parameterized SQL inserts (using OleDbCommand) against the RecordSet, AddNew method described.Pulchritude
L
14

Thanks Marc, in order to vote you I created an account on StackOverFlow...

Below is the reusable method [Tested on C# with 64 Bit - Win 7, Windows 2008 R2, Vista, XP platforms]

Performance Details: Exports 120,000 Rows in 4 seconds.

Copy the below code and pass the parameters... and see the performance.

  • Just pass your datatable with the same schema, as of target Access Db Table.
  • DBPath= Full path of access Db
  • TableNm = Name of Target Access Db table.

The code:

public void BulkExportToAccess(DataTable dtOutData, String DBPath, String TableNm) 
{
    DAO.DBEngine dbEngine = new DAO.DBEngine();
    Boolean CheckFl = false;

    try
    {
        DAO.Database db = dbEngine.OpenDatabase(DBPath);
        DAO.Recordset AccesssRecordset = db.OpenRecordset(TableNm);
        DAO.Field[] AccesssFields = new DAO.Field[dtOutData.Columns.Count];

        //Loop on each row of dtOutData
        for (Int32 rowCounter = 0; rowCounter < dtOutData.Rows.Count; rowCounter++)
        {
            AccesssRecordset.AddNew();
            //Loop on column
            for (Int32 colCounter = 0; colCounter < dtOutData.Columns.Count; colCounter++)
            {
                // for the first time... setup the field name.
                if (!CheckFl)
                    AccesssFields[colCounter] = AccesssRecordset.Fields[dtOutData.Columns[colCounter].ColumnName];
                AccesssFields[colCounter].Value = dtOutData.Rows[rowCounter][colCounter];
            }

            AccesssRecordset.Update();
            CheckFl = true;
        }

        AccesssRecordset.Close();
        db.Close();
    }
    finally
    {
        System.Runtime.InteropServices.Marshal.ReleaseComObject(dbEngine);
        dbEngine = null;
    }
}
Lilla answered 22/8, 2012 at 19:23 Comment(2)
I've edited your code and removed the "catch-exception-throw-ex" lines. Try to never do that in C# unless you are aware of the consequences. Rethrowing an exception in this way removes the original stacktrace from it. To rethrow properly, use throw; instead of throw ex;. The catch clause was also empty, so except from damaging the stacktrace it did nothing at all.Olomouc
Thanks Prasoon. Seems to me there is a typo. Instead of DAO.Field[] AccesssFields = new DAO.Field[dtOutData.Rows.Count]; It should be DAO.Field[] AccesssFields = new DAO.Field[dtOutData.Columns.Count];Tammitammie
P
4

You can use a KORM, object relation mapper that allows bulk operations over MsAccess.

database
  .Query<Movie>()
  .AsDbSet()
  .BulkInsert(_data);

or if you have source reader, you can directly use MsAccessBulkInsert class:

using (var bulkInsert = new MsAccessBulkInsert("connection string"))
{
   bulkInsert.Insert(sourceReader);
}

KORM is available from nuget Kros.KORM.MsAccess and it's opensource on GitHub

Pitchman answered 29/4, 2018 at 11:53 Comment(0)
C
3

Thanks Marc for the examples.
On my system the performance of DAO is not as good as suggested here:

TestADONET_Insert_TransferToAccess(): 68 seconds
TestDAOTransferToAccess(): 29 seconds

Since on my system the use of Office interop libraries is not an option I tried a new method involving the writing of a CSV file and then importing it via ADO:

    public static double TestADONET_Insert_FromCsv()
    {
        StringBuilder names = new StringBuilder();
        for (int k = 0; k < 20; k++)
        {
            string fieldName = "Field" + (k + 1).ToString();
            if (k > 0)
            {
                names.Append(",");
            }
            names.Append(fieldName);
        }

        DateTime start = DateTime.Now;
        StreamWriter sw = new StreamWriter("tmpdata.csv");

        sw.WriteLine(names);
        for (int i = 0; i < 100000; i++)
        {
            for (int k = 0; k < 19; k++)
            {
                sw.Write(i + k);
                sw.Write(",");
            }
            sw.WriteLine(i + 19);
        }
        sw.Close();

        using (OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB))
        {
            conn.Open();
            OleDbCommand cmd = new OleDbCommand();
            cmd.Connection = conn;

            cmd.CommandText = "DELETE FROM TEMP";
            int numRowsDeleted = cmd.ExecuteNonQuery();
            Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

            StringBuilder insertSQL = new StringBuilder("INSERT INTO TEMP (")
                .Append(names)
                .Append(") SELECT ")
                .Append(names)
                .Append(@" FROM [Text;Database=.;HDR=yes].[tmpdata.csv]");
            cmd.CommandText = insertSQL.ToString();
            cmd.ExecuteNonQuery();

            cmd.Dispose();
        }

        double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
        Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
        return elapsedTimeInSeconds;
    }

Performace analysis of TestADONET_Insert_FromCsv(): 1.9 seconds

Similar to Marc's example TestTextTransferToAccess(), this method is also fragile for a number of reasons regarding the use of CSV files.

Hope this helps.
Lorenzo

Comstockery answered 29/7, 2015 at 7:30 Comment(0)
G
2

First make sure that the access table columns have the same column names and similar types. Then you can use this function which I believe is very fast and elegant.

public void AccessBulkCopy(DataTable table)
{
    foreach (DataRow r in table.Rows)
        r.SetAdded();

    var myAdapter = new OleDbDataAdapter("SELECT * FROM " + table.TableName, _myAccessConn);

    var cbr = new OleDbCommandBuilder(myAdapter);
    cbr.QuotePrefix = "[";
    cbr.QuoteSuffix = "]";
    cbr.GetInsertCommand(true);

    myAdapter.Update(table);
}
Gutter answered 6/9, 2016 at 21:38 Comment(0)
M
1

To add to Marc's answer:

Note that having the [STAThread] attribute above your Main method. will make your program easily able to communicate with COM objects, increasing the speed further. I know it's not for every application but if you heavily depend on DAO, I would recommend it.

Further more, using the DAO insertion method. If you have a column that is not required and you want to insert null, don't even set it's value. Setting the value cost time even if it's null.

Mumble answered 21/12, 2018 at 8:37 Comment(0)
M
0

Another method to consider, involving linking tables via DAO or ADOX then executing statements like this:

SELECT * INTO Table1 FROM _LINKED_Table1

Please see my full answer here:
MS Access Batch Update via ADO.Net and COM Interoperability

Moreover answered 9/10, 2014 at 20:31 Comment(0)
S
-1

Note the position of the DAO component here. This helps explain the efficiency improvements.

Spile answered 7/12, 2017 at 15:55 Comment(1)
This would make a better comment than answer. That said, you're account can't comment until it has more reputation points. Have a look here: stackoverflow.com/help/how-to-answerThirza

© 2022 - 2024 — McMap. All rights reserved.