How to increase performance for bulk INSERTs to ODBC linked tables in Access?
I have CSV and TXT files to import. I am importing the files into Access and then inserting the records into a linked Oracle table. Each file has around 3 million rows and the process is taking a long time to complete.

Importing into Access is very fast, but inserting into the linked Oracle table is taking an extremely long time.

Here is the process I am currently using:

DoCmd.TransferText acImportFixed, "BUSSEP2014 Link Specification", "tblTempSmartSSP", strFName, False
db.Execute "INSERT INTO METER_DATA ([MPO_REFERENCE]) SELECT MPO_REFERENCE FROM tblTempSmartSSP;"

tblTempSmartSSP is a local Access table and METER_DATA is a linked Oracle table.

I also tried importing directly into the linked table, but that was also very slow.

How can I speed up the process?

Shackelford answered 16/9, 2014 at 7:57 Comment(1)
Meter_Data is an Oracle database table. – Shackelford

This situation is not uncommon when dealing with bulk INSERTs to ODBC linked tables in Access. In the case of the following Access query

INSERT INTO METER_DATA (MPO_REFERENCE) 
SELECT MPO_REFERENCE FROM tblTempSmartSSP

where [METER_DATA] is an ODBC linked table and [tblTempSmartSSP] is a local (native) Access table, the Access Database Engine is somewhat limited in how clever it can be with ODBC linked tables because it has to accommodate a wide range of target databases whose capabilities can vary greatly. Unfortunately, that often means that, despite the single Access SQL statement, what actually gets sent to the remote (linked) database is a separate INSERT (or equivalent) for each row in the local table. Understandably, that can prove very slow if the local table contains a large number of rows.
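
To illustrate, the single Access statement above effectively reaches the remote database as a stream of single-row statements, conceptually like this (an illustration of the behavior, not a literal ODBC trace):

INSERT INTO METER_DATA (MPO_REFERENCE) VALUES (1);
INSERT INTO METER_DATA (MPO_REFERENCE) VALUES (2);
INSERT INTO METER_DATA (MPO_REFERENCE) VALUES (3);
-- ... one round trip per row in tblTempSmartSSP

Each statement incurs its own round trip to the server, which is why row counts in the millions take so long.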

Option 1: Native bulk inserts to the remote database

Most databases have one or more native mechanisms for the bulk loading of data: Microsoft SQL Server has "bcp" and BULK INSERT, and Oracle has SQL*Loader. These mechanisms are optimized for bulk operations and usually offer significant speed advantages. In fact, if the data needs to be imported into Access and "massaged" before being transferred to the remote database, it can still be faster to dump the modified data back out to a text file and then bulk import it into the remote database.
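
For the single-column case here, a minimal SQL*Loader setup might look like the following; the file names, connection details, and field options are hypothetical, so adjust them to match your files:

-- meter_data.ctl (hypothetical control file)
LOAD DATA
INFILE 'meter_data.csv'
APPEND
INTO TABLE METER_DATA
FIELDS TERMINATED BY ','
(MPO_REFERENCE)

which would then be run from the command line with something like

sqlldr userid=scott/tiger@orcl control=meter_data.ctl log=meter_data.log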

Option 2(a): Using Python and pandas

pyodbc with fast_executemany=True can upload rows much faster than INSERT INTO … SELECT … on a linked table. See the pandas-based answer further down this page for a complete example.
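
A minimal sketch of the raw pyodbc approach, assuming the rows have already been pulled out of the local table; the driver name, server, credentials, and row values below are placeholders:

import pyodbc

# placeholders -- substitute your own driver, server, and credentials
conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;UID=scott;PWD=tiger"
)
cnxn = pyodbc.connect(conn_str)
crsr = cnxn.cursor()
crsr.fast_executemany = True  # batch the parameter sets instead of sending row-by-row
rows = [(i,) for i in range(10_000)]  # list of 1-tuples, one per row to insert
crsr.executemany("INSERT INTO METER_DATA (MPO_REFERENCE) VALUES (?)", rows)
cnxn.commit()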

Option 2(b): Using a pass-through query in Access

If the bulk import mechanisms are not a feasible option, then another possibility is to build one or more pass-through queries in Access to upload the data using INSERT statements that can insert more than one row at a time.

For example, if the remote database is SQL Server (2008 or later), then we could run an Access pass-through (T-SQL) query like this

INSERT INTO METER_DATA (MPO_REFERENCE) VALUES (1), (2), (3)

to insert three rows with one INSERT statement.

According to an answer to an earlier question, the corresponding syntax for Oracle would be

INSERT ALL
    INTO METER_DATA (MPO_REFERENCE) VALUES (1)
    INTO METER_DATA (MPO_REFERENCE) VALUES (2)
    INTO METER_DATA (MPO_REFERENCE) VALUES (3)
SELECT * FROM DUAL;

I tested this approach with SQL Server (as I don't have access to an Oracle database) using a native [tblTempSmartSSP] table with 10,000 rows. The code ...

Sub LinkedTableTest()
    Dim cdb As DAO.Database
    Dim t0 As Single
    
    t0 = Timer
    Set cdb = CurrentDb
    cdb.Execute _
            "INSERT INTO METER_DATA (MPO_REFERENCE) " & _
            "SELECT MPO_REFERENCE FROM tblTempSmartSSP", _
            dbFailOnError
    Set cdb = Nothing
    Debug.Print "Elapsed time " & Format(Timer - t0, "0.0") & " seconds."
End Sub

... took approximately 100 seconds to execute in my test environment.

By contrast the following code, which builds multi-row INSERTs as described above (using what Microsoft calls a Table Value Constructor) ...

Sub PtqTest()
    Dim cdb As DAO.Database, rst As DAO.Recordset
    Dim t0 As Single, i As Long, valueList As String, separator As String

    t0 = Timer
    Set cdb = CurrentDb
    Set rst = cdb.OpenRecordset("SELECT MPO_REFERENCE FROM tblTempSmartSSP", dbOpenSnapshot)
    i = 0
    valueList = ""
    separator = ""
    Do Until rst.EOF
        i = i + 1
        valueList = valueList & separator & "(" & rst!MPO_REFERENCE & ")"
        If i = 1 Then
            ' prefix every value after the first with a comma
            separator = ","
        End If
        If i = 1000 Then
            ' T-SQL table value constructors accept at most 1000 rows,
            ' so send this batch and start a new one
            SendInsert valueList
            i = 0
            valueList = ""
            separator = ""
        End If
        rst.MoveNext
    Loop
    If i > 0 Then
        SendInsert valueList
    End If
    rst.Close
    Set rst = Nothing
    Set cdb = Nothing
    Debug.Print "Elapsed time " & Format(Timer - t0, "0.0") & " seconds."
End Sub

Sub SendInsert(valueList As String)
    Dim cdb As DAO.Database, qdf As DAO.QueryDef
    
    Set cdb = CurrentDb
    Set qdf = cdb.CreateQueryDef("")
    qdf.Connect = cdb.TableDefs("METER_DATA").Connect
    qdf.ReturnsRecords = False
    qdf.SQL = "INSERT INTO METER_DATA (MPO_REFERENCE) VALUES " & valueList
    qdf.Execute dbFailOnError
    Set qdf = Nothing
    Set cdb = Nothing
End Sub

... took between 1 and 2 seconds to produce the same results.

(T-SQL Table Value Constructors are limited to inserting 1000 rows at a time, so the above code is a bit more complicated than it would be otherwise.)
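
For Oracle, the loop would instead build one INTO … VALUES clause per row (separated by spaces rather than commas), and the helper would wrap the list in the INSERT ALL form shown earlier. An untested sketch, since I don't have an Oracle server to verify against (and, as a comment below notes, the more verbose syntax may reduce the benefit):

Sub SendInsertOracle(intoList As String)
    ' intoList is built as one clause per row, e.g.
    '     INTO METER_DATA (MPO_REFERENCE) VALUES (1)
    Dim cdb As DAO.Database, qdf As DAO.QueryDef
    
    Set cdb = CurrentDb
    Set qdf = cdb.CreateQueryDef("")
    qdf.Connect = cdb.TableDefs("METER_DATA").Connect
    qdf.ReturnsRecords = False
    qdf.SQL = "INSERT ALL " & intoList & " SELECT * FROM DUAL"
    qdf.Execute dbFailOnError
    Set qdf = Nothing
    Set cdb = Nothing
End Sub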

Misspell answered 16/9, 2014 at 22:14 Comment(5)
Very impressive, Gord. As a general note, people often assume that SQL INSERT statements are faster than DAO recordsets. They are not, UNLESS the one SQL statement can replace a recordset loop; executing a "whole" SQL string for each row tends to be much slower than a recordset. I would thus suggest the poster just try a simple recordset on the linked table. – Thermotaxis
Hi Gord Thompson, thanks for the code. I am getting an "ODBC - Call Failed" error while running this code, at qdf.Execute dbFailOnError. – Shackelford
@ravichaudhary Please ask a new question showing your actual code. – Misspell
NB: Please note that the example above doesn't work on an Oracle RDBMS. Oracle will not allow the insert statement to be: INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9); Instead you must use the INSERT ALL statement and write INTO for each line of the content of the valueList, i.e. INTO tbl_name (a,b,c) VALUES(1,2,3) on the first line, then INTO tbl_name (a,b,c) VALUES(4,5,6) on the second line, and so forth. See techonthenet.com/oracle/questions/insert_rows.php And that detail ruined the performance advantage for my solution :( – Matins
@Matins Thanks for your comment. In the answer I did state that the test code was for (Microsoft) SQL Server; it also shows the corresponding INSERT ALL syntax for Oracle, and I would have tested against an Oracle server if one had been available to me. The Oracle syntax is certainly more verbose than the SQL Server syntax, which would mitigate any performance benefit somewhat, but are you saying that this type of approach against an Oracle database offers no significant benefit over a plain INSERT INTO with an (Oracle) linked table? – Misspell

Another possibility for SQL Server, if bcp or BULK INSERT is not an option:

  • METER_DATA is an ODBC linked table in Access for dbo.METER_DATA on SQL Server
  • tblTempSmartSSP is a local Access table containing 10,000 rows

On my (very old) test network, this VBA code takes ~240 seconds (about 4 minutes) to run

Sub upload()
    Dim cdb As DAO.Database
    Set cdb = CurrentDb
    Dim t0 As Single
    t0 = Timer
    cdb.Execute "INSERT INTO METER_DATA (MPO_REFERENCE) SELECT MPO_REFERENCE FROM tblTempSmartSSP"
    Debug.Print Timer - t0
End Sub

Using Python and pandas, this code uploads the same 10,000 rows in about 5 seconds

from time import perf_counter

import pandas as pd
import sqlalchemy as sa

acc_engine = sa.create_engine(
    "access+pyodbc://@meter_data"  # "meter_data" is an ODBC DSN for the Access database
)
sql_engine = sa.create_engine(
    "mssql+pyodbc://scott:tiger^5HHH@mssql_199",  # "mssql_199" is an ODBC DSN for SQL Server
    fast_executemany=True,
)

t0 = perf_counter()
df = pd.read_sql_query("SELECT MPO_REFERENCE FROM tblTempSmartSSP", acc_engine)
df.to_sql("METER_DATA", sql_engine, schema="dbo", if_exists="append", index=False)
print(perf_counter() - t0)

Requirements:

  • pandas and SQLAlchemy
  • pyodbc, plus the sqlalchemy-access dialect for the access+pyodbc:// URL
  • ODBC drivers for both Access and SQL Server

Note that other databases supported by SQLAlchemy (e.g., PostgreSQL) may also offer significant performance improvements over an ODBC linked table in Access.

Misspell answered 31/7, 2022 at 21:56 Comment(0)

Do you have to import all the columns? Maybe you want to leave out empty columns, if any, as well as columns that are not absolutely necessary for business purposes.

Loganiaceous answered 16/9, 2014 at 12:58 Comment(4)
I need all the columns, and there are only 5 columns at most. I also tried importing a small data set of about 6,000 rows, and it also took more than an hour. – Shackelford
Just thought of this; try using the following in order to run your action query. Maybe it's slow because you need more declarations: – Loganiaceous
Are you importing the file into an existing table? And are you the only user? Creating a table takes more time, and you don't want anyone else in the db if you're importing. Also, run a query to delete nulls from the column you're copying over before doing anything else. Nulls slow things down. Just a few ideas. – Loganiaceous
I am importing into an existing table and I am the only one using the db. To delete nulls I am already running a query. The time is spent when I insert from the MS Access table into the Oracle database table, i.e. Meter_Data. – Shackelford

Sorry, I forgot to include the code:

Option Compare Database
Option Explicit

Public Function Run_Safe_SQL(strSQL As String)
On Error GoTo Error_Handler
Dim db As DAO.Database

   Set db = CurrentDb()
   ' dbFailOnError makes failed statements raise a trappable error
   db.Execute strSQL, dbFailOnError
   DBEngine.Idle dbRefreshCache
'   DoEvents

Exit_Here:
   'Cleanup
   Set db = Nothing
   strSQL = ""
   Exit Function

Error_Handler:
    MsgBox Err.Description & " " & Err.Number
    Resume Exit_Here

End Function
Loganiaceous answered 16/9, 2014 at 13:7 Comment(0)
