ExecuteNonQuery in parallel within a shared OleDbConnection / OleDbTransaction

4

6

I discovered that OleDbConnection doesn't seem to be thread-safe. When it is shared across threads, it appears to try to open multiple connections instead.

// doesn't work
using (OleDbConnection oConn = TheDataAccessLayer.GetConnection())
using (OleDbTransaction oTran = oConn.BeginTransaction())
Parallel.ForEach(ORMObjects, (ORMObject, State) =>
{
    if (!State.ShouldExitCurrentIteration && !State.IsExceptional)
    {
        // Every worker thread uses the same connection and transaction here.
        var Error = ORMObject.SomethingThatExecutesANonQuery(oConn, oTran);

        if (Error.Number != 0)
            State.Stop();
    }
});

If I lock the connection around each ExecuteNonQuery, the errors go away, but performance tanks.

    // works
    using (OleDbConnection oConn = TheDataAccessLayer.GetConnection())
    using (OleDbTransaction oTran = oConn.BeginTransaction())
    Parallel.ForEach(ORMObjects, (ORMObject, State) =>
    {
        if (!State.ShouldExitCurrentIteration && !State.IsExceptional)
        {
            // Only one thread at a time may touch the shared connection.
            lock (oConn)
            {
                var Error = ORMObject.SomethingThatExecutesANonQuery(oConn, oTran);

                if (Error.Number != 0)
                    State.Stop();
            }
        }
    });

Assume that

  • I can't change the nature of the ORM: the SQL cannot be bulked

  • Business rules require that the interaction be performed within a single transaction

So:

  • Is there a better/more efficient way to parallelize OleDb interactions?

  • If not, is there an alternative to the OleDb client that can take full advantage of parallelism? (Maybe the native MSSQL client?)

Whimsey answered 9/10, 2011 at 18:41 Comment(0)
7

Transactions need to be ACID, but the "Durability" needs to be enforced only at the transaction's end. So the physical I/O to the disk may be postponed until after the SQL statement appears to have executed and actually be done in the background, while your transaction is processing other statements.

As a consequence, issuing SQL statements serially may not be much slower than issuing them concurrently. Consider this scenario:

  • Execute the SQL statement [A] that writes data. The disk is not actually touched, writes are simply queued for later, so the execution flow returns very quickly to the client (i.e. [A] does not block for long).
  • Execute the SQL statement [B] that writes data. Writes are queued and [B] does not block for long, just as before. The physical I/O of [A] may already be happening in the background at this point.
  • Other processing takes place in the transaction, while the DBMS performs the physical I/O to the disk in the background.
  • The transaction is committed.
    • If queued writes are finished, there is no need to wait.
    • If queued writes are not finished by now, wait until they are. BTW, some databases can relax the "Durability" requirements to avoid this wait. MS SQL Server could not at the time of writing (AFAIK), though SQL Server 2014 and later offer delayed durability.

Of course there are scenarios where this "auto-parallelism" of the DBMS would not work well, for example when different statements have WHERE clauses that touch different partitions on different disks. The DBMS would love to parallelize these statements but can't if they are fed to it one by one.

In any case, don't guess where your performance bottleneck is. Measure it instead!


BTW, MARS will not help you in parallelizing your statements - according to MSDN: "Note, however, that MARS is defined in terms of interleaving, not in terms of parallel execution."

Scarper answered 25/10, 2011 at 12:25 Comment(2)
Lots of great info in here. To your point about measurement, what I end up with is, I fear, apples to oranges: I can compare 1 connection & 1 transaction executing serially to 80,000 connections & transactions in parallel (w/ pooling). Between those, there is a 15-second difference (about a minute and a half altogether). Depending on how efficiently pooling works, I was hoping for substantial savings if I could maintain the same connection. Whimsey
Addendum: The scenario above with 1 connection + blocking comes out to about 10 seconds faster than serial, and about 5 seconds slower than 'full parallel'. Whimsey
1

Discovered that OleDBConnection doesn't seem to be ThreadSafe.

Yes, that's in accordance with the documentation:

Any public static (Shared in Visual Basic) members of this type are thread safe. Any instance members are not guaranteed to be thread safe.

So simply create the connection inside the thread and let the underlying OLE DB provider handle the connection pooling. Also, if you have the possibility, definitely get rid of OleDbConnection and use the corresponding ADO.NET driver for your database. Unless you are running some very exotic database, there should be an ADO.NET driver.
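A minimal sketch of the connection-per-iteration idea, assuming a valid OLE DB connection string and a caller-supplied `execute` delegate (both are hypothetical placeholders, as are the names `ParallelBatch` and `maxParallelism`). Note that each connection here commits on its own, so this does not give a single shared transaction.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb;
using System.Threading.Tasks;

static class ParallelBatch
{
    // Runs 'execute' for every item, giving each iteration its own connection.
    // 'execute' is a hypothetical stand-in for whatever issues the non-query.
    public static void Run<T>(
        string connectionString,
        IEnumerable<T> items,
        Action<T, OleDbConnection> execute,
        int maxParallelism)
    {
        // Cap the number of concurrent iterations (and thus open connections).
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(items, options, item =>
        {
            // No instance is shared between threads. With pooling enabled,
            // Open() normally reuses a pooled connection and Dispose() returns it.
            using (var conn = new OleDbConnection(connectionString))
            {
                conn.Open();
                execute(item, conn);
            }
        });
    }
}
```

Running it needs a reachable database and, outside the .NET Framework, the System.Data.OleDb package on Windows.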

Nae answered 9/10, 2011 at 18:44 Comment(1)
I need to share the transaction: business rules dictate that everything rolls back on failure. Whimsey
1

Since it's not threadsafe, change the Parallel.ForEach to a normal foreach and do them serially. It's better for it to work slower than not at all.
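A minimal sketch of the serial version, assuming a connection string and an `execute` delegate that stands in for the ORM call from the question and returns an error number (0 meaning success). The names `SerialBatch`, `Run`, and `execute` are placeholders, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb;

static class SerialBatch
{
    // Executes every item on one connection inside one transaction, one at a time.
    // Returns true if everything was committed, false if the batch was rolled back.
    public static bool Run<T>(
        string connectionString,
        IEnumerable<T> items,
        Func<T, OleDbConnection, OleDbTransaction, int> execute)
    {
        using (var conn = new OleDbConnection(connectionString))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            {
                foreach (var item in items)
                {
                    // A non-zero error number aborts the whole batch.
                    if (execute(item, conn, tran) != 0)
                    {
                        tran.Rollback();
                        return false;
                    }
                }

                tran.Commit();
                return true;
            }
        }
    }
}
```

If `execute` throws, the transaction is disposed without a commit, so the work is rolled back as well.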

Ulotrichous answered 24/10, 2011 at 22:30 Comment(0)
0

To get the most performance gain, open a new connection inside your Parallel.ForEach. That way you will have truly parallel connections to the database.

Make sure you have connection pooling enabled and the minimum and maximum pool size properties set appropriately.

Try this approach and use the Stopwatch class to compare the performance of the different approaches, then pick the one that works best in your case. The result depends on the schema and on the kind of queries you will be executing against the database.
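A minimal sketch of such a comparison using Stopwatch. The `Thread.Sleep` call is only a stand-in for one database round trip, and the names `TimingDemo`, `Measure`, and `fakeStatement` are made up for illustration; substitute the real serial and parallel code paths to get meaningful numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class TimingDemo
{
    // Runs 'work' once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        var items = Enumerable.Range(0, 20).ToArray();

        // Hypothetical workload: pretend each statement takes about 10 ms.
        Action<int> fakeStatement = _ => Thread.Sleep(10);

        var serial = Measure(() => { foreach (var i in items) fakeStatement(i); });
        var parallel = Measure(() => Parallel.ForEach(items, fakeStatement));

        Console.WriteLine("serial:   {0} ms", serial.TotalMilliseconds);
        Console.WriteLine("parallel: {0} ms", parallel.TotalMilliseconds);
    }
}
```

Timing each variant several times against realistic data gives a fairer picture than a single run, since connection pool warm-up can skew the first measurement.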

Tokay answered 27/10, 2011 at 7:25 Comment(0)
