Thread abort leaves zombie transactions and broken SqlConnection
Asked Answered
S

2

16

I feel like this behavior should not be happening. Here's the scenario:

  1. Start a long-running sql transaction.

  2. The thread that ran the sql command gets aborted (not by our code!)

  3. When the thread returns to managed code, the SqlConnection's state is "Closed" - but the transaction is still open on the sql server.

  4. The SQLConnection can be re-opened, and you can try to call rollback on the transaction, but it has no effect (not that I would expect this behavior. The point is there is no way to access the transaction on the db and roll it back.)

The issue is simply that the transaction is not cleaned up properly when the thread aborts. This was a problem with .Net 1.1, 2.0 and 2.0 SP1. We are running .Net 3.5 SP1.

Here is a sample program that illustrates the issue.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using System.Data.SqlClient;
using System.Threading;

namespace ConsoleApplication1
{
    class Run
    {
        static Thread transactionThread;

        public class ConnectionHolder : IDisposable
        {
            public void Dispose()
            {
            }

            public void executeLongTransaction()
            {
                Console.WriteLine("Starting a long running transaction.");
                using (SqlConnection _con = new SqlConnection("Data Source=<YourServer>;Initial Catalog=<YourDB>;Integrated Security=True;Persist Security Info=False;Max Pool Size=200;MultipleActiveResultSets=True;Connect Timeout=30;Application Name=ConsoleApplication1.vshost"))
                {
                    try
                    {
                        SqlTransaction trans = null;
                        trans = _con.BeginTransaction();

                        SqlCommand cmd = new SqlCommand("update <YourTable> set Name = 'XXX' where ID = @0; waitfor delay '00:00:05'", _con, trans);
                        cmd.Parameters.Add(new SqlParameter("0", 340));
                        cmd.ExecuteNonQuery();

                        cmd.Transaction.Commit();

                        Console.WriteLine("Finished the long running transaction.");
                    }
                    catch (ThreadAbortException tae)
                    {
                        Console.WriteLine("Thread - caught ThreadAbortException in executeLongTransaction - resetting.");
                        Console.WriteLine("Exception message: {0}", tae.Message);
                    }
                }
            }
        }

        static void killTransactionThread()
        {
            Thread.Sleep(2 * 1000);

            // We're not doing this anywhere in our real code.  This is for simulation
            // purposes only!
            transactionThread.Abort();

            Console.WriteLine("Killing the transaction thread...");
        }

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            using (var connectionHolder = new ConnectionHolder())
            {
                transactionThread = new Thread(connectionHolder.executeLongTransaction);
                transactionThread.Start();

                new Thread(killTransactionThread).Start();

                transactionThread.Join();

                Console.WriteLine("The transaction thread has died.  Please run 'select * from sysprocesses where open_tran > 0' now while this window remains open. \n\n");

                Console.Read();
            }
        }
    }
}

There is a Microsoft Hotfix targeted at .Net2.0 SP1 that was supposed to address this, but we obviously have newer DLL's (.Net 3.5 SP1) that don't match the version numbers listed in this hotfix.

Can anyone explain this behavior, and why the ThreadAbort is still not cleaning up the sql transaction properly? Does .Net 3.5 SP1 not include this hotfix, or is this behavior that is technically correct?

Spicy answered 2/6, 2011 at 19:20 Comment(17)
Please no comments about not using Thread.Abort - we aren't using it anywhere. It's just sometimes a fact of life that IIS throws causes them if you accidentally get an appdomain recycle, or whatever. We're not using Thread.Abort anywhere in our code :) We've just noticed this behavior and traced it back to this scenario - the sample program is obviously contrived.Spicy
I see some problems with your code, although I don't think it'll impact much, but you should not be opening the connection outside the try/catch block. If you somehow manage to abort the thread before it has reached the try/catch block, the connection will be left open. Also, why are you using thread-abort to cancel a long-running query? Why are you not using SqlCommand.Cancel? - msdn.microsoft.com/en-us/library/…Sontich
If you're not using Thread.Abort anywhere in your code, you might want to put that comment into that code, since Thread.Abort is very prominently placed right smack in the middle of the code you posted here. I understand that it is example code and all that, but you should place the comment there, and not in a comment. Otherwise you will get those comments.Sontich
Also, why are you doing long-running transactions from a web application?Sontich
Hahah... I was too slow to pre-empt it. The only reason the SqlConnection is outside the try/catch is so that I can attempt to reopen it when I catch the ThreadAbort. This example is totally contrived though - it doesn't represent our real code. Our transactions aren't exactly long-running. The query in question was brought up to about 5 seconds execution time under extremely heavy load, which is when we started noticing the problem. Again - contrived example. Can we please focus on the actual behavior I am asking about?Spicy
@Lasse V. Karlsen I'd guess from class Program1 this is a minimal test-case and not the real code :-)Kareenkarel
But then you've fallen prey to another big no-no, don't post questions with code that has different problems than your production code. People will get hung up on the code you post, regardless of how contrived it is. They will assume you have narrowed the problem down to this type of code, and is asking for help on fixing the code as posted.Sontich
@Lasse V. Karlsen It's not a different problem. It's simulating the problem as described (presumably so others can test it or it can be verified in a unit test). Note the included waitfor in the TSQL.Kareenkarel
Have you checked if this might be a bug in SQL Server? You've applied all servicepacks and relevant hotfixes I assume? And are you running SQL Server 2008 R2? Sometimes the fix-lists aren't altogether complete.Sontich
I believe all our production machines are on R2, but I will double-check.Spicy
Looks like we're still on Sql Server 2008 SP2 (not R2), and IT has assured me it's up to date.Spicy
would it be possible to use a transactionscope?Scriptural
@Scriptural - I don't think so. The behavior is that when the threadabort happens, the SQL command finishes executing (it's in unmanged code)... and when the call stack returns to managed code, the connection is already in a closed state.Spicy
I concur with womp: Sometimes there is thin line between minimal working demonstration and contrived example with no external relevance. I'm not about to cast my vote on this question yet, but I do feel that @Lasse's 'conviction' (another big no-no) is a bit harsh or earlyAmaro
1. Is it possible to use a stored procedure? 2. Is your connection and transaction in managed code or both in unmanaged code?Scriptural
@Scriptural - The code is right there to demonstrate the problem. The connection and transaction are both being started through a SqlCommand in managed code.Spicy
Would it be possible to use a stored procedure and do the transaction work there?Scriptural
T
4

This is a bug in Microsoft's MARS implementation. Disabling MARS in your connection string will make the problem go away.

If you require MARS, and are comfortable making your application dependent on another company's internal implementation, familiarize yourself with http://dotnet.sys-con.com/node/39040, break out .NET Reflector, and look at the connection and pool classes. You have to store a copy of the DbConnectionInternal property before the failure occurs. Later, use reflection to pass the reference to a deallocation method in the internal pooling class. This will stop your connection from lingering for 4:00 - 7:40 minutes.

There are surely other ways to force the connection out of the pool and to be disposed. Short of a hotfix from Microsoft, though, reflection seems to be necessary. The public methods in the ADO.NET API don't seem to help.

Tilbury answered 8/11, 2011 at 19:1 Comment(1)
Isn't MARS disabled by default?Fewell
A
8

Since you're using SqlConnection with pooling, your code is never in control of closing the connections. The pool is. On the server side, a pending transaction will be rolled back when the connection is truly closed (socket closed), but with pooling the server side never sees a connection close. W/o the connection closing (either by physical disconnect at the socket/pipe/LPC layer or by sp_reset_connection call), the server cannot abort the pending transaction. So it really boils down to the fact that the connection does not get properly release/reset. I don't understand why you're trying to complicate the code with explicit thread abort dismissal and attempt to reopen a closed transaction (that will never work). You should simply wrap the SqlConnection in an using(...) block, the implied finally and connection Dispose will be run even on thread abort.

My recommendation would be to keep things simple, ditch the fancy thread abort handling and replace it with a plain 'using' block (using(connection) {using(transaction) {code; commit () }}.

Of course I assume you do not propagate the transaction context into a different scope in the server (you do not use sp_getbindtoken and friends, and you do not enroll in distributed transactions).

This little program shows that the Thread.Abort properly closes a connection and the transaction is rolled back:

using System;
using System.Data.SqlClient;
using testThreadAbort.Properties;
using System.Threading;
using System.Diagnostics;

namespace testThreadAbort
{
    class Program
    {
        static AutoResetEvent evReady = new AutoResetEvent(false);
        static long xactId = 0;

        static void ThreadFunc()
        {
            using (SqlConnection conn = new SqlConnection(Settings.Default.conn))
            {
                conn.Open();
                using (SqlTransaction trn = conn.BeginTransaction())
                {
                    // Retrieve our XACTID
                    //
                    SqlCommand cmd = new SqlCommand("select transaction_id from sys.dm_tran_current_transaction", conn, trn);
                    xactId = (long) cmd.ExecuteScalar();
                    Console.Out.WriteLine("XactID: {0}", xactId);

                    cmd = new SqlCommand(@"
insert into test (a) values (1); 
waitfor delay '00:01:00'", conn, trn);

                    // Signal readyness and wait...
                    //
                    evReady.Set();
                    cmd.ExecuteNonQuery();

                    trn.Commit();
                }
            }

        }

        static void Main(string[] args)
        {
            try
            {
                using (SqlConnection conn = new SqlConnection(Settings.Default.conn))
                {
                    conn.Open();
                    SqlCommand cmd = new SqlCommand(@"
if  object_id('test') is not null
begin
    drop table test;
end
create table test (a int);", conn);
                    cmd.ExecuteNonQuery();
                }


                Thread thread = new Thread(new ThreadStart(ThreadFunc));
                thread.Start();
                evReady.WaitOne();
                Thread.Sleep(TimeSpan.FromSeconds(5));
                Console.Out.WriteLine("Aborting...");
                thread.Abort();
                thread.Join();
                Console.Out.WriteLine("Aborted");

                Debug.Assert(0 != xactId);

                using (SqlConnection conn = new SqlConnection(Settings.Default.conn))
                {
                    conn.Open();

                    // checked if xactId is still active
                    //
                    SqlCommand cmd = new SqlCommand("select count(*) from  sys.dm_tran_active_transactions where transaction_id = @xactId", conn);
                    cmd.Parameters.AddWithValue("@xactId", xactId);

                    object count = cmd.ExecuteScalar();
                    Console.WriteLine("Active transactions with xactId {0}: {1}", xactId, count);

                    // Check count of rows in test (would block on row lock)
                    //
                    cmd = new SqlCommand("select count(*) from  test", conn);
                    count = cmd.ExecuteScalar();
                    Console.WriteLine("Count of rows in text: {0}", count);
                }
            }
            catch (Exception e)
            {
                Console.Error.Write(e);
            }

        }
    }
}
Aleron answered 2/6, 2011 at 21:33 Comment(10)
If I ditch the thread abort, then it's not simulating the problem. Even inside a using block, the transaction is still open on the server. Run the program and see.... alter the connection to use a using block... same problem... The issue is that the connection is returned to the pool without cleaning up the transaction on the database.Spicy
And we aren't doing anything fancy in the queries... it's pretty basic update statements, no distributed transactions, etc. - the sample program just has a tiny update statement and a "waitfor" and it shows the same issue.Spicy
I simplified the program a bit to take into account some of your suggestions.Spicy
See my attached program, it shows that basic using blocks are properly closing the connection and rolling back the transaction on thread abort.Aleron
Remus - I love that you are taking the time to look at this - thank you. I just ran your program and the behavior is still happening! img17.imageshack.us/img17/978/stillhappening.png Are you using .Net 4?Spicy
I tested both .Net 4.0 and .Net 3.5. on SQL 2008 R2 and the transaction always rolled back.Aleron
Okay... I installed R2 trial edition on a fresh machine and ran both your test program and my test program against it, and I'm still seeing the problem.Spicy
I tested this repeatedly and indeed after several iterations I can hit the issue. The connection is left open by ADO.Net and therefore the transaction is not rolled back on the server. The inserted row is still locked. .Net 3.5 vs. R2Aleron
We've got MS on the phone about this... we may end up implementing a reflection workaround. Check out this article for some interesting insight into connection pool internals: dotnet.sys-con.com/node/39040Spicy
@Spicy Did you or MS actually resolve this problem?. I'm afraid that I run into the same situation here.Glassblowing
T
4

This is a bug in Microsoft's MARS implementation. Disabling MARS in your connection string will make the problem go away.

If you require MARS, and are comfortable making your application dependent on another company's internal implementation, familiarize yourself with http://dotnet.sys-con.com/node/39040, break out .NET Reflector, and look at the connection and pool classes. You have to store a copy of the DbConnectionInternal property before the failure occurs. Later, use reflection to pass the reference to a deallocation method in the internal pooling class. This will stop your connection from lingering for 4:00 - 7:40 minutes.

There are surely other ways to force the connection out of the pool and to be disposed. Short of a hotfix from Microsoft, though, reflection seems to be necessary. The public methods in the ADO.NET API don't seem to help.

Tilbury answered 8/11, 2011 at 19:1 Comment(1)
Isn't MARS disabled by default?Fewell

© 2022 - 2024 — McMap. All rights reserved.