SqlConnections, Parallel.For, an old C# application and random hangs on initializing SqlConnections
Asked Answered
H

1

7

my problem: Earlier this week, I got the task to speed up a task in our program. I looked at it and immediately got the idea of using a parallel foreach loop for a function in that task.

I implemented it, went through the function (including all sub-functions) and changed the SqlConnections (and other stuff) so it'd be able to run in parallel. I started the whole thing and all went well and fast (alone that reduced the time for that task by ~45%)

Now, yesterday we wanted to try the same thing with some more data and ...I got some weird problem: Whenever the parallel function got called, it did it's work...but sometimes one of the threads would hang for at least 4 minutes (timeouts are set to one minute, for connection AND command).

If I pause the program during that, I see that only one thread is still active from that loop and it hangs on

connection.Open()

After ~4 minutes the program simply proceeds without throwing an error (aside from a message in the Output box, saying that an exception somewhere occured, but it wasn't catched by my application but somewhere in the SqlConnection/SqlCommand object).

I can kill all connections on the MSSQLServer without anything happens, also the MSSQLServer does nothing during those 4 minutes, all connections are idle.

This is the procedure that is used for sending Update/Insert/Delete statements to the database:

int i = 80;
bool itDidntWork = true;
Random random = new Random();
while (itDidntWork && i > 0)
{
    try
    {
        using (SqlConnection connection = new SqlConnection(sqlConnectionString))
        {
            connection.Open();
            lock (connection)
            {
                command.Connection = connection;
                command.ExecuteNonQuery();
            }

            itDidntWork = false;
        }
    }
    catch (Exception ex)
    {
        if (ex is SqlException && ((SqlException)ex).ErrorCode == -2146232060)
        {
            Thread.Sleep(random.Next(500, 5000));
        }
        else
        {
            SqlConnection.ClearAllPools();
        }

        Thread.Sleep(random.Next(50, 110));
        i--;
        if (i == 0)
        {
            writeError(ex);
        }
    }
}

just in case: on smaller databases there can occur deadlocks (err number 2146232060), so if one occurs, I've to make the colliding statements occur in different time. Works great even on small databases/small servers. If the error wasn't caused by a deadlock, chances are that the connection was faulty, so I'm cleaning all broken connections.

Similiar functions exist for executing scalars, filling datatables/datasets (yes, the application is that old) and executing storedprocedures.

And yes all of those are used in the parallel loop.

Has someone any idea what could be going on there? Or an idea on how I can find out what is going on there?

*edit about the command object:

it is given to the function, the command object is always a new object when it is given into the function.

about the lock: If I put the lock away, I get dozens and hundreds of 'connection is closed' or 'connection is already open' errors, because the Open() function just get's a connection from .NET's connection pool. The lock does work as intended.

Example code:

using(SqlCommand deleteCommand = new SqlCommand(sqlStatement))
{
    ExecuteNonQuerySafely(deleteCommand); // that's the function that contains the body I posted above
}

*edit 2

I've to make a correction: It hangs on this

command.Connection = connection;

at least I guess it does, because when I pause the application, the 'step' mark thingi is green and on

command.ExecuteNonQuery();

saying that that is the statement that'll be executed next.

*edit 3 just to be sure I just started another test without any locks around the connection object...will take some minutes to get the results.

*edit 4 well, I was wrong. I removed the lock statements and...it still worked. Maybe the first time I tried it there was a reused connection or something. Thank's for pointing it out.

*edit 5 I'm getting the feeling that this occurs only with one specific call to a specific database procedure. I don't know why. C# wise there is no difference between that call and other calls see edit 6. And since it didn't execute the statement at that point (I guess. Maybe someone can correct me on that. If, in debug mode, a line is green marked (instead of yellow) it didn't execute that statement yet but waits for the statement before that line to finish, is that correct?) it's strange.

*edit 6 There were 3 command objects that were reused the whole time. They were defined above the parallel function. I don't know how bad that is/was. They were only used to call one stored procedure (each of them called a different procedure), of course with different parameters and a new connection (through the above mentioned method).

*edit 7 ok, it's really only when one specific stored procedure is called. Except that it's on the assignment of the connection object that it hangs (next line is marked green). Trying to figure out what the cause for that is atm.

*edit 8 yay, it just happened at another command. So that was that.

*edit 9 ok. Problem solved. The 'hangs' were actually CommandTimeouts that were set to 10 minutes(!). They were only set for two commands (the one I mentioned in edit 7 and the one that I mentioned in edit 8). Since I found both of them while I was restructuring my commands to make them like devundef suggested, I marked his answer as the one that solved my problem. Also his suggestion of limiting the amounts of threads my for-loop was using sped up the process even more.

Special thank's to Marc Gravell for explaining stuff and hanging in here with me on a saturday ;)

Hugely answered 18/8, 2012 at 17:36 Comment(16)
I think this question requires a bit more context. I see that there's a reused SqlCommand object (which is a bad practice to begin with). Could you show how this code is actually used in the context of parallelizing your database access?Exoteric
Locking on a local connection (so that this never really locks anything) and reusing the shared (?) SqlCommand across multiple threads (?) smells fishy.Best
Indeed; what is command, and why does it appear to be shared?Agminate
Also: most deadlocks can be avoided by structuring correctly. For example, taking UPDLOCKAgminate
lock (connection) does nothing, so take it out for the sake of simplicity. I don't see where command comes from, it's not shared, is it? Have you tried monitoring the number of held connection objects (up a shared count on open and decrement on close) to see how high it gets compared to the pool maximum?Oneiric
at all updated the question @Marc Gravell for one, that'd mean touching the business logic and I don't want to do that (let's just say I've nearly no clue what happens there), for another I'm not really sure how those deadlocks are there anyway. There are update/insert statements on totally different records (same table) and they're creating a deadlock. Does MSSQL lock the entire table for adding/updating records? (I hope not...)Hugely
Re the edit: no, the lock doesn't "fix" that. If it does, then your code is massively broken. There's a difference between the underlying connection (from the pool) and the SqlConnection that wraps it; that SqlConnection is local and unshared - locking it does nothing useful in the code you are showing. Is this directly similar to your real code?Agminate
uhm. The moment I do a open(), the SqlConnection object is drawn from the pool, since I do the lock after the open it locks the object from the pool. If not, I'd really like to see a source of that claim. And yes the body of the function is exactly what I've in the application.Hugely
@SteffenWinkler that depends on the isolation level; my bet would be that they are key-range locks, so yes: they can impact other rows. But it is allowed to use wider locks, of course, but it usually doesn't unless you either do a massive update, or explicitly ask it to,Agminate
@Steffen no; when you call Open(), it gets an underlying connection from the pool. But you aren't locking the underlying connection - you are locking the SqlConnection: totally different. The SqlConnection is not from the pool; it is "new"Agminate
okay...in that case I've no clue why it is working. I swear the moment I don't lock, I get connections that are currently used from the pool. I get closed connections a line after I opened it, I get executing connections but I didn't do anything with the connection at that point. Stuff like that. I'd guess that it does lock the underlying connection aswell since it is part of the SqlConnection object. *edit is it possible to lock the underlying connection?Hugely
@Steffen are you sure you aren't somehow leaking a SqlConnection between threads? Or the command object? No, you can't lock an underlying connection, but you should never need to: one should never be shared between 2 SqlConnection objects.Agminate
@Marc about the connection object: yes I'm really sure, all functions that deal with connections to the database are like the one I posted in the question. About the command object: There could be a case where one of them is reused, but with another connection. Is that bad? And yes, how bad? (I always thought it's only bad to reuse a connection object)Hugely
@Steffen as long as it isn't trying to be used concurrently by two threads who think they are involved with he command, it should be fine. The problem here is: what you describe is very atypical and not obvious from the code posted. It sounds subtle. Unfortunately, I suspect someone would need to be able to actually reproduce this to debug it, and I can't for the life of me think how I could get this to repro.Agminate
@MarcGravell well, that could've happened, since one of those commands was used in a for loop that is inside the parallel loop. Totally possible that it was used by two threads at the same time.Hugely
@Steffen yeah... Don't do thatAgminate
D
4

I think the problem can be found in your edit 6: edit 6: ...3 command objects were reused the whole time.

Any data that's used inside a parallel loop must be created inside the loop or it must have the proper synchronization code in place to ensure only 1 thread at a time has access to that particular object. I don't see such code inside the ExecuteNonQuerySafely. a) Locking the connection has no effect there because the connection object is created inside the method. b) Locking the command will not guarantee thread safety - probably you're setting the command parameters before locking it inside the method. A lock(command) will work if you lock the command before call the ExecuteNonQuerySafely, however locks inside a parallel loop is not a good thing to do - it's the definition of anti-parallel afterall, better avoid this altogether and create a new command for each iteration. Better yet would be to do a little refactoring on ExecuteNonQuerySafely, it could accept a callback action instead of an SqlCommand. Example:

public void ExecuteCommandSafely(Action<SqlCommand> callback) {
   ... do init stuff ...
   using (var connection = new SqlConnection(...)) {
      using (var command = new SqlCommand() {
         command.Connection = connection;
         try{
            callback(command);
         }
         ... error handling stuff ...
      }          

   }

}

And use :

ExecuteCommandSafely((command) => {
   command.CommandText = "...";
   ... set parameters ..
   command.ExecuteNonQuery();
});

Last, the fact that you're getting errors executing the commands in parallel is a sign that maybe parallel execution is not a good thing to do in this case. You're wasting servers resources to get errors. Connections are expensive, try to use the MaxDegreeOfParalellism option to tune the workload for this particular loop (remembering that the optimal value will change according to the hardware/server/network/etc..). The Parallel.ForEach method has an overload that accepts a ParallelOptions parameters where you can set how many threads you want to execute in parallel for that par (http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism.aspx).

Dunleavy answered 18/8, 2012 at 20:8 Comment(10)
Hi, well I redesigned the stuff with the command objects, so that's no longer a problem (all command objects are now really only used once). Thank's for the code snippets, guess I'll use them ;)\\ And many, many thank's for the ParallelOptions thing! Didn't knew about that. The MaxDegreeOFParalellism must be set for the whole server or can I just add it to the connection string? Also, as far as I understood it, it 'just' limits on how many threads the server can use for one query, right or wrong?Hugely
You're right, the parallel options is per context, you set the options for each ForEach/For etc independently, it's not a global setting. When you call Parallel.ForEach you can pass a ParallelOptions to it specifying how many threads do you want to run in parallel.Dunleavy
@Steffen frankly, there's n benefit in reusing command objects at all. I'd just keep them local. Problem solved :)Agminate
@devundef in your code snippet...where should I open the connection? Before or after I assigned it to the command? (I guess after...) *edit also shouldn't the try/catch block be around the connection/command object? (reuse)Hugely
@SteffenWinkler. You can open before or after create the command, I think this not make difference. The try catch a think not make a difference too because the using statement is translated by the compiler to a try..finally block, but you can if you want. I think for reading is better to put the usings inside the try catch...Dunleavy
@devundef about the MaxDegreeOfParalellism in case you wondered wtf I asked there: I thought you meant the MSSQL Server option, didn't knew that the name was the same for the Parallel.For/ForEach option thing. about the try/catch: well, since I repeat the statement when it fails (for whatever reason) for up to 80 times (deadlocks, timeouts, whatever) it does make a difference where I put the try/catch.Hugely
well, I indeed managed to get a better performance by limiting the amount of threads. Also, and that is why I'll mark your answer as 'the answer': By restructuring the command calls the way you showed, I stumbled upon 2 commands (that were the ones that caused the problems) that had a CommandTimeout set to 600 (which would be 10 minutes). argh. After setting it to 60 seconds, I got timeouts instead of 'hangs'.Hugely
Haha, parallel sql would be awesome :), 10 minutes is a lot of time, i would give one minute at most.Dunleavy
msdn.microsoft.com/de-de/library/… "specifies maximum number of processors, up to 32, to use in parallel plan execution." that's what I meant. Both options have the same name, hence my confusion.Hugely
That explains a lot, I did know about that property in SQL Server.Dunleavy

© 2022 - 2024 — McMap. All rights reserved.