Multiple Simultaneous SQL Connection Timeouts In Multithreaded Windows Service
Asked Answered
S

5

25

I have a multithreaded Windows Service I've developed with VS 2010 (.NET 4.0) which can have anywhere from a few to a few dozen threads, each retrieving data from a slow server over the Internet and then using a local database to record this data (so the process is Internet-bound, not LAN or CPU bound).

With some regularity, I am getting a flood/flurry/burst of the following error from several threads simultaneously:

System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

The call stack for this error is typically:

at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)

at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)

at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)

at System.Data.SqlClient.SqlConnection.Open()

I'm not specifying a Connection Timeout in the connection string, and there are other applications and processes working in this database. Has anyone come across this kind of behavior and if so what was done to prevent it?

The most commonly-called method in my data access layer looks like this, and all my other DAL methods follow the same approach:

using (SqlConnection con = new SqlConnection(GetConnectionString()))
using (SqlCommand cmd = new SqlCommand("AddGdsMonitorLogEntry", con))
{
    cmd.CommandType = CommandType.StoredProcedure;

    /* setting cmd.Parameters [snipped] */

    // We have been getting some timeouts writing to the log; wait a little longer than the default.
    cmd.CommandTimeout *= 4;

    con.Open();

    cmd.ExecuteNonQuery();
}

Thanks very much!

EDIT

Given comments about this occurring in mirrored environments, I should indeed mention that the database in question is mirrored. It's marked in SSMS as "Principal, Synchronized", in "High safety without automatic failover (synchronous)" mode.

EDIT 5/26/11

I am seeing nothing in the SQL Server logs to indicate any problems. (I don't have access to the Windows Event Viewer on that server, but I've asked for someone to look for me.)

Speedy answered 4/5, 2011 at 14:20 Comment(5)
I am also seeing the exact same problem, with the same stacktrace. The database it's connecting to is mirrored and the connection string specifies a failover partner. I have been unable to reproduce the same problem from my local desktop, opening a bunch of connections and never closing them results in a different exception message.Antemundane
These links report a similar problem, but none offer a solution: 1 2 3 4 5Antemundane
I think the problem is not in connection, client or database. But in queries executed. Verify them, e.g. gather statistics what SPs/queries raises an exception more oftenPipsissewa
As an update to the problem I have seen, adding a Connection Timeout=500; to the connection string seems to prevent the problem. So it looks like it might really be a problem with the database taking too long to respond, B=but I really don't want to leave the timeout that high.Antemundane
@AleXintlsos - so what was your final resolution to this problem? Increase the timeout (to what?) and increase connection pool?Davide
S
15

According to the MSDN Blog post just created today (hooray for Google!):

Microsoft has confirmed that this is a problem in the current release of ADO.NET. This issue will be fixed in ADO.NET version, ships with Visual Studio 2011.

In the meantime, we request to use the following workarounds:

  1. Increase the connection string timeout to 150 sec. This will give the first attempt enough time to connect( 150* .08=12 sec)

  2. Add MinPool Size=20 in the connection string. This will always maintain a minimum of 20 connections in the pool and there will be less chances of creating new connection, thus reducing the chance of this error.

  3. Improve the network performance. Update your NIC drivers to the latest firmware version. We have seen network latency when your NIC card is not compatible with certain Scalable Networking Pack settings. If you are on Windows Vista SP1 or above you may also consider disabling Receive Window Auto-Tuning. If you have NIC teaming enabled, disabling it would be a good option.

The post itself is an interesting read, talking about a TCP/IP connection retry algorithm. And kudos to all the folks who said "hey this looks like it's related to mirroring..."! And note the comment about this being "because of slow response from SQL Server or due to network delays".

UGH!!!

Thanks to everyone who posted. Now we must all ask for a patch to the .NET Framework (or some other ADO.NET patching mechanism), so we don't have to wait for (and buy) Visual Studio 11...

Speedy answered 26/5, 2011 at 21:24 Comment(0)
C
7

Connection timeout is a different thing than command timeout. Command timeout applies to situation when you have connection established, but due to some internal reasons server cannot return any results within required time. Default command timeout is 30 seconds. http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout.aspx

Try to specify connection timeout in the connection string. Default value is 15 seconds what may be the reason of the issue you see. You can also specify connection timeout in code: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.connectiontimeout.aspx

Crissy answered 15/5, 2011 at 14:2 Comment(3)
I am thinking about increasing the connection timeout, but this error is only occurring on connections, not on stored procedure execution. Command timeout won't have anything to do with this. But my concern is more about why this is only happening intermittently, and why it affects so many attempted connections at the same time. I've got a DBA/developer person here who says it sounds like there may be something going on with the server side connection pool (and I have to admit I didn't know there was such a thing as a server side pool). But that's not been verified.Speedy
If there is something wrong with the server connection pool, there would be some indications of it in the SQL Server error log. Can you verify this? I would think that if the network is slower at times, connections from all threads would suffer because of this. After all every connection goes through the same wires. We had similar issue recently - intermittent timeouts experienced by our application server. It turned out that the routing between servers was not correct and the network performance fluctuated a lot. Then our application would throw timeout exceptions from all threads.Crissy
I'm afraid there's nothing in the SQL Server logs. Any recommendations on how to detect network performance fluctuations?Speedy
S
1

I get this every once in a while on this old database server that we have (coming up on 10 years old now). When it does happen though it's because something is hammering that thing with connections/queries constantly. My guess is that you'll find that when it happens the database server is under load (or a high number of connections or something along those lines) Anyway, in my experience if you can optimize the code, optimize the database, getting a beefier database server, etc. all helps. Another thing you can do, which Piotr suggests, is simply up the timeout for the connection. I'd still go through and optimize some stuff though (should help in the long run).

Stonebroke answered 20/5, 2011 at 14:9 Comment(0)
A
0

I have been able to somewhat reliably reproduce this problem. I have a service that when a processing job is requested it kicks off processing in a new appdomain / thread. This thread will execute 10 to 16 database queries simultaneously. When I run 30 of these jobs one after another then a random one or two of the jobs will crash with the timeout error.

I changed the connection string to turn off Connection Pooling with Pooling=false and then the error changed to the following. This gets thrown 3 or 4 times inside an aggregate exception, since the connections are happening inside a Parallel.For

System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
   at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
   at System.Data.SqlClient.TdsParser.ConsumePreLoginHandshake(Boolean encrypt, Boolean trustServerCert, Boolean& marsCapable)
   at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity)
   at System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, SqlConnection owningObject)
   at System.Data.SqlClient.SqlInternalConnectionTds.LoginWithFailover(Boolean useFailoverHost, ServerInfo primaryServerInfo, String failoverHost, String newPassword, Boolean redirectedUserInstance, SqlConnection owningObject, SqlConnectionString connectionOptions, TimeoutTimer timeout)
   at System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(SqlConnection owningObject, TimeoutTimer timeout, SqlConnectionString connectionOptions, String newPassword, Boolean redirectedUserInstance)
   at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, Object providerInfo, String newPassword, SqlConnection owningObject, Boolean redirectedUserInstance)
   at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection)
   at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup)
   at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.SqlClient.SqlConnection.Open()
   at Tps.PowerTools.CoreEngine.V5.DataAccess.DataContext.ExecuteQuery(PtQuery query, ValueStore`1 store, String readerDescription) in C:\SourceCode\Tps.PowerToolsV1\Trunk\Libraries\CoreEngine\CoreEngine.V5\DataAccess\DataContext.cs:line 326
   at Tps.PowerTools.CoreEngine.V5.DataAccess.DataContext.<StockHistoricalData>b__15(PtQuery query) in C:\SourceCode\Tps.PowerToolsV1\Trunk\Libraries\CoreEngine\CoreEngine.V5\DataAccess\DataContext.cs:line 302
   at System.Threading.Tasks.Parallel.<>c__DisplayClass32`2.<PartitionerForEachWorker>b__30()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass7.<ExecuteSelfReplicating>b__6(Object )
Antemundane answered 15/5, 2011 at 21:38 Comment(0)
V
0

Optimizing the queries you are executing on the remote server will always help. Time each query and look for long running ones. If you are just doing reads then use the (NOLOCK) hint on the SELECT statements. This was a life saver for me. Just read up on it to make sure it is appropriate in your application. If you have access to the remote database make sure the indexes are not to fragmented. This will cause a major slow down in query execution. Make sure indexes are rebuilt/reorganized as part of the SQL maintenance plan. Add new indexes where appropriate.

Extending the timeout may make matters worse. If you let queries run longer then, potentially, more queries will time out. The timeout is there to protect the server and other clients accessing it. Bumping it up a little is not a huge deal but you don't want queries running for a long time killing the server.

Verbenia answered 20/5, 2011 at 15:30 Comment(3)
I think this answer would be pertinent if I were experiencing the problem during command execution; however, it's occurring on the opening of a connection, so queries are not even yet begun.Speedy
"If you are just doing reads, then use the (NOLOCK) hint" - that is not good general advice to follow.Frohne
@Frohne that is why there is the the caveat sentence "Just read up on it to make sure it is appropriate in your application." in my response.Verbenia

© 2022 - 2024 — McMap. All rights reserved.