How do I address sudden loss of connection to SQL Azure in my Azure role?
Asked Answered
G

2

14

My Azure role grabs stuff to process from a database - it holds an instance of System.Data.SqlClient.SqlConnection and periodically creates an SqlCommand instance and executes an SQL query.

Now once in a while (usually once in several days) running a query will trigger an SqlException exception

The service has encountered an error processing your request. Please try again. Error code 40143. A severe error occurred on the current command. The results, if any, should be discarded.

Which I've already seen many times and now my code catches it, calls Dispose() on the SqlConnection instance and then reopens the connection and retries the query. The latter typically results in another SqlException exception

Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

Which looks pretty much like SQL Azure server not responding or being unavailable for whatever reason.

Currently my code doesn't catch the latter exception, it is propagated outside RoleEntryPoint.Run() and the role is restarted. Restart typically takes about ten minutes and once it completes the problem is gone for a day or so.

I don't like my role restarting - it's takes a while and my service functionality is hindered. I'd like to do something smarter.

What would be a strategy to address this problem? Should I retry the query several times and how many times and with what interval? Should I do something else? When do I give up and let the role just restart?

Gatling answered 10/8, 2011 at 6:3 Comment(0)
N
14

I would strongly recommend you have a look at the Transient Fault Handling Framework for SQL Azure

This will help you handle retry logic for both connection and query attempts, I am using this in production and it works great. There is also a nice article on technet that might be of some use.

[EDIT: 17 Oct 2013]

It looks like this has been picked up by the patterns and practices team at The Transient Fault Handling Application Block

Nilsson answered 10/8, 2011 at 7:14 Comment(4)
It's available via nuget as well.Colligate
Thanks dunnry I hadn't spotted that. I blame Wade for not keeping us properly up to date ;)Nilsson
That Transient Fault Handling site has an intransigent fault: is down.Gunflint
And now it's obsolete, and there is no supported solution.Dietz
H
2

We use TransientFaultHandling and it doesn't handle all of the strange exceptions.

For example, this one popped up yesterday:

The service has encountered an error processing your request. Please try again. Error code 40143. A severe error occurred on the current command. The results, if any, should be discarded. , stacktrace at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning() at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, . . .

The reasonable approach that will work even with this:

  1. Identify a coarse-grained pseudo-transaction where the call happens.
  2. Wrap this block in a try-catch.
  3. on exception, 'roll back' the pseudo-transaction.

Example of a typical workflow:

  • A get Azure queue message
  • B query data from SQL Azure
  • C process data,
  • D upload results
  • E delete message.

Wrap B through C together in a try-catch. If something happens during 'harmless' SQL Azure call, simply bail out without deleting the message, it will simply pop up again after visibility timeout expires.

Actually, this is very common approach: organize into transaction-like blocks, wrap block into try-catch, neatly roll back on exception. And never, never assume that some calls do not fail. All call fail from time to time.

Hemicellulose answered 7/12, 2012 at 18:49 Comment(1)
I agree with your approach, but David Steele's answer is also correct. In fact both answers deal with different levels. You use the Transient Fault Handling Framework to deal with transient errors; if the situation persist you abort the current operation and have it retried later (or discarded).Montagnard

© 2022 - 2024 — McMap. All rights reserved.