How do you deal with transport-level errors in SqlConnection?

Every now and then in a high volume .NET application, you might see this exception when you try to execute a query:

System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server.

According to my research, this is something that "just happens" and not much can be done to prevent it. It does not happen as a result of a bad query, and generally cannot be duplicated. It just crops up maybe once every few days in a busy OLTP system when the TCP connection to the database goes bad for some reason.

I am forced to detect this error by parsing the exception message and then retrying the entire operation from scratch, including opening a new connection. None of that is pretty.

Anybody have any alternate solutions?

Marpet answered 19/8, 2008 at 17:36 Comment(5)
Do you have statistics for the load on your database server when these errors are thrown? You might have some database issues that are causing connections to fail. - Recalcitrate
This should not happen, even under high transactional volume. We run an average of 25,000 transactions per second on SQL Server 2005 Standard, and we don't get this error. (Unless the cluster fails over, which is every 12+ months, not every few days.) Without any more info, it sounds like there is a networking problem between your database server and your application servers. Can you post more info? - Hereto
@Portman, I suspect it is due to the crappy onboard Dell NIC I'm forced to use since both of my PCIe slots are taken up with HBA cards connected to my DAS. I'm upgrading to a bigger machine so I can fit the (much) better Intel NIC. How are you clustering with Standard Edition? That's an Enterprise Edition feature. - Marpet
Clustering, log shipping, and mirroring are all available in Standard. http://www.microsoft.com/sql/prodinfo/features/compare-features.mspx - Hereto
As far as I can tell, class 20 is transport level. - Locklear
9

I posted an answer to another question on a different topic that might be of some use here. That answer involved SMB connections, not SQL. However, it was identical in that it involved a low-level transport error.

What we found was that in a heavy load situation, it was fairly easy for the remote server to time out connections at the TCP layer simply because the server was busy. Part of the reason was the defaults for how many times TCP will retransmit data on Windows weren't appropriate for our situation.

Take a look at the registry settings for tuning TCP/IP on Windows. In particular, you want to look at TcpMaxDataRetransmissions and maybe TcpMaxConnectRetransmissions. These default to 5 and 2 respectively. Try upping them a little bit on the client system and then reproduce the load situation.

Don't go crazy! TCP doubles the timeout with each successive retransmission, so the timeout behavior for bad connections can go exponential on you if you increase these too much. As I recall upping TcpMaxDataRetransmissions to 6 or 7 solved our problem in the vast majority of cases.
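To get a feel for how quickly the doubling adds up, here is a rough C# sketch. It assumes a simplified model: one wait for the original send plus one wait per retransmission, each twice as long as the last. The class name, method name, and starting timeout are made up for illustration; the real starting timeout on Windows depends on the measured round-trip time.

```csharp
using System;

public static class TcpBackoff
{
    // Total time spent waiting before TCP gives up on a segment, under the
    // simplified assumption that the timeout starts at initialTimeout and
    // doubles after every retransmission.
    public static TimeSpan TotalWait(TimeSpan initialTimeout, int maxRetransmissions)
    {
        if (maxRetransmissions < 0)
            throw new ArgumentOutOfRangeException(nameof(maxRetransmissions));

        long totalTicks = 0;
        long timeoutTicks = initialTimeout.Ticks;

        // Attempt 0 is the original send; attempts 1..n are the retransmissions.
        for (int attempt = 0; attempt <= maxRetransmissions; attempt++)
        {
            totalTicks += timeoutTicks;
            timeoutTicks *= 2; // the timeout doubles each time
        }

        return TimeSpan.FromTicks(totalTicks);
    }
}
```

With a hypothetical 1-second starting timeout, `TcpBackoff.TotalWait(TimeSpan.FromSeconds(1), 5)` comes to 63 seconds, while raising the limit to 7 gives 255 seconds. Two extra retransmissions roughly quadruple how long a dead connection hangs around.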

Peroxidase answered 16/10, 2008 at 7:50 Comment(0)
5

This blog post by Michael Aspengren explains the error message "A transport-level error has occurred when sending the request to the server."

Anallise answered 29/1, 2010 at 7:20 Comment(1)
Updated link to blog post is: learn.microsoft.com/en-us/archive/blogs/spike/… - Hydrolyse
2

To answer your original question:

A more elegant way to detect this particular error, without parsing the error message, is to inspect the Number property of the SqlException.

(This actually returns the error number from the first SqlError in the Errors collection, but in your case the transport error should be the only one in the collection.)
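A minimal sketch of what that check could look like. The helper takes the values of `SqlException.Number` and `SqlException.Class` as plain arguments so it compiles without a reference to the SQL client library. The list of numbers is my assumption (Win32/Winsock codes that I believe surface as `Number` when the connection drops), not an official or complete list, and the class name is made up. Treating class 20 and above as "connection is gone" follows the comment under the question; check it against your own logs before relying on it.

```csharp
using System;
using System.Collections.Generic;

public static class TransportErrors
{
    // Assumed, non-exhaustive list of error numbers seen with dropped connections.
    private static readonly HashSet<int> KnownNumbers = new HashSet<int>
    {
        64,    // The specified network name is no longer available
        121,   // The semaphore timeout period has expired
        233,   // No process is on the other end of the pipe
        10053, // Connection aborted by software on the local machine
        10054, // An existing connection was forcibly closed by the remote host
        10060  // The connection attempt timed out
    };

    // number: value of SqlException.Number; errorClass: value of SqlException.Class.
    public static bool IsTransportError(int number, byte errorClass)
    {
        // Severity 20 or higher normally means the server closed the connection.
        return errorClass >= 20 || KnownNumbers.Contains(number);
    }
}
```

In a handler you would call it as `catch (SqlException ex) when (TransportErrors.IsTransportError(ex.Number, ex.Class))` and let everything else propagate.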

Klemperer answered 1/10, 2008 at 5:57 Comment(0)
1

I'm using a reliability layer around my DB commands (abstracted away in the repository interface). Basically, that's just code that intercepts any expected exception (DbException and also InvalidOperationException, which happens to be thrown on connectivity issues), logs it, captures statistics, and retries the whole operation.

With that reliability layer present, the service has been able to survive stress testing gracefully (constant deadlocks, network failures, etc.). Production is far less hostile than that.

PS: There is more on that here (along with a simple way to define reliability with the interception DSL)
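For anyone who wants the general shape without the DSL, here is a stripped-down sketch of a retry wrapper. It is not the actual layer described above; the names `Reliability` and `Execute`, the default attempt count, and the delay are made up for illustration.

```csharp
using System;
using System.Threading;

public static class Reliability
{
    // Runs operation and retries it while isTransient says the failure is
    // worth retrying, up to maxAttempts attempts in total.
    public static T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient,
                               int maxAttempts = 3, int delayMilliseconds = 200)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (isTransient == null) throw new ArgumentNullException(nameof(isTransient));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            // The filter is false on the last attempt, so the final failure
            // propagates to the caller with its original stack trace.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                Console.Error.WriteLine($"Attempt {attempt} failed: {ex.Message}; retrying.");
                Thread.Sleep(delayMilliseconds * attempt); // wait a little longer each time
            }
        }
    }
}
```

The delegate should open its own connection, so that every attempt starts from scratch instead of reusing a broken one. A predicate such as `ex => ex is DbException || ex is InvalidOperationException` would match the exceptions mentioned above.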

Becki answered 1/10, 2008 at 4:53 Comment(0)
1

I have seen this happen in my own environment a number of times. The client application in this case is installed on many machines, and some of those machines are laptops. People were leaving the application open, disconnecting the laptop from the network, then plugging it back in and attempting to use the application again. That causes the error you have mentioned.

My first suggestion would be to look at the network and make sure the servers aren't on DHCP, where renewing IP addresses could cause this error. If that isn't the case, then you have to start trawling through your event logs looking for other network-related errors.

Unfortunately, as stated above, it is a network error. The main thing you can do is monitor the connections using a tool like netmon and work back from there.

Good Luck.

Withrow answered 31/10, 2008 at 6:30 Comment(0)
1

I had the same problem, although in my case it was with service requests to a SQL DB.

This is what I had in my service error log:


System.Data.SqlClient.SqlException: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.)


I have a C# test suite that tests a service. The service and DB were both on external servers, so I thought that might be the issue. I deployed the service and DB locally, but to no avail: the issue continued. The test suite isn't a demanding performance test at all, so I had no idea what was happening. The same test was failing each time, but when I disabled that test, another one would start failing consistently.

I tried other methods suggested on the Internet that didn't work either:

  • Increase the registry values of TcpMaxDataRetransmissions and TcpMaxConnectRetransmissions.
  • Disable the "Shared Memory" option within SQL Server Configuration Manager under "Client Protocols" and sort TCP/IP to 1st in the list.
  • This might occur when you are testing scalability with a large number of client connection attempts. To resolve this issue, use the regedit.exe utility to add a new DWORD value named SynAttackProtect to the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ with value data of 00000000.

My last resort was the age-old saying "try and try again". I now have nested try-catch statements so that if the TCP/IP connection is lost at the lower protocol level, the code doesn't just give up there but tries again. This is now working for me; however, it's not a very elegant solution.

Jariah answered 23/7, 2010 at 15:19 Comment(1)
Thanks for the feedback. If you are using connection pooling, try calling SqlConnection.ClearAllPools() every 10 minutes or so, to ensure that if SQL Server has killed a connection, your pool doesn't still try to make use of it. If this works, report back! - Curran
1

Use Enterprise Services with transactional components.

Akkadian answered 23/7, 2010 at 15:24 Comment(0)
0

You should also check hardware connectivity to the database.

Perhaps this thread will be helpful: http://channel9.msdn.com/forums/TechOff/234271-Conenction-forcibly-closed-SQL-2005/

Ginny answered 19/8, 2008 at 18:2 Comment(0)
0

I had the same problem. I asked my network geek friends, and they all said what people have replied here: it's the connection between the computer and the database server. In my case, the problem was my Internet Service Provider or their router. After a router update, the problem went away. But do you have any other internet connection drop-outs from your computer or server? I had...

Enjambment answered 1/10, 2008 at 6:5 Comment(0)
0

I experienced the transport error this morning in SSMS while connected to SQL 2008 R2 Express.

I was trying to import a CSV with \r\n line endings. I coded my row terminator as 0x0d0x0a. When I changed it to 0x0a, the error stopped. I can change it back and forth and watch it happen or not happen.

 BULK INSERT #t1 FROM 'C:\123\Import123.csv' WITH 
      ( FIRSTROW = 1, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0d0x0a' )

I suspect I am not writing my row terminator correctly: SQL Server seems to parse one character at a time, while I'm trying to pass two characters.

Anyhow, this error is 4 years old now, but it may provide a bit of information for the next user.

Malachy answered 13/3, 2014 at 20:2 Comment(2)
I think the issue here was that the rowterminator is supposed to be a single binary value, which for SQL Server would be written as 0x0d0a (no second 0x). - Aksoyn
Hey! That's slick! I'll try that later tonight! - Malachy
0

I just wanted to post a fix here that worked for our company on new software we've installed. We had been getting the following error in the client log file since day one: Server was unable to process request. ---> A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The semaphore timeout period has expired.) ---> The semaphore timeout period has expired.

What completely fixed the problem was setting up a link aggregation group (LAG) on our switch. Our Dell FX1 server has redundant fiber lines coming out of the back of it. We did not realize that the switch they're plugged into needed to have a LAG configured on those two ports. See details here: https://docs.meraki.com/display/MS/Switch+Ports#SwitchPorts-LinkAggregation

Deletion answered 3/12, 2015 at 16:10 Comment(0)