What 130 second timeout is killing my WCF streaming service call?

Just recently I started to investigate a tricky problem with WCF streaming in which a CommunicationException is produced if the client waits longer than 130 seconds between sends to the server.

Here is the full exception:

System.ServiceModel.CommunicationException was unhandled by user code
  HResult=-2146233087
  Message=The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '23:59:59.9110000'.
  Source=mscorlib
  StackTrace:
    Server stack trace: 
       at System.ServiceModel.Channels.HttpOutput.WebRequestHttpOutput.WebRequestOutputStream.Write(Byte[] buffer, Int32 offset, Int32 count)
       at System.IO.BufferedStream.Write(Byte[] array, Int32 offset, Int32 count)
       at System.Xml.XmlStreamNodeWriter.FlushBuffer()
       at System.Xml.XmlStreamNodeWriter.GetBuffer(Int32 count, Int32& offset)
       at System.Xml.XmlUTF8NodeWriter.InternalWriteBase64Text(Byte[] buffer, Int32 offset, Int32 count)
       at System.Xml.XmlBaseWriter.WriteBase64(Byte[] buffer, Int32 offset, Int32 count)
       at System.Xml.XmlDictionaryWriter.WriteValue(IStreamProvider value)
       at System.ServiceModel.Dispatcher.StreamFormatter.Serialize(XmlDictionaryWriter writer, Object[] parameters, Object returnValue)
       at System.ServiceModel.Dispatcher.OperationFormatter.OperationFormatterMessage.OperationFormatterBodyWriter.OnWriteBodyContents(XmlDictionaryWriter writer)
       at System.ServiceModel.Channels.Message.OnWriteMessage(XmlDictionaryWriter writer)
       at System.ServiceModel.Channels.TextMessageEncoderFactory.TextMessageEncoder.WriteMessage(Message message, Stream stream)
       at System.ServiceModel.Channels.HttpOutput.WriteStreamedMessage(TimeSpan timeout)
       at System.ServiceModel.Channels.HttpOutput.Send(TimeSpan timeout)
       at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.SendRequest(Message message, TimeSpan timeout)
       at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
       at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
    Exception rethrown at [0]: 
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at WcfService.IStreamingService.SendStream(MyStreamUpRequest request)
       at Client.Program.<Main>b__0() in c:\Users\jpierson\Documents\Visual Studio 2012\Projects\WcfStreamingTest\Client\Program.cs:line 44
       at System.Threading.Tasks.Task.Execute()
  InnerException: System.IO.IOException
       HResult=-2146232800
       Message=Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.
       Source=System
       StackTrace:
            at System.Net.Sockets.NetworkStream.MultipleWrite(BufferOffsetSize[] buffers)
            at System.Net.ConnectStream.InternalWrite(Boolean async, Byte[] buffer, Int32 offset, Int32 size, AsyncCallback callback, Object state)
            at System.Net.ConnectStream.Write(Byte[] buffer, Int32 offset, Int32 size)
            at System.ServiceModel.Channels.BytesReadPositionStream.Write(Byte[] buffer, Int32 offset, Int32 count)
            at System.ServiceModel.Channels.HttpOutput.WebRequestHttpOutput.WebRequestOutputStream.Write(Byte[] buffer, Int32 offset, Int32 count)
       InnerException: System.Net.Sockets.SocketException
            HResult=-2147467259
            Message=An existing connection was forcibly closed by the remote host
            Source=System
            ErrorCode=10054
            NativeErrorCode=10054
            StackTrace:
                 at System.Net.Sockets.Socket.MultipleSend(BufferOffsetSize[] buffers, SocketFlags socketFlags)
                 at System.Net.Sockets.NetworkStream.MultipleWrite(BufferOffsetSize[] buffers)
            InnerException: 

It appears that the server has closed the connection prematurely due to inactivity. If I instead give a pulse to the server, even one byte at a time, then I never get this exception and I can continue to transfer data indefinitely. I've constructed a very simple example application to demonstrate this. It uses basicHttpBinding with Streamed transferMode, and I insert an artificial 130 second delay from within a custom stream implementation on the client. This simulates something like a buffer under-run condition, in which the stream provided in my service call is not feeding data to the WCF infrastructure quickly enough to satisfy some unidentified timeout that appears to sit around the 130 second mark.
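
For illustration, here is roughly what that artificial delay looks like on the client (a minimal sketch, assuming the request stream wraps the real payload; the class name SlowFeedStream and the one-time 130 second stall are illustrative, not the exact test code):

using System;
using System.IO;
using System.Threading;

// Wraps the real payload stream and stalls once before the first read,
// simulating a client that cannot feed data to WCF quickly enough.
class SlowFeedStream : Stream
{
    private readonly Stream _inner;
    private bool _stalled;

    public SlowFeedStream(Stream inner)
    {
        _inner = inner;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (!_stalled)
        {
            // Simulate a buffer under-run: stall for 130 seconds before
            // handing WCF the next chunk of the request body.
            Thread.Sleep(TimeSpan.FromSeconds(130));
            _stalled = true;
        }
        return _inner.Read(buffer, offset, count);
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { return _inner.Length; } }
    public override long Position
    {
        get { return _inner.Position; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

Replacing the single long stall with a one-byte trickle every few seconds keeps the connection alive indefinitely, which is consistent with the pulsing behavior described above.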

Using the WCF service tracing tools I'm able to find an HttpException with a message that reads "The client is disconnected because the underlying request has been completed. There is no longer an HttpContext available."

From the IIS Express trace log file I see an entry that says "The I/O operation has been aborted because of either a thread exit or an application request. (0x800703e3)"

I've configured both server and client timeouts to values well over the 130 second mark just to rule them out. I've also tried idleTimeout in IIS Express and a host of ASP.NET-related timeout values in order to discover where this issue is coming from, but so far no luck. The best lead I've found is a comment in the Firefox issue tracker by a developer who describes a similar problem outside of the WCF architecture. For this reason I'm guessing the issue may be more specifically related to IIS7 or possibly Windows Server.

Custom binding in the server's Web.config:

<binding name="myHttpBindingConfiguration"
         closeTimeout="02:00:00"
         openTimeout="02:00:00"
         receiveTimeout="02:00:00"
         sendTimeout="02:00:00">
  <textMessageEncoding messageVersion="Soap11" />
  <httpTransport maxBufferSize="65536"                        
                 maxReceivedMessageSize="2147483647"
                 maxBufferPoolSize="2147483647"
                 transferMode="Streamed" />
</binding>

Client side configuration in code:

    var binding = new BasicHttpBinding();
    binding.MaxReceivedMessageSize = _maxReceivedMessageSize;
    binding.MaxBufferSize = 65536;
    binding.ReaderQuotas.MaxStringContentLength = int.MaxValue;
    binding.ReaderQuotas.MaxArrayLength = int.MaxValue;
    binding.TransferMode = TransferMode.Streamed;
    binding.ReceiveTimeout = TimeSpan.FromDays(1);
    binding.OpenTimeout = TimeSpan.FromDays(1);
    binding.SendTimeout = TimeSpan.FromDays(1);
    binding.CloseTimeout = TimeSpan.FromDays(1);

In response to wal's idea of seeing whether I get different results by self-hosting my service: I did so and found that I get the same results as when hosting in IIS. What does this mean? My guess is that the issue is either in WCF or in the underlying networking infrastructure in Windows. I'm using Windows 7 64-bit, and we originally discovered this issue by running various clients against the service portion hosted on Windows Server 2008.

Update 2013-01-15

I found some new clues thanks to DarkWanderer once I realized that WCF uses HTTP.sys underneath in self-hosting scenarios on Windows 7. This got me looking into what I could configure for HTTP.sys, and also into the types of issues people report for HTTP.sys that sound similar to what I'm experiencing. This led me to a log file located at C:\Windows\System32\LogFiles\HTTPERR\httperr1.log, which logs specific types of HTTP issues on the part of HTTP.sys. In this log I see the following entry each time I run my test.

2013-01-15 17:17:12 127.0.0.1 59111 127.0.0.1 52733 HTTP/1.1 POST /StreamingService.svc - - Timer_EntityBody -

So it comes down to finding which conditions can cause a Timer_EntityBody error, and which settings in IIS7 or elsewhere have a bearing on whether and when that error occurs.

From the official IIS website:

The connection expired before the request entity body arrived. When it is clear that a request has an entity body, the HTTP API turns on the Timer_EntityBody timer. Initially, the limit of this timer is set to the connectionTimeout value. Each time another data indication is received on this request, the HTTP API resets the timer to give the connection more minutes as specified in the connectionTimeout attribute.

Modifying the connectionTimeout attribute in applicationhost.config for IIS Express, as the reference above suggests, doesn't seem to make any difference. Perhaps IIS Express ignores this configuration and uses a hard-coded value internally? Trying something on my own, I discovered that there are new netsh http commands for showing and adding timeout values, so giving that a go I came up with the following command; unfortunately it didn't seem to have any effect on this error either.

netsh http add timeout timeouttype=IdleConnectionTimeout value=300
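
For what it's worth, the current HTTP.sys timeout values can be listed with the companion show command; on supported systems it reports IdleConnectionTimeout and HeaderWaitTimeout:

netsh http show timeout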

Silhouette answered 10/1, 2013 at 20:39 Comment(8)
we will need your app.config file (both sides) – Noggin
@Noggin - the config files are uninteresting at this point. I'm reproducing this in a simple test app but to satisfy the request I'll update my post. – Silhouette
is that receiveTimeout 2 minutes or 2 hours? :) – Noggin
try changing it to the max: 24.20:31:23.6470000 – Noggin
2:00:00 is 2 hours and increasing it made no difference. – Silhouette
ok. can you try removing your assigned values for the bindings on the client and server side and seeing if you get any difference? (may not fix problem but may narrow it down). Also, can you run your WCF service in isolation (its own process) to rule out a few things? – Noggin
@SebastianK - None more than what I have listed above in my post. – Silhouette
I am setting them on the client and I'm doing it through code. See the section labeled "Client side configuration in code" in my post. – Silhouette

It turns out that this issue was caused by the Connection Time-out value used by HTTP.sys, which can be set through IIS Manager in the Advanced Settings for the individual site. By default it is configured to time out a connection when the header and body haven't been received within 120 seconds. Each time a pulse of data from the body arrives within that window, the server resets the timer (Timer_EntityBody) and waits again for additional data.

(Screenshot: the Connection Time-out setting in IIS Manager)

This is just as the documentation concerning Timer_EntityBody and connectionTimeout specifies; however, it was hard to pinpoint because IIS Express appears to ignore the connectionTimeout value specified in the limits element in applicationhost.config, regardless of what the documentation says. To determine this I had to install the full version of IIS on my development machine, host my site there, and modify the setting above.
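
For reference, the setting that IIS Express appeared to ignore has this shape in applicationhost.config; under full IIS the IIS Manager change described above lands on the same limits element (site name and value here are illustrative):

<sites>
  <site name="WcfStreamingTest">
    <!-- connectionTimeout drives the Timer_EntityBody limit; the default is 2 minutes -->
    <limits connectionTimeout="00:10:00" />
  </site>
</sites>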

Since we are hosting the real service under IIS on Windows Server 2008, the above solution will work for me; however, the question still remains how to properly modify the Connection Time-out value in cases where you are self-hosting.

Silhouette answered 15/1, 2013 at 23:45 Comment(4)
Yes, I tried 140 at first and then tested with a delay of 133 and 160 just to ensure that it would pass for the first and fail for the latter, and it did just that. I did further testing at 400 and 600 seconds as well to confirm that this setting was indeed the value affecting this behavior. – Silhouette
Does anyone know if MS intends to fix IIS Express to honor the connectionTimeout value in limits in applicationhost.config? – Heady
@Silhouette Did you ever figure out how to increase the idle timeout for self-hosted services? – Heady
I'm not self-hosting in my scenario, so no, I haven't had to tackle that one. I would welcome any insight on the matter, though, just in case any of us need to self-host and run into a similar problem. – Silhouette

Judging from the error:

The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '23:59:59.9110000'

Looks like this is a simple TCP timeout.

You can verify it by running the application as self-hosted, and then running this command in console:

netstat -no |find "xxxxx"

where xxxxx is the PID of your server process. This command shows the connections your server has established; add an interval argument (e.g. netstat -no 1) to have it refresh every second.

Try to connect with the client and see what happens. Most likely, you'll see "CLOSE_WAIT" or "TIME_WAIT" on your connection after around 100-120 seconds, which will mean it was aborted due to a timeout.
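
A hypothetical line of output for a connection in that state would look like the following (columns: protocol, local address, foreign address, state, owning PID; the addresses and PID are made up):

  TCP    127.0.0.1:52733    127.0.0.1:59111    CLOSE_WAIT    5480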

This should be fixable by adding the following to your config:

<httpTransport maxBufferSize="65536"                        
             maxReceivedMessageSize="2147483647"
             maxBufferPoolSize="2147483647"
             transferMode="Streamed"
             keepAliveEnabled="true" /> <!-- Here -->

The parameter is explained in the httpTransport binding element documentation.

Dishonesty answered 15/1, 2013 at 9:50 Comment(4)
I also assumed at this point that the issue must be related to some lower-level timeout, such as in the underlying socket. I'm not sure yet whether the timeout would be server-side or client-side. Also, the keepAliveEnabled value didn't seem to have any effect. – Silhouette
Still trying to wrap my head around the output from netstat, because it shows an unexpected PID as the owner of the open port in my tests. It's showing PID 4, which is associated with "System" (ntoskrnl.exe), and not my self-hosting command-line application. – Silhouette
Well, whatever the PID investigation shows, the result will be the same... I guess keepAliveEnabled="true" has no effect with transferMode="Streamed", due to the nature of the streaming socket. If you confirm the reason outlined above, I'd recommend you settle for your own "keepalives", as you described in the question. – Dishonesty
It turns out that PID 4 is the kernel. From what I can gather, this is the normal way self-hosting is done on Vista and up, in that everything is hosted underneath HTTP.sys. I tried changing the idleConnectionTimeout value for the HTTP.sys module by running "netsh http add timeout timeouttype=IdleConnectionTimeout value=300" but it didn't seem to make any difference. – Silhouette

Probably a long shot, but... check whether your IIS app pool has Ping enabled. It's in Advanced Settings, under the Process Model grouping.
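
If you prefer to rule this out in configuration rather than the UI, the relevant attributes live on the processModel element of the application pool in applicationHost.config (the values below are only an example, not a recommendation):

<applicationPools>
  <applicationPoolDefaults>
    <processModel pingingEnabled="false"
                  pingInterval="00:00:30"
                  pingResponseTime="00:01:30" />
  </applicationPoolDefaults>
</applicationPools>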

Earthshine answered 10/1, 2013 at 21:3 Comment(4)
I looked into this, and unfortunately I'm reproducing this right now on IIS Express, where setting the pingFrequency value in applicationhost.config only caused IIS Express to crash on startup. Perhaps IIS Express doesn't have the same process model or process model configuration that full IIS has? Perhaps this also indicates that this is not the source of the issue, since I can reproduce it just the same whether I'm running the service in IIS7 or IIS Express. – Silhouette
The documentation on the <processModel> element must have been outdated. I've found and tried the attributes pingingEnabled, pingInterval, and pingResponseTime, but unfortunately I haven't stumbled across a combination that works. – Silhouette
yeah, probably unrelated then. I was thinking setting pingingEnabled to false or increasing pingResponseTime would help. – Earthshine
Thinking down the same lines, those are the first things I tried after you brought those settings to my attention. Thanks for your input. – Silhouette

Try this; for me it solved the problem. The problem is that the underlying kernel driver, HTTP.sys, has its own timeout, and it will break the connection.

http://mmmreddy.wordpress.com/2013/07/11/wcf-use-of-http-transport-sharing-persistent-tcp-sessions/

netsh http add timeout timeouttype=idleconnectiontimeout value=120

Ananias answered 27/8, 2013 at 15:48 Comment(1)
See near the end of my question - I actually tried this setting, to no avail in my case. – Silhouette

Did you consider http://support.microsoft.com/kb/946086 ?

I observed such streaming interruptions in my ISAPI extensions. After switching off buffering in IIS 7 according to this support note, everything worked fine.

Liquefy answered 10/10, 2013 at 21:12 Comment(1)
No, I hadn't come across that KB article, but to me it seems somewhat unrelated. I was very specifically running into a timeout issue, as opposed to an issue with a buffer hitting the limits of its capacity. I'm sure of this because the accepted answer, which entails changing the timeout in IIS directly, solved the problem. – Silhouette
