WCF timeout exception detailed investigation
Asked Answered
O

11

94

We have an application that has a WCF service (*.svc) running on IIS7 and various clients querying the service. The server is running Win 2008 Server. The clients are running either Windows 2008 Server or Windows 2003 server. I am getting the following exception, which I have seen can in fact be related to a large number of potential WCF issues.

System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:59.9320000. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The HTTP request to 'http://www.domain.com/WebServices/myservice.svc/gzip' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. 

I have increased the timeout to 30min and the error still occurred. This tells me that something else is at play, because the quantity of data could never take 30min to upload or download.

The error comes and goes. At the moment, it is more frequent. It does not seem to matter if I have 3 clients running simultaneously or 100, it still occurs once in a while. Most of the time, there are no timeouts but I still get a few per hour. The error comes from any of the methods that are invoked. One of these methods does not have parameters and returns a bit of data. Another takes in lots of data as a parameter but executes asynchronously. The errors always originate from the client and never reference any code on the server in the stack trace. It always ends with:

 at System.Net.HttpWebRequest.GetResponse()
  at System.ServiceModel.Channels.HttpChannelFactory.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)

On the server: I've tried (and currently have) the following binding settings:

maxBufferSize="2147483647" maxReceivedMessageSize="2147483647" maxBufferPoolSize="2147483647"

It does not seem to have an impact.

I've tried (and currently have) the following throttling settings:

<serviceThrottling maxConcurrentCalls="1500"   maxConcurrentInstances="1500"    maxConcurrentSessions="1500"/>

It does not seem to have an impact.

I currently have the following settings for the WCF service.

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Single)]

I ran with ConcurrencyMode.Multiple for a while, and the error still occurred.

I've tried restarting IIS, restarting my underlying SQL Server, restarting the machine. All of these don't seem to have an impact.

I've tried disabling the Windows firewall. It does not seem to have an impact.

On the client, I have these settings:

maxReceivedMessageSize="2147483647"

<system.net>
    <connectionManagement>
    <add address="*" maxconnection="16"/>
</connectionManagement> 
</system.net>

My client closes its connections:

var client = new MyClient();

try
{
    return client.GetConfigurationOptions();
}
finally
{
    client.Close();
}

I have changed the registry settings to allow more outgoing connections:

MaxConnectionsPerServer=24, MaxConnectionsPer1_0Server=32.

I have now just recently tried SvcTraceViewer.exe. I managed to catch one exception on the client end. I see that its duration is 1 minute. Looking at the server side trace, I can see that the server is not aware of this exception. The maximum duration I can see is 10 seconds.

I have looked at active database connections using exec sp_who on the server. I only have a few (2-3). I have looked at TCP connections from one client using TCPview. It usually is around 2-3 and I have seen up to 5 or 6.

Simply put, I am stumped. I have tried everything I could find, and must be missing something very simple that a WCF expert would be able to see. It is my gut feeling that something is blocking my clients at the low-level (TCP), before the server actually receives the message and/or that something is queuing the messages at the server level and never letting them process.

If you have any performance counters I should look at, please let me know. (please indicate what values are bad, as some of these counters are hard to decypher). Also, how could I log the WCF message size? Finally, are there any tools our there that would allow me to test how many connections I can establish between my client and server (independently from my application)

Thanks for your time!

Extra information added June 20th:

My WCF application does something similar to the following.

while (true)
{
   Step1GetConfigurationSettingsFromServerViaWCF(); // can change between calls
   Step2GetWorkUnitFromServerViaWCF();
   DoWorkLocally(); // takes 5-15minutes. 
   Step3SendBackResultsToServerViaWCF();
}

Using WireShark, I did see that when the error occurs, I have a five TCP retransmissions followed by a TCP reset later on. My guess is the RST is coming from WCF killing the connection. The exception report I get is from Step3 timing out.

I discovered this by looking at the tcp stream "tcp.stream eq 192". I then expanded my filter to "tcp.stream eq 192 and http and http.request.method eq POST" and saw 6 POSTs during this stream. This seemed odd, so I checked with another stream such as tcp.stream eq 100. I had three POSTs, which seems a bit more normal because I am doing three calls. However, I do close my connection after every WCF call, so I would have expected one call per stream (but I don't know much about TCP).

Investigating a bit more, I dumped the http packet load to disk to look at what these six calls where.

1) Step3
2) Step1
3) Step2
4) Step3 - corrupted
5) Step1
6) Step2

My guess is two concurrent clients are using the same connection, that is why I saw duplicates. However, I still have a few more issues that I can't comprehend:

a) Why is the packet corrupted? Random network fluke - maybe? The load is gzipped using this sample code: http://msdn.microsoft.com/en-us/library/ms751458.aspx - Could the code be buggy once in a while when used concurrently? I should test without the gzip library.

b) Why would I see step 1 & step 2 running AFTER the corrupted operation timed out? It seems to me as if these operations should not have occurred. Maybe I am not looking at the right stream because my understanding of TCP is flawed. I have other streams that occur at the same time. I should investigate other streams - a quick glance at streams 190-194 show that the Step3 POST have proper payload data (not corrupted). Pushing me to look at the gzip library again.

Ostosis answered 11/6, 2009 at 14:35 Comment(2)
Jason - did you ever solve this problem? Was it the DefaultConnectionLimit setting?Acetanilide
@JasonKealey - In contrast to many other questions, you cannot be accused of not trying for yourself before posting the question :) I love that your question is so detailed, and includes all the important details. The symptoms you describe looks very much like mine, so I hope the solution is the same as well :)Stupefy
T
52

If you are using .Net client then you may not have set

//This says how many outgoing connection you can make to a single endpoint. Default Value is 2
System.Net.ServicePointManager.DefaultConnectionLimit = 200;

here is the original question and answer WCF Service Throttling

Update:

This config goes in .Net client application may be on start up or whenever but before starting your tests.

Moreover you can have it in app.config file as well like following

<system.net>
    <connectionManagement>
      <add maxconnection = "200" address ="*" />
    </connectionManagement>
  </system.net>
Traveler answered 16/4, 2010 at 11:24 Comment(4)
This looks promising. I've included this to be tested during my next scalability test. It looks exactly like the kind of random setting that would make it crash :) Thanks for the pointer.Ostosis
@Jason: If you are server programmer you know how much important to maintain the scalability of server in your hands and also one who is currently suffering the concurrency problem even after using above. Please if you can look into the following question https://mcmap.net/q/225521/-wcf-network-cost in short i am suffering with 31ms latency between client and server and need to reduce it.Traveler
It only took a year, but I finally ran another stress test on the application with this flag set. The issue appears solved, so I am giving you the best answer. I would not be surprised that this was the last piece of the puzzle that was required but that all the other elements needed to be in place to ensure the error did not happen. Thanks very much!Ostosis
@Aris: In .net client applicaiton, on startup or where ever you set your global configuration, if you want to keep it configurable you can add it in config file as well like this <system.net> <connectionManagement> <add maxconnection = "200" address ="*" /> </connectionManagement> </system.net>Traveler
G
3

If you havn't tried it already - encapsulate your Server-side WCF Operations in try/finally blocks, and add logging to ensure they are actually returning.

If those show that the Operations are completing, then my next step would be to go to a lower level, and look at the actual transport layer.

Wireshark or another similar packet capturing tool can be quite helpful at this point. I'm assuming this is running over HTTP on standard port 80.

Run Wireshark on the client. In the Options when you start the capture, set the capture filter to tcp http and host service.example.com - this will reduce the amount of irrelevant traffic.

If you can, modify your client to notify you the exact start time of the call, and the time when the timeout occurred. Or just monitor it closely.

When you get an error, then you can trawl through the Wireshark logs to find the start of the call. Right click on the first packet that has your client calling out on it (Should be something like GET /service.svc or POST /service.svc) and select Follow TCP Stream.

Wireshark will decode the entire HTTP Conversation, so you can ensure that WCF is actually sending back responses.

Gilliette answered 17/6, 2009 at 17:5 Comment(6)
I have logging on the server - there is no error on that end. I am running WireShark right now to see what I can find. Given the high volume of traffic, it is going to be a pain to analyze but I will report back if I can find anything.Ostosis
I ran WireShark over the past six hours and collected some 60k frames. Only one exception was reported by this client today. I did see a TCP connection marked as RST (reset), apparently after sending the error email, which is probably WCF that is terminating the connection. I saved the payload (525k) to disk. I verified that there were 87 other invocations with similar sized payloads. I did see a few TCP retransmissions, but saw some in other calls as well (that did not fail). Starting to wonder about my networking hardware + cables.Ostosis
Even on a local network, the presence of a TCP Retransmits isn't necessarily bad. If it's possible to physically connect two of the end points to a single switch, then that might be worth a shot, but I wouldn't hold out hope that will fix it. If you can - create a very basic client application that just passes some traffic back and forth to your server, and nothing else. This can help eliminate any issue in your application that might be causing timeouts.Gilliette
Also, you mention seeing the TCP Reset packet -- had the server delivered any kind of response at that point (or perhaps was it waiting for more data)? Was there an appreciable delay between the RST and the previous packet?Gilliette
The server is remote. I am planning on creating a test environment locally to see if that helps. As for the RST, it was sent 34 seconds after the last of five TCP Retransmission. (1 to 8 second intervals between retransmissions). Does that give you any clues?Ostosis
So, if the server is sending a RST packet after 5x consecutive retransmissions with no Acknowledgements - then it seems your client is (for some reason) dropping packets. I'd definitely create a stand-alone client, even if it talks directly to the regular server. There could be any number of issues causing the clients to drop packets (other threads blocking execution, other apps hogging resources, failing hardware, network, firewalls, etc).Gilliette
Z
2

from: http://www.codeproject.com/KB/WCF/WCF_Operation_Timeout_.aspx

To avoid this timeout error, we need to configure the OperationTimeout property for Proxy in the WCF client code. This configuration is something new unlike other configurations such as Send Timeout, Receive Timeout etc., which I discussed early in the article. To set this operation timeout property configuration, we have to cast our proxy to IContextChannel in WCF client application before calling the operation contract methods.

Zeal answered 11/6, 2009 at 15:22 Comment(2)
I have tried this. Regardless of the timeout I put, it still times out but this makes no sense because the operation is not that long and because all the other clients that do the same queries function during this time.Ostosis
My tests proved that OperationTimeout simply overrides ReceiveTimeout from the config. Thus, it's of no use whatever.Dikdik
F
2

I'm having a very similar problem. In the past, this has been related to serialization problems. If you are still having this problem, can you verify that you can correctly serialize the objects you are returning. Specifically, if you are using Linq-To-Sql objects that have relationships, there are known serialization problems if you put a back reference on a child object to the parent object and mark that back reference as a DataMember.

You can verify serialization by writing a console app that serializes and deserializes your objects using the DataContractSerializer on the server side and whatever serialization methods your client uses. For example, in our current application, we have both WPF and Compact Framework clients. I wrote a console app to verify that I can serialize using a DataContractSerializer and deserialize using an XmlDesserializer. You might try that.

Also, if you are returning Linq-To-Sql objects that have child collections, you might try to ensure that you have eagerly loaded them on the server side. Sometimes, because of lazy loading, the objects being returned are not populated and may cause the behavior you are seeing where the request is sent to the service method multiple times.

If you have solved this problem, I'd love to hear how because I'm stuck with it too. I have verified that my issue is not serialization so I'm at a loss.

UPDATE: I'm not sure if it will help you any but the Service Trace Viewer Tool just solved my problem after 5 days of very similar experience to yours. By setting up tracing and then looking at the raw XML, I found the exceptions that were causing my serialization problems. It was related to Linq-to-SQL objects that occasionally had more child objects than could be successfully serialized. Adding the following to your web.config file should enable tracing:

<sharedListeners>
    <add name="sharedListener"
         type="System.Diagnostics.XmlWriterTraceListener"
         initializeData="c:\Temp\servicetrace.svclog" />
  </sharedListeners>
  <sources>
    <source name="System.ServiceModel" switchValue="Verbose, ActivityTracing" >
      <listeners>
        <add name="sharedListener" />
      </listeners>
    </source>
    <source name="System.ServiceModel.MessageLogging" switchValue="Verbose">
      <listeners>
        <add name="sharedListener" />
      </listeners>
    </source>
  </sources>

The resulting file can be opened with the Service Trace Viewer Tool or just in IE to examine the results.

Frants answered 20/8, 2009 at 14:55 Comment(0)
V
2

Are you closing the connection to the WCF service in between requests? If you don't, you'll see this exact timeout (eventually).

Veldaveleda answered 4/4, 2011 at 22:54 Comment(0)
W
0

Did you try using clientVia to see the message sent, using SOAP toolkit or something like that? This could help to see if the error is coming from the client itself or from somewhere else.

Wsan answered 15/6, 2009 at 12:27 Comment(2)
Do you know of any tools more recent than the deprecated SOAP toolkit that would make it easier for me to log this information in WCF calls?Ostosis
SOAP Toolkit is deprecatedWeinberg
F
0

I'm not a WCF expert but I'm wondering if you aren't running into a DDOS protection on IIS. I know from experience that if you run a bunch of simultaneous connections from a single client to a server at some point the server stops responding to the calls as it suspects a DDOS attack. It will also hold the connections open until they time-out in order to slow the client down in his attacks.

Multiple connection coming from different machines/IP's should not be a problem however.

There's more info in this MSDN post:

http://msdn.microsoft.com/en-us/library/bb463275.aspx

Check out the MaxConcurrentSession sproperty.

Finkelstein answered 16/6, 2009 at 7:10 Comment(1)
I feel that this is what is happening, from everything I have seen, however I have (on the server): <serviceThrottling maxConcurrentCalls="150" maxConcurrentInstances="150" maxConcurrentSessions="150"/> <serviceDebug includeExceptionDetailInFaults="true" /> Would there be any performance monitor or IIS log that I could monitor to see if this is happening?Ostosis
V
0

Did you check the WCF traces? WCF has a tendency to swallow exceptions and only return the last exception, which is the timeout that you're getting, since the end point didn't return anything meaningful.

Veracruz answered 16/6, 2009 at 8:31 Comment(2)
I tried SvcTraceViewer and the only exception it reported was the timeout (on the client). Nothing was reported on the server.Ostosis
Open all the options on the trace, you might not have all the trace options open. Also, check both the event trace and message trace files.Veracruz
C
0

You will also receive this error if you are passing an object back to the client that contains a property of type enum that is not set by default and that enum does not have a value that maps to 0. i.e enum MyEnum{ a=1, b=2};

Cretan answered 25/2, 2011 at 22:54 Comment(0)
B
0

I've just solved the problem.I found that the nodes in the App.config file have configed wrong.

<client>
<endpoint name="WCF_QtrwiseSalesService" binding="wsHttpBinding" bindingConfiguration="ws" address="http://cntgbs1131:9005/MyService/TGE.ISupplierClientManager" contract="*">
</endpoint>
</client>

<bindings>
    <wsHttpBinding>
        <binding name="ws" maxBufferPoolSize="2147483647" maxReceivedMessageSize="2147483647" messageEncoding="Text">
            <readerQuotas maxDepth="2147483647" maxStringContentLength="2147483647" maxArrayLength="2147483647" maxBytesPerRead="2147483647" maxNameTableCharCount="2147483647"/>
            <**security mode="None">**
                <transport clientCredentialType="None"></transport>
            </security>
        </binding>
    </wsHttpBinding>
</bindings>

Confirm your config in the node <security>,the attribute "mode" value is "None". If your value is "Transport",the error occurs.

Breathing answered 18/11, 2011 at 7:31 Comment(1)
Does this not affect security ? If so this may not be a solution for most real applicationsRataplan
C
0

Looks like this exception message is quite generic and can be received due to a variety of reasons. We ran into this while deploying the client on Windows 8.1 machines. Our WCF client runs inside of a windows service and continuously polls the WCF service. The windows service runs under a non-admin user. The issue was fixed by setting the clientCredentialType to "Windows" in the WCF configuration to allow the authentication to pass-through, as in the following:

      <security mode="None">
        <transport clientCredentialType="Windows" proxyCredentialType="None"
          realm="" />
        <message clientCredentialType="UserName" algorithmSuite="Default" />
      </security>
Cinnabar answered 13/6, 2014 at 15:57 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.