WCF + NetTcp: high load makes the channel stop working (calls/second rate)

First of all, sorry, I'm not fluent in English.

I'm trying to figure out why my WCF services stop working in an environment with a high calls/second rate. I'm not sure that just increasing the timeout will solve the issue.

We have 2 web services:

  • The first is hosted on IIS 7.5, Windows Server 2008 R2 Enterprise SP1 x64, with AppFabric (and WAS)
  • The second is hosted in a Windows Service, on Windows 2003 R2 SP1 x86

Both web services have a minimal configuration: no authentication, no transactions, no special message handling. Here is the binding:

<netTcpBinding>
  <binding transactionFlow="false">
    <security mode="None">
      <message clientCredentialType="None" />
      <transport clientCredentialType="None"></transport>
    </security>
    <reliableSession enabled="false"/>
  </binding>
</netTcpBinding>

We are trying to use the Net.Tcp binding because of its reliability and speed.

FACT 1 - The Net.Tcp binding is the primary cause

When the load is high, the Net.Tcp channel stops working. That's it! But BasicHttp keeps working like a charm.

The Windows Service: the net.tcp channel stays down for some minutes (3 to 10) before it starts working again (BY ITSELF, without us changing anything. Goblins are working hard).

AppFabric/IIS/WAS: the net.tcp channel stays down and needs a manual restart.

The BasicHttpBinding configuration is similar to the net.tcp one: no special message handling, no security settings or anything like that.

FACT 2 - No useful logging

We couldn't find any hint, tip, or trick to figure out what's happening. I have tried memory dumps, event logs, and System.Diagnostics, and found nothing relevant. The most relevant hint is an error from SMSvcHost 4.0.0.0:

An error occurred while dispatching a duplicated socket: this handle is now leaked in the process.
ID: 2272
Source: System.ServiceModel.Activation.TcpWorkerProcess/62875109
Exception: System.TimeoutException: This request operation sent to http://schemas.microsoft.com/2005/12/ServiceModel/Addressing/Anonymous did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.

Server stack trace:
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.SendAsyncResult.End(SendAsyncResult result)
at System.ServiceModel.Channels.ServiceChannel.EndCall(String action, Object[] outs, IAsyncResult result)
at System.ServiceModel.Channels.ServiceChannelProxy.InvokeEndService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

Exception rethrown at [0]:
at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
at System.ServiceModel.Activation.WorkerProcess.EndDispatchSession(IAsyncResult result)
Process Name: SMSvcHost
Process ID: 1532

Do you have any tip or configuration trick to help me solve this issue?

What's the best configuration for high-load scenarios?

Frostbite answered 11/9, 2012 at 21:4 Comment(10)
Only a comment, but my guess is that IIS is queueing HTTP and not TCP. - Urinate
Have you tried WCF tracing? msdn.microsoft.com/en-us/library/ms733025.aspx - Smukler
How did you generate the heavy load, and how much load (10 calls/s, 100 calls/s, 500 calls/s, ...)? How long does your method take? As schglurps said, you may have reached the limits of max concurrent calls/sessions/instances in WCF. - Sialoid
@Blam you're right. If WCF has all available connections in use, the next one will fail. Do you know how to make the NetTcpBinding behave the same way as the HttpBinding? - Frostbite
@hugh WCF tracing is useless because the port doesn't accept new connections. - Frostbite
I think IIS is queueing HTTP for you because it treats it as web traffic. On TCP, look at tuning your endpoint. What is your instancing mode? A lot of people stay away from Per Call thinking it is not scalable, but that is typically the wrong conclusion. Since you have none set, it is probably defaulting to Session. - Urinate
@Sialoid The load details: around 200 calls/s with 20 concurrent users. The service stays up for just 10 seconds. I'll try increasing max concurrent calls. I've found a Microsoft link on optimizing the net.tcp binding: (msdn.microsoft.com/en-us/library/ee377061(v=bts.10).aspx) - Frostbite
I found these links valuable #10746924 - Urinate
@Blam We are using all 3 types of instancing (we have different scenarios for each service/application). For us the "perfect" scenario would be if IIS could queue TCP just like HTTP. But we don't know how to do it :-\... Net.MSMQ doesn't have a Request-Response MEP. If we want to reduce connection trouble, is BasicHttp the best binding? - Frostbite
No, I don't think HTTP is your answer, as eventually even that queue will fill up. TCP is more efficient. There is just a lot to it and this is not my expertise. If you have some long calls, then consider async. Try to determine if one of the methods is causing the problem. Are you sure you need Single? If that one gets behind, then things go bad fast. - Urinate

If you generated a service reference in Visual Studio, or with the svcutil tool, make sure you always call the Close or Abort methods of your proxies. I encountered a similar problem some days ago because I forgot to call these methods.
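
The Close/Abort advice above can be sketched as a small helper. This is only a minimal sketch, not code from the original poster: the ProxyHelper class and the CloseOrAbort method are hypothetical names, while ICommunicationObject, CommunicationState, Close() and Abort() are the standard System.ServiceModel members that generated proxies expose.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper (not part of WCF): releases a client proxy so that
// its underlying net.tcp connection is not left open on the server.
public static class ProxyHelper
{
    public static void CloseOrAbort(ICommunicationObject proxy)
    {
        if (proxy == null)
        {
            return;
        }

        try
        {
            if (proxy.State == CommunicationState.Faulted)
            {
                // Close() throws on a faulted channel, so tear it down directly.
                proxy.Abort();
            }
            else
            {
                // Graceful shutdown of the channel and its session.
                proxy.Close();
            }
        }
        catch (CommunicationException)
        {
            // The graceful close failed; release local resources immediately.
            proxy.Abort();
        }
        catch (TimeoutException)
        {
            // The close did not complete within the configured close timeout.
            proxy.Abort();
        }
    }
}
```

A caller would typically create the generated client, make its calls inside a try block, and invoke ProxyHelper.CloseOrAbort(client) in the finally block. A plain using statement is often avoided for WCF proxies because Dispose() calls Close(), which can itself throw when the channel is faulted.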

Lenticularis answered 12/9, 2012 at 9:42 Comment(5)
You're right! A new consumer wasn't aborting/closing its connections. Do you (or anyone) know whether consumers built on other technologies (like Java/PHP) can handle abort/close properly? Thanks a lot. Over the next 2 weeks we will check all consumers of our services. If we don't experience any more related issues after this change, I'll mark this answer as accepted. Really, thanks a lot. - Frostbite
I don't think a Java/PHP client can do that. The best option is probably to add a new BasicHttpBinding endpoint to your service, and to tell your Java/PHP clients to use that specific endpoint. - Lenticularis
You're right. I recently ran performance tests against my net.tcp-enabled WCF Windows service. The difference between calling and not calling the Close() method is noticeable. Thanks for the tip!! - Frostbite
I made the changes and, after some performance tests, checked the performance counters for WCF (msdn.microsoft.com/en-us/library/ms735098.aspx). - Frostbite
I could see that it had improved. The proxy objects (service instances and connection-related objects) aren't correctly disposed if you don't close/abort the connections (when using the net.tcp binding); the HTTP binding has "built-in" connection control because of HTTP protocol requirements. - Frostbite

If you are already calling the Close() and Abort() methods appropriately and still receive this error, consider the following scenario:

  1. You run a Microsoft .NET Framework 3.0-based or .NET Framework 3.5-based Windows Communication Foundation (WCF) service.

  2. The WCF service uses the Net.Tcp Port Sharing Service (Smsvchost.exe) and is hosted on a computer that is running Internet Information Services (IIS).

  3. One of the following conditions is true:

    • The CPU usage is high on the computer that is running IIS.
    • A throttle occurs in a service model for the WCF service.
    • Multiple requests are sent to the WCF service at the same time.

In this scenario, the WCF service takes longer than one minute to process a request from a client application. Additionally, an error message that resembles the following event entry is logged in the event log:

Log Name: System

Source: SMSvcHost 3.0.0.0

Date:

Event ID: 8

Task Category: Sharing Service

Level: Error

Keywords: Classic

User: LOCAL SERVICE

Computer:

Description: An error occurred while dispatching a duplicated socket: this handle is now leaked in the process.

ID: 2620

Source: System.ServiceModel.Activation.TcpWorkerProcess

Exception:

System.TimeoutException: This request operation sent to did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.

Note: You must restart IIS to recover the WCF service from this issue.

Cause:

This issue occurs because the Smsvchost.exe process times out after one minute when it tries to transfer an incoming connection request to the W3wp.exe worker process. Additionally, this time-out is not configurable.

When the CPU has a heavy workload, or when many concurrent connection requests are incoming, the Smsvchost.exe process cannot transfer the incoming connection to the W3wp.exe worker process within one minute. Therefore, the Smsvchost.exe process times out and eventually stops responding. When this issue occurs, the Smsvchost.exe process cannot route later requests to the W3wp.exe worker process until IIS is restarted.

Solution:

Microsoft suggests applying hotfix 2504602, which is described in the corresponding Microsoft Knowledge Base (KB) article. This hotfix is available for WCF in the .NET Framework 3.0 SP2, the .NET Framework 3.5 SP1, and the .NET Framework 4.

In addition, Microsoft claims to have solved this issue in the .NET Framework 4.5, so you should upgrade to the latest version.

If you upgrade to the .NET Framework 4.5 and the problem persists, the workaround is to modify the smsvchost.exe.config file to increase the timeouts, the pending accepts, and various other parameters.
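
As a rough illustration of that workaround, the net.tcp section of SMSvcHost.exe.config might look like the fragment below. The attribute names are the ones the Net.Tcp Port Sharing Service reads from its configuration; every value shown is an assumption chosen for illustration, not a recommendation, and should be tuned against your own load tests. The file normally sits in the .NET Framework installation directory next to SMSvcHost.exe, and the port sharing service usually has to be restarted before changes take effect.

```xml
<configuration>
  <system.serviceModel.activation>
    <!-- Illustrative values only (assumptions); measure before and after changing them. -->
    <net.tcp listenBacklog="200"
             maxPendingConnections="1000"
             maxPendingAccepts="50"
             receiveTimeout="00:00:30"
             teardownTimeout="00:00:30" />
  </system.serviceModel.activation>
</configuration>
```

Here listenBacklog and maxPendingAccepts control how many incoming connections the shared listener can queue and accept concurrently, maxPendingConnections caps connections waiting to be dispatched to the worker process, and the two timeouts bound how long the service waits while reading the initial framing data and while tearing a connection down.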

Bambi answered 6/2, 2020 at 22:20 Comment(0)
