Thread starvation on net.tcp binding - TCP error code 10061

I have run into a very strange error in my WCF service: using NetTcpBinding seems to cause a deadlock or thread starvation at the socket level. The service itself is a fairly simple self-hosted one:

using System;
using System.ServiceModel;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        using (ServiceHost serviceHost = new ServiceHost(typeof(TestService)))
        {
            serviceHost.Open();             
            Console.WriteLine("Press <ENTER> to terminate service.");
            Console.ReadLine();
            serviceHost.Close();
        }
        // The base address (net.tcp://localhost:8014/TestService.svc) is read from
        // the <baseAddresses> section of the configuration, so no Uri is needed here.
    }
}

[ServiceContract]
public interface ITestService
{
    [OperationContract]
    string GetData(string data);
}

public class TestService: ITestService
{
    public string GetData(string data)
    {
        Console.WriteLine(data);
        Thread.Sleep(5000);
        return "Ok";
    }
}

The configuration part:

<system.serviceModel>
<bindings>
  <basicHttpBinding>
    <binding name="basicHttpBinding" closeTimeout="00:02:00" openTimeout="00:02:00"
      receiveTimeout="00:02:00" sendTimeout="00:02:00" maxBufferSize="2000000000"
      maxReceivedMessageSize="2000000000" />
  </basicHttpBinding>
  <netTcpBinding>
    <binding name="netTcpBinding" closeTimeout="00:02:00" openTimeout="00:02:00"
      receiveTimeout="00:02:00" sendTimeout="00:02:00" listenBacklog="2000"
      maxBufferSize="2000000000" maxConnections="1000" maxReceivedMessageSize="2000000000">
      <security mode="None">
        <transport protectionLevel="EncryptAndSign" />
      </security>
    </binding>
    <binding name="TestServiceTcpEndPoint">
      <security mode="None" />
    </binding>
  </netTcpBinding>      
</bindings>

<behaviors>
  <serviceBehaviors>
    <behavior name="CommonServiceBehavior">
      <serviceMetadata httpGetEnabled="true" />
      <serviceDebug includeExceptionDetailInFaults="true" />
      <serviceThrottling maxConcurrentCalls="1000" maxConcurrentSessions="1000" maxConcurrentInstances="1000" />
    </behavior>
  </serviceBehaviors>
</behaviors>
<services>
  <service name="ServiceLauncher.TestService" behaviorConfiguration="CommonServiceBehavior">
    <endpoint address="" binding="netTcpBinding" bindingConfiguration="netTcpBinding" name="TestServiceTcpEndPoint" contract="ServiceLauncher.ITestService" />
    <endpoint address="" binding="basicHttpBinding" bindingConfiguration="basicHttpBinding" name="TestServiceTcpEndPoint" contract="ServiceLauncher.ITestService" />
    <endpoint address="mex"  binding="mexHttpBinding" bindingName="mexHttpBinding" contract="IMetadataExchange" />
    <host>
      <baseAddresses>
        <add baseAddress="net.tcp://localhost:8014/TestService.svc"/>
        <add baseAddress="http://localhost:1234/TestService.svc"/>
      </baseAddresses>
    </host>
  </service>
</services>
</system.serviceModel>

I also have a client that calls this service from many threads, creating a new proxy instance for every thread (this is a requirement):

    static void Main(string[] args)
    {           
        for (int i = 0; i < 1000; i++)
        {
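            // Copy the loop variable: the lambda would otherwise capture the shared,
            // still-changing i and could send the wrong value.
            int requestId = i;
            Thread tr = new Thread(() =>
            {
                using (var service = new Test.TestServiceClient())
                {
                    var result = service.GetData(requestId.ToString());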
                    Console.WriteLine(string.Format("{0}: {1} {2}",
                                      DateTime.Now,
                                      result,
                                      Thread.CurrentThread.ManagedThreadId));
                }  
            });
            tr.Start();                
        }
        Console.ReadLine();       
    }

In this case, after some number of requests the client throws an EndpointNotFoundException: TCP error code 10061, "No connection could be made because the target machine actively refused it". The number of requests that succeed before the failure varies from run to run. The server does not appear to be at fault, because it stays in a normal state and, strangest of all, I can see that it keeps receiving requests. It is also strange that the client host process can become "immortal" after the exception: it cannot be killed by any means short of rebooting the system. I'm fairly sure the problem is at the low socket level on the client and is somehow connected with the large number of threads, but I haven't managed to find anything that explains it.

Evy answered 8/1, 2016 at 16:17 Comment(9)
If you enable the WCF performance counters, check what "Percent of Max Concurrent Calls", "Percent of Max Concurrent Instances", and "Percent of Max Concurrent Sessions" show (see blogs.msdn.com/b/appfabriccat/archive/2010/10/29/… for more info). - Sesquioxide
@ScottChamberlain Thanks for the tip, but I doubt this will tell me anything useful about the service client, which is where the error originates and which is not throttled. The service keeps working fine and throttles correctly even after the error. - Evy
I misunderstood the original question. I thought the lockup was on the server side; in that case I'm not sure. - Sesquioxide
"The service keeps working fine"... Well, if you enable tracing on the service side, you will see that it is actually throwing tons of exceptions. - Modeling
@jstreet Yes, it throws internal exceptions, but "the service keeps working fine" means that it is not in the Faulted state and it keeps responding to other clients. I suppose the internal exceptions are caused by the socket failure on the client side, so the server simply has nowhere to send its answer. - Evy
Try using netstat to see whether your app has too many ports open when the problem occurs. - Grandma
If you launch the JIT debugger on the client side, this is the exception you get: "The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state." - Modeling
On my machine, everything works fine up to 400 threads. Somewhere between 400 and 500 threads it starts throwing. If you remove the Thread.Sleep(5000) from your service implementation, it can handle even the 1000 threads that your test client uses. The problem seems to be that the artificial, excessive 5-second delay causes a lot of timeouts, which in turn ruin the communication channel. - Modeling
@jstreet Yes, that is exactly what I'm seeing locally. I managed to create a fix with a Semaphore that throttles the number of threads on my client, and it seems to work fine, but I want to find the source of the problem, which is probably at the socket level of the TCP protocol. I want to find the settings that will allow me to increase that number. - Evy
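
The Semaphore workaround mentioned in the last comment was not posted, so the following is only a minimal sketch of what such client-side throttling might look like. The ThrottledCaller class, its Run method, and the maxConcurrent parameter are hypothetical names introduced for illustration. It still starts one thread per request, but lets only a bounded number of calls be in flight at once:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ThrottledCaller
{
    // Hypothetical helper: runs `call` once per request id on its own thread,
    // but allows at most `maxConcurrent` calls to execute at the same time.
    public static void Run(int requestCount, int maxConcurrent, Action<int> call)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent, maxConcurrent))
        {
            var threads = new List<Thread>(requestCount);
            for (int i = 0; i < requestCount; i++)
            {
                int requestId = i; // copy, so the lambda does not capture the loop variable
                var thread = new Thread(() =>
                {
                    gate.Wait();            // block until a slot is free
                    try
                    {
                        call(requestId);    // e.g. create a proxy and call GetData here
                    }
                    finally
                    {
                        gate.Release();     // always free the slot, even on failure
                    }
                });
                threads.Add(thread);
                thread.Start();
            }

            // Wait for every call to finish before the semaphore is disposed.
            foreach (var thread in threads)
            {
                thread.Join();
            }
        }
    }
}
```

In the test client this could be called as ThrottledCaller.Run(1000, 100, id => { ... }), with the proxy creation and the GetData call inside the lambda. The value 100 is an arbitrary assumption; based on the numbers reported in the comments, the workable limit is probably machine-dependent and would need to be found by experiment.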

Every time I've seen the error "No connection could be made because the target machine actively refused it", the problem has not been with the service. It's usually a problem reaching the service.

A couple of suggestions:

  1. Avoid "using" with WCF proxies. You can pick from several reasonable workarounds.

  2. Read my answer to "WCF performance, latency and scalability". Other than starting threads the old-fashioned way, it's basically the same test app. The post describes all the client-side causes (that I could find) of "No connection could be made because the target machine actively refused it" and offers different WCF, TCP, and thread pool settings that can be adjusted.
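
The first suggestion rests on the fact that Dispose on a WCF client calls Close, which can itself throw when the channel is already faulted and hide the original exception. One commonly used workaround is sketched below; the ProxyHelper class and the CloseOrAbort method are hypothetical names, not something taken from the linked posts:

```csharp
using System;
using System.ServiceModel;

static class ProxyHelper
{
    // Hypothetical helper: closes a WCF client gracefully, and falls back to
    // Abort() when the channel is faulted or when Close() itself fails.
    public static void CloseOrAbort(ICommunicationObject client)
    {
        if (client == null)
        {
            return;
        }

        try
        {
            if (client.State == CommunicationState.Faulted)
            {
                client.Abort();   // a faulted channel cannot be closed cleanly
            }
            else
            {
                client.Close();   // graceful shutdown
            }
        }
        catch (CommunicationException)
        {
            client.Abort();
        }
        catch (TimeoutException)
        {
            client.Abort();
        }
    }
}

// Possible usage in place of the using block (TestServiceClient is the
// generated proxy from the question):
//
//     var service = new Test.TestServiceClient();
//     try
//     {
//         var result = service.GetData("42");
//     }
//     finally
//     {
//         ProxyHelper.CloseOrAbort(service);
//     }
```

This only makes the failure easier to diagnose by keeping the original exception visible; it is not claimed to remove the 10061 error itself.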

Craftsman answered 12/1, 2016 at 6:47 Comment(1)
Thanks for the tips, but this isn't really what I'm looking for. The only reason for "using" is to call Dispose in another partial class of my proxy in the real project, and I doubt it affects the issue. As for the second part: I tried changing the throttling settings, the TCP maxConnections and listenBacklog properties, and even the Tcpip parameters in the system registry, with no effect. I can't use the thread pool (it solves the problem by increasing the delay between requests and limiting their number) because I use TPL and it can cause deadlocks at some points. - Evy

You could be hitting the internal limits on concurrent TCP/IP connections in Windows. Have a look at this article and see if it helps:

http://smallvoid.com/article/winnt-tcpip-max-limit.html

Dymphia answered 17/1, 2016 at 20:10 Comment(1)
Thanks for the link, but I've tried those settings before (see my comment on ErnieL's answer). I've changed them all, but none of them resolved the issue. This is actually what I'm looking for: the setting in WCF, TCP, or the system registry that will allow me to use this binding in such a multithreaded scenario. - Evy
