Client Socket Connections Being Denied By Server on Windows Host For Small Number (16 < x < 24) of Simultaneous Client Connection Attempts

We are experiencing a problem where our incoming client socket connections to our socket server are being denied when a relatively small number of nodes (16 to 24, but we will need to handle more in the future) are trying to connect simultaneously.

Some specifics:

  • the server is running on Windows Server 2008 or Windows 7
  • our main server is written in Java using a ServerSocket
  • the clients are also Windows machines, running on grid nodes in our data center

When we do a test run on the grid, the client nodes attempt to connect to the server, send a 40-100K packet, and then drop the connection. With between 16 and 24 nodes, client connections start failing to reach the server. Given this setup, we are trying to handle at most 16-24 simultaneous client connections and failing, which does not seem right to us at all.

The main server loop listens on a regular ServerSocket; when it gets a connection it spawns a new Thread to handle it and returns immediately to listening on the socket. We also have a dummy Python server that simply reads and discards the incoming data, and a C++ server that logs the data before dumping it; both experience the same problem of clients being unable to connect, with minor variation in how many client connections succeed before the failures start. This has led us to believe that no specific server is at fault and that the cause is probably environmental.
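
For reference, the accept-and-dispatch loop looks roughly like the sketch below; the port number and the discard-style handler are placeholders rather than our real code:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class Server {
        public static void main(String[] args) throws IOException {
            ServerSocket listener = new ServerSocket(9000); // placeholder port
            while (true) {
                final Socket client = listener.accept(); // blocks until a client connects
                new Thread(new Runnable() {              // hand off, return to accept()
                    public void run() { handle(client); }
                }).start();
            }
        }

        static void handle(Socket client) {
            try {
                InputStream in = client.getInputStream();
                byte[] buf = new byte[8192];
                while (in.read(buf) != -1) {
                    // read and discard the 40-100K payload
                }
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try { client.close(); } catch (IOException ignored) { }
            }
        }
    }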

Our first thought was to raise the TCP backlog on the socket. This did not alleviate the issue even when pushed to very high levels. The default backlog for a Java ServerSocket is 50, well below the levels we pushed it to.
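
(For clarity, the backlog we raised is the second argument to the ServerSocket constructor; the 500 below is just an illustrative value, and the OS may silently cap whatever is requested:)

    ServerSocket listener = new ServerSocket(9000, 500); // port, requested backlog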

We have run the test between machines on the same subnet and disabled all local firewalls on the machines, in case a firewall was rate-limiting our connections to the server; no success.

We have tried some tuning of the network on the Windows machine running the servers:

  • Decreasing TcpTimedWaitDelay, but to no effect (and in my Python test it shouldn’t matter, because that test only runs for a few milliseconds).
  • Increasing MaxUserPort to a large value, around 65000, but to no effect (which is odd, given that my Python test only ever sends 240 messages, so I shouldn’t even be getting close to that kind of limit).
  • Increasing TcpNumConnections to a large value (I can’t remember the exact number). Again, we should never have more than 24 connections at a time, so this can’t be the limit.
  • Enabling the “dynamic backlog” feature, which allows the connection backlog to grow dynamically. I think we set the maximum to 2000 connections with a minimum of 1000, but to no effect. Again, Python should never make more than 240 connections, so we shouldn’t even be activating the dynamic backlog.
  • In addition to the above, disabling Windows “autotuning” for TCP. Again, to no effect. (A reg query example for checking these values follows this list.)
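
For anyone reproducing this, the Tcpip registry values above can be inspected with reg query (this assumes the standard Tcpip parameters key; adjust the value name as needed, and note the query errors out if the value was never set):

    reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v TcpTimedWaitDelay
    reg query "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v MaxUserPort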

My feeling is that Windows is somehow limiting the number of inbound connections, but we aren't sure what to modify to allow more. The theory that some agent on the network is limiting the connection rate also doesn't seem to hold up. We highly doubt that this number of simultaneous connections is overloading the physical gigabit network.

We're stumped. Has anybody else experienced a problem like this and found a solution?

Deiform answered 4/7, 2013 at 19:28 Comment(2)
I am facing a similar issue with Windows 7 Professional edition. I have tried all the above-mentioned steps, tried the settings mentioned in smallvoid.com/article/winnt-tcpip-max-limit.html and kb.globalscape.com/KnowledgebaseArticle10438.aspx, and tried disabling SynAttackProtect (although that has no effect in Win7, per msdn.microsoft.com/en-us/library/ee377058%28BTS.10%29.aspx). Were you able to solve this issue? - Quad
It is said that the half-open connection limit was removed in Windows 7, but is there any limit on inbound half-open connections? I can successfully initiate 200 requests/sec to the server as long as they are back to back and NOT concurrent. - Quad

I would check how many connections are sitting in the TIME_WAIT state of the TCP connection. I have seen this type of problem when many connections are opened and closed in rapid succession, exhausting sockets because they linger in TIME_WAIT. To check it, run:

netstat -a
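
To narrow the output to the state in question, something like this should work (find /c prints a count of the matching lines):

netstat -an | find /c "TIME_WAIT"
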
Moncrief answered 4/7, 2013 at 19:46 Comment(0)

IIS is known to handle numbers of concurrent incoming connections far greater than the limit you are experiencing, which makes the environment an unlikely source of the problem.

If, as you indicate, increasing the TCP backlog does not improve the situation, the problem really has to be in the accept() behaviour. You don't indicate whether the clients receive varying kinds of errors or something consistent: timeouts would point to accept() not keeping up, while outright rejections would indicate the backlog is not being drained fast enough.
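
A quick way to see which failure mode the clients hit is a throwaway probe along these lines; the host name, port, and thread count are placeholders, but the distinction between a timeout and an active refusal (RST) is the interesting part:

    import java.io.IOException;
    import java.net.ConnectException;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    public class ConnectProbe {
        public static void main(String[] args) {
            for (int i = 0; i < 24; i++) { // simulate the grid's burst of clients
                final int id = i;
                new Thread(new Runnable() {
                    public void run() {
                        Socket s = new Socket();
                        try {
                            s.connect(new InetSocketAddress("server-host", 9000), 5000);
                            System.out.println(id + ": connected");
                        } catch (SocketTimeoutException e) {
                            System.out.println(id + ": timed out (SYN likely dropped)");
                        } catch (ConnectException e) {
                            System.out.println(id + ": refused (RST received)");
                        } catch (IOException e) {
                            System.out.println(id + ": " + e);
                        } finally {
                            try { s.close(); } catch (IOException ignored) { }
                        }
                    }
                }).start();
            }
        }
    }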

Are you able to try to prototype the application as an ASPX host to better understand the problem?

Echt answered 1/12, 2013 at 16:48 Comment(2)
Increased the backlog to 64 and initiated a burst of 20 connections to the server; got connection refused for more than 10 requests every time I tested. I also tested using Hercules (hw-group.com/products/hercules/index_en.html) and found similar behaviour. Every time it is the same error, i.e. an RST indicating the connection is being refused. - Quad
The RST combined with the backlog indicates something curious. Are you able to provide a network trace of the communications? How do you get the tool to generate the multiple connections? - Echt

Most probably you are limited by the OS; do you see event 4226 in your System event log?
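
If wevtutil is available, something like this should list recent occurrences (this is the standard XPath event filter; I have not verified the exact event details on every Windows version):

wevtutil qe System /rd:true /f:text /c:10 /q:"*[System[(EventID=4226)]]"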

Windows limits the number of concurrent connection attempts to (I think) 10 connections/second, depending on the OS version (server editions have a value of up to 50).

In order to eliminate that, you have two possibilities:

  • directly edit tcpip.sys in system32/drivers with a hex editor - kidding :)

  • try editing the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters\MaxMpxCt entry (default = 10 commands), for example with reg add as shown below.
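
A hedged example of that edit (the value 50 is illustrative, and a reboot is probably required for it to take effect):

reg add "HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters" /v MaxMpxCt /t REG_DWORD /d 50 /f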

You may also try this hotfix in case you're using a version which does not allow you to set that parameter.

You can also try various other things, like the maximum number of TCBs the OS uses or the port range for dynamic port allocation, although those values should already be high enough for your needs.

Morissa answered 3/12, 2013 at 10:22 Comment(2)
I found an article which describes a similar problem with a file server: blogs.citrix.com/2010/10/21/… - I think the reason is the same. - Morissa
I don't think it is related to socket connections. - Quad
