A load balancer will have some limit on how many TCP ports it can use simultaneously, depending on the platform it's running on (e.g. I read somewhere that Linux can have at most 65535 TCP ports open simultaneously). This means the balancer becomes a bottleneck and won't be able to serve more than that many simultaneous requests, even if the back-end server farm is capable of serving many more requests together. Is there some way to overcome this problem?
TCP and UDP port numbers are 16-bit, so a given IP has only 65535 of them (port 0 is not valid, I believe). But a TCP connection is identified by the 4-tuple (source IP, source port, destination IP, destination port). (Wikipedia has links if you want to learn more.)
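To make that concrete, here's a quick Python sketch (nothing load-balancer-specific, just the plain socket API on a loopback listener) showing several connections to the *same* destination IP and port; each one is still a distinct connection because its source port differs:

```python
import socket

# Throwaway local "server": one listening socket on one port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick one
listener.listen(16)
dest = listener.getsockname()        # (destination IP, destination port)

# Several connections to the *same* destination IP and port.
clients = [socket.create_connection(dest) for _ in range(3)]

for c in clients:
    src = c.getsockname()            # (source IP, source port)
    print("4-tuple:", src + dest)    # only the source port differs

for c in clients:
    c.close()
listener.close()
```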
For the client->balancer requests: as long as each inbound connection has a distinct (source IP, source port), there's no problem, and the client side normally ensures this. The only problems I recall hearing of on this side were with an extremely popular website serving many images per page, accessed through enormous ISPs that NAT their customers behind very few IPv4 addresses. That's probably not your situation.
The balancer->backend requests are more interesting, as you're probably creating a situation similar to the NAT problem I mentioned above. I think Linux normally tries to assign a distinct ephemeral port to each socket, and by default there are only 28,233 of those. And IIRC it doesn't use ones in the `TIME_WAIT` state either, so you can exhaust the range without actually having that many connections open simultaneously. IIRC if you hit this limit you'll get an `EADDRINUSE` failure on `connect` (or on `bind` if you explicitly bind the socket prior to `connect`).
I don't remember exactly how I've gotten around this before, much less the absolute best way, but here are a few things that may help:

- keeping persistent balancer->backend connections rather than creating a new one for each (probably short-lived) client->balancer connection.
- setting `SO_REUSEADDR` on the sockets prior to `bind`/`connect`.
- turning on the sysctl `net.ipv4.tcp_tw_reuse` and/or `net.ipv4.tcp_tw_recycle`.
- explicitly picking the source IP and/or port to use via `bind` rather than letting the kernel autoassign on `connect` (see the sketch after this list). You can't have two simultaneous connections with the same 4-tuple, but anything else is fine. (Exception: I'm punting on thinking through whether `TIME_WAIT` reuse for the same 4-tuple is okay; I'd have to refresh my memory about `TIME_WAIT` by reading through some TCP RFCs.)
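For the `SO_REUSEADDR` and explicit-`bind` ideas, a minimal Python sketch of what that looks like at the socket level; the function name and all addresses below are made-up placeholders, and a real balancer would do the equivalent in whatever language/framework it's built on:

```python
import socket

def backend_connection(source_ip, backend_addr, source_port=0):
    """Connect to a backend from an explicitly chosen source IP (and,
    optionally, source port) instead of letting the kernel autoassign."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow binding a local port that a previous connection left in
    # TIME_WAIT, as long as the resulting 4-tuple is still unique.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((source_ip, source_port))   # source_port=0: kernel picks the port
    s.connect(backend_addr)            # EADDRINUSE shows up here or on bind()
    return s

# Hypothetical usage: spread backend connections across two local addresses
# so each source IP gets its own ephemeral-port pool.
# conn_a = backend_connection("192.0.2.10", ("10.0.0.5", 8080))
# conn_b = backend_connection("192.0.2.11", ("10.0.0.5", 8080))
```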
You'll probably have to do a bit of experimentation. The good news is that once you understand the problem, it's pretty easy to reproduce it and test to see if you've fixed it.