What is the cost of many TIME_WAIT on the server side?

Let's assume there is a client that makes a lot of short-living connections to a server.

If the client closes the connection, there will be many ports in the TIME_WAIT state on the client side. Once the client runs out of local ports, it becomes impossible to make new connection attempts quickly.

If the server closes the connection, I will see many TIME_WAITs on the server side. However, does this do any harm? The client (or other clients) can keep making connection attempts, since it never runs out of local ports, and the number of sockets in the TIME_WAIT state will keep increasing on the server side. What happens eventually? Does something bad happen (slowdown, crash, dropped connections, etc.)?

Please note that my question is not "What is the purpose of TIME_WAIT?" but "What happens if there are so many TIME_WAIT states on the server?" I already know what happens when a connection is closed in TCP/IP and why the TIME_WAIT state is required. I'm not trying to troubleshoot anything; I just want to know the potential issues.

To put it simply, let's say netstat -nat | grep :8080 | grep TIME_WAIT | wc -l prints 100000. What would happen? Does the OS's network stack slow down? Do I get a "Too many open files" error? Or is it nothing to worry about?

Vulgarity answered 26/11, 2009 at 13:5 Comment(2)
Some systems see problems at around 32K TIME_WAITs: serverfault.com/a/212127/87017 – Crosscheck
For Linux there's a paper based on data from the WebStone benchmark; see also "The TIME-WAIT state in TCP and its effect on busy servers". – Crosscheck

Each socket in TIME_WAIT consumes some memory in the kernel, usually somewhat less than an ESTABLISHED socket yet still significant. A sufficiently large number could exhaust kernel memory, or at least degrade performance because that memory could be used for other purposes. TIME_WAIT sockets do not hold open file descriptors (assuming they have been closed properly), so you should not need to worry about a "too many open files" error.
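
On Linux you can watch both of these numbers directly; here is a minimal read-only sketch (Linux-only, Python) that prints the "TCP:" line of /proc/net/sockstat, which includes "tw" (sockets in TIME_WAIT) and "mem" (TCP buffer memory, in pages):

    # Linux-only: /proc/net/sockstat summarizes TCP socket usage;
    # "tw" is the number of sockets in TIME_WAIT, "mem" is buffer
    # memory in pages
    with open("/proc/net/sockstat") as f:
        for line in f:
            if line.startswith("TCP:"):
                print(line.strip())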

The socket also ties up that particular src/dst IP address and port combination, so it cannot be reused for the duration of the TIME_WAIT interval. (This is the intended purpose of the TIME_WAIT state.) Tying up the port is not usually an issue unless you need to reconnect with the same port pair. Most often one side will use an ephemeral port, with only one side anchored to a well-known port. However, a very large number of TIME_WAIT sockets can exhaust the ephemeral port space if you are repeatedly and frequently connecting between the same two IP addresses. Note that this only affects that particular IP address pair; it will not affect the establishment of connections with other hosts.
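
To make the ephemeral-port exhaustion concrete, here is a minimal Python sketch (the server address is a placeholder, and port 8080 is taken from the question): a client that opens and actively closes many short-lived connections to the same server parks its ephemeral ports in TIME_WAIT, and connect() eventually fails, typically with EADDRNOTAVAIL.

    import socket

    SERVER = ("192.0.2.10", 8080)  # placeholder address, port from the question

    for i in range(100000):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect(SERVER)
        except OSError as e:
            # typically EADDRNOTAVAIL once every ephemeral port toward
            # SERVER is sitting in TIME_WAIT
            print(f"connection #{i} failed: {e}")
            break
        s.close()  # active close: this side enters TIME_WAIT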

Interlanguage answered 6/12, 2009 at 2:56 Comment(2)
Are you sure this is relevant to Windows Server and other OSes too? – Crosscheck
It is worth mentioning that by setting the SO_REUSEADDR option on a new socket, you can bind to a port occupied by socket(s) in the TIME_WAIT state. – Piperidine
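
A minimal sketch of that option in Python (port 8080 from the question is assumed): with SO_REUSEADDR set before bind(), a restarted server can rebind its listening port even while old connections on it are still in TIME_WAIT.

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # without this, bind() can fail with EADDRINUSE while old sockets
    # on the port are still in TIME_WAIT
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("0.0.0.0", 8080))
    s.listen()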

Each connection is identified by a tuple (server IP, server port, client IP, client port). Crucially, the TIME_WAIT connections (whether they are on the server side or on the client side) each occupy one of these tuples.

With the TIME_WAITs on the client side, it's easy to see why you can't make any more connections - you have no more local ports. However, the same issue applies on the server side - once it has 64k connections in the TIME_WAIT state for a single client IP, it can't accept any more connections from that client, because it has no way to tell the difference between the old connection and a new one - both would be identified by the same tuple. In this case the server should just send back RSTs to new connection attempts from that client.
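
A back-of-the-envelope consequence, assuming a typical Linux ephemeral port range of 32768-60999 and the usual 60-second TIME_WAIT (both are common defaults, not universal):

    # rough sustained connection rate before a single
    # (client IP, server IP, server port) combination runs out of 4-tuples
    ephemeral_ports = 60999 - 32768 + 1  # typical net.ipv4.ip_local_port_range
    time_wait_secs = 60                  # Linux's fixed TCP_TIMEWAIT_LEN
    print(ephemeral_ports / time_wait_secs, "connections/second")  # ~470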

Touchandgo answered 26/11, 2009 at 22:58 Comment(1)
Exactly what is happening in my case. – Mustee

Findings so far:

Even if the server closes the socket with a system call, its file descriptor will not be released if the socket enters the TIME_WAIT state. The file descriptor will be released later, when the TIME_WAIT state expires (i.e. after 2*MSL seconds). Therefore, too many TIME_WAITs could possibly lead to a 'too many open files' error in the server process.

I believe the OS TCP/IP stack is implemented with appropriate data structures (e.g. a hash table), so the total number of TIME_WAITs should not affect the performance of the OS TCP/IP stack. Only the process (server) that owns the sockets in the TIME_WAIT state will suffer.
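
One way to test the file-descriptor claim yourself, as a Linux-only Python sketch (the server address is a placeholder): count this process's open descriptors before and after close() leaves the socket in TIME_WAIT.

    import os
    import socket

    def open_fds():
        return len(os.listdir(f"/proc/{os.getpid()}/fd"))

    s = socket.create_connection(("192.0.2.10", 8080))  # placeholder server
    before = open_fds()
    s.close()  # active close: the socket enters TIME_WAIT
    after = open_fds()
    # if TIME_WAIT kept the descriptor, 'after' would equal 'before'
    print(before, after)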

Vulgarity answered 26/11, 2009 at 14:25 Comment(3)
Not sure that is true. I have produced hundreds of TIME_WAITs but didn't see the number of open file descriptors increase in sysctl fs.file-nr. – Slavin
@c4il, @trustin: Why is everyone discussing this without stating which OS? Specific versions would be helpful too. – Crosscheck
@Vulgarity: What was the reason for the many open file descriptors? Did you find it? – Grammatical

If you have a lot of connections from many different client IPs to the server, you might run into the limits of the connection tracking table.

Check:

sysctl net.ipv4.netfilter.ip_conntrack_count
sysctl net.ipv4.netfilter.ip_conntrack_max

Across all src IP/port and dst IP/port tuples, the tracking table can hold at most net.ipv4.netfilter.ip_conntrack_max entries. If this limit is hit, you will see the message "nf_conntrack: table full, dropping packet." in your logs, and the server will not accept new incoming connections until there is space in the tracking table again.

This limitation might hit you long before the ephemeral ports run out.
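
The same check can be done programmatically by reading the procfs files behind those sysctls; a small Python sketch (on newer kernels the counters live under /proc/sys/net/netfilter/nf_conntrack_count and nf_conntrack_max instead):

    def read_int(path):
        with open(path) as f:
            return int(f.read())

    count = read_int("/proc/sys/net/ipv4/netfilter/ip_conntrack_count")
    limit = read_int("/proc/sys/net/ipv4/netfilter/ip_conntrack_max")
    print(f"{count}/{limit} conntrack entries in use")
    if count >= 0.9 * limit:
        print("warning: nearing 'nf_conntrack: table full, dropping packet'")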

Decussate answered 29/8, 2016 at 11:48 Comment(0)

In my scenario, I ran a script which schedules files repeatedly. My product does some computation and sends a response to the client, i.e. the client makes a repetitive HTTP call to get the response for each file. When around 150 files are scheduled, the socket ports on my server go into the TIME_WAIT state and an exception is thrown in the client that opens an HTTP connection:

 Error : [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

The result was that my application hung. I do not know whether threads went into a wait state or what happened, but I had to kill all processes or restart my application to make it work again.

I tried reducing the wait time to 30 seconds (it is 240 seconds by default), but it did not work.

So, basically, the overall impact was critical, as it made my application non-responsive.

Mustee answered 2/5, 2018 at 5:33 Comment(0)

It looks like the server can just run out of ports to assign to incoming connections (for the duration of existing TIME_WAITs) - a case for a DoS attack.

Cephalization answered 26/11, 2009 at 13:9 Comment(7)
Why does the server run out of ports? The server does not allocate a local port for an accepted connection. That's why a server can handle 100k concurrent connections, putting the busy-CPU problem aside. – Vulgarity
The server does allocate a local port for accepted connections; run 'netstat -a' from a command prompt and you will see these. I believe the reason for TIME_WAIT is that TCP packets can arrive in the wrong order, so the port must not be closed immediately, to allow late packets to arrive. This means it is indeed possible to run out of ports. There are ways to shorten the TIME_WAIT period, but the risk is that with shorter timeouts, late-arriving packets from a previous connection can be mistaken for packets from the new connection on the recycled port. – Tatter
If you run 'netstat -nat', you will see that the connections accepted by the same server socket share the same local port. Hence I guess no extra local ports are assigned for accepted connections? – Vulgarity
Also, if the server allocated a local port per accepted connection, it would not be able to accept more than 64k concurrent connections. However, that is not true. – Vulgarity
@Trustin Lee: turns out you are right - a TCP connection is uniquely identified by a 4-tuple (server address, server port, client address, client port), so port exhaustion applies only to the (pretty contrived) case where the client IP and port are the same for all connections. – Cephalization
Not that contrived when one heavily used app server is connecting to the same database to handle requests. – Haplite
The server does not allocate a new port for incoming connections to a listening socket: it uses the same port as the listening socket, and this is clearly visible in any netstat display. The TIME_WAIT state cannot therefore cause the server to run out of sockets. – Bufordbug
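
A quick Python demonstration of that last point (loopback only): the accepted socket reports the same local port as the listener, so what distinguishes connections is the full 4-tuple, not a fresh server port.

    import socket
    import threading

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # let the OS pick the listening port
    srv.listen()
    port = srv.getsockname()[1]

    t = threading.Thread(
        target=lambda: socket.create_connection(("127.0.0.1", port)).close())
    t.start()
    conn, addr = srv.accept()
    t.join()
    print("listener:", srv.getsockname())   # ('127.0.0.1', port)
    print("accepted:", conn.getsockname())  # same local port as the listener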
