How to gracefully handle accept() giving EMFILE and close the connection?
P

4

10

When a process runs out of file descriptors, accept() fails and sets errno to EMFILE. However, the underlying connection that would have been accepted is not closed, so there appears to be no way to inform the client that the application could not handle the connection.

The question is what is the proper action to take regarding accepting TCP connections when running out of file descriptors.

The following code demonstrates the issue I want to learn how to deal with (note that this is just example code to illustrate the question, not production code):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>


static void err(const char *str)
{
    perror(str);
    exit(1);
}


int main(int argc,char *argv[])
{
    int serversocket;
    struct sockaddr_in serv_addr;
    serversocket = socket(AF_INET,SOCK_STREAM,0);
    if(serversocket < 0)
        err("socket()");

    memset(&serv_addr,0,sizeof serv_addr);

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr= INADDR_ANY;
    serv_addr.sin_port = htons(6543);
    if(bind(serversocket,(struct sockaddr*)&serv_addr,sizeof serv_addr) < 0)
        err("bind()");

    if(listen(serversocket,10) < 0)
        err("listen()");

    for(;;) {
        struct sockaddr_storage client_addr;
        socklen_t client_len = sizeof client_addr;
        int clientfd;

        clientfd = accept(serversocket,(struct sockaddr*)&client_addr,&client_len);
        if(clientfd < 0)  {
            continue;
        }

    }

    return 0;
}

Compile and run this code with a limited number of file descriptors available:

gcc srv.c
ulimit -n 10
strace -t ./a.out 2>&1 |less

And in another console, I run

 telnet localhost 6543 &

As many times as needed until accept() fails:

The output from strace shows this to happen:

13:21:12 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
13:21:12 bind(3, {sa_family=AF_INET, sin_port=htons(6543), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
13:21:12 listen(3, 10)                  = 0
13:21:12 accept(3, {sa_family=AF_INET, sin_port=htons(43630), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 4
13:21:19 accept(3, {sa_family=AF_INET, sin_port=htons(43634), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 5
13:21:22 accept(3, {sa_family=AF_INET, sin_port=htons(43638), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 6
13:21:23 accept(3, {sa_family=AF_INET, sin_port=htons(43642), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 7
13:21:24 accept(3, {sa_family=AF_INET, sin_port=htons(43646), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 8
13:21:26 accept(3, {sa_family=AF_INET, sin_port=htons(43650), sin_addr=inet_addr("127.0.0.1")}, [128->16]) = 9
13:21:27 accept(3, 0xbfe718f4, [128])   = -1 EMFILE (Too many open files)
13:21:27 accept(3, 0xbfe718f4, [128])   = -1 EMFILE (Too many open files)
13:21:27 accept(3, 0xbfe718f4, [128])   = -1 EMFILE (Too many open files)
13:21:27 accept(3, 0xbfe718f4, [128])   = -1 EMFILE (Too many open files)
 ... and thousands upon thousands of more accept() failures.

Basically at this point:

  • the code calls accept() as fast as possible, failing to accept the same TCP connection over and over again and churning CPU.
  • the client stays connected (the TCP handshake completes before the application accepts the connection) and gets no indication that there is a problem.

So,

  1. Is there a way to force the TCP connection that caused accept() to fail to be closed (so that, e.g., the client is informed quickly and can perhaps try another server)?

  2. What is the best practice to prevent the server code from going into an infinite loop when this situation arises (or to prevent the situation altogether)?

Paly answered 8/11, 2017 at 12:29 Comment(3)
Are you sure this is really the problem? From elixir.free-electrons.com/linux/latest/source/net/… it seems that the accept is done only once the allocation (and the file limit checks) has succeeded, so your problem seems strange to meImeldaimelida
@Imeldaimelida he went so far as to provide a working example of the problem ... in the code you link, if the allocation fails then the accept syscall returns an error, does it not?Skitter
OznOg You confuse accept()ing a connection with completing the three-way handshake. As I elaborated in my answer below, the client cannot tell when the server accepted the connection (if at all), since the three-way handshake is completed immediately by the OS (if the backlog still has space left)Zelaya
R
4

You can set aside an extra fd at the beginning of your program and keep track of the EMFILE condition:

int reserve_fd;
_Bool out_of_fd = 0;

if(0>(reserve_fd = dup(1)))
    err("dup()");

Then, if you hit the EMFILE condition, you can close the reserve_fd and use its slot to accept the new connection (which you'll then immediately close):

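clientfd = accept(serversocket,(struct sockaddr*)&client_addr,&client_len);
if (out_of_fd){
    /* this accept() used the slot freed by closing reserve_fd */
    if(clientfd >= 0)
        close(clientfd); /* drop the connection so the client notices */
    if(0>(reserve_fd = dup(1)))
        err("dup()");
    out_of_fd=0;

    continue; /*doing other stuff that'll hopefully free the fd*/
}

if(clientfd < 0)  {
    if(errno == EMFILE) {
        close(reserve_fd); /* free one slot for the next accept() */
        out_of_fd=1;
    }
    continue;
}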

Complete example:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>


static void err(const char *str)
{
    perror(str);
    exit(1);
}


int main(int argc,char *argv[])
{
    int serversocket;
    struct sockaddr_in serv_addr;
    serversocket = socket(AF_INET,SOCK_STREAM,0);
    if(serversocket < 0)
        err("socket()");
    int yes = 1; /* must be non-zero to enable SO_REUSEADDR */
    if ( -1 == setsockopt(serversocket, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) )
        perror("setsockopt");


    memset(&serv_addr,0,sizeof serv_addr);

    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr= INADDR_ANY;
    serv_addr.sin_port = htons(6543);
    if(bind(serversocket,(struct sockaddr*)&serv_addr,sizeof serv_addr) < 0)
        err("bind()");

    if(listen(serversocket,10) < 0)
        err("listen()");

    int reserve_fd;
    int out_of_fd = 0;

    if(0>(reserve_fd = dup(1)))
        err("dup()");


    for(;;) {
        struct sockaddr_storage client_addr;
        socklen_t client_len = sizeof client_addr;
        int clientfd;


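        clientfd = accept(serversocket,(struct sockaddr*)&client_addr,&client_len);
        if (out_of_fd){
            /* this accept() used the slot freed by closing reserve_fd */
            if(clientfd >= 0)
                close(clientfd); /* drop the connection so the client notices */
            if(0>(reserve_fd = dup(1)))
                err("dup()");
            out_of_fd=0;

            continue; /*doing other stuff that'll hopefully free the fd*/
        }

        if(clientfd < 0)  {
            if(errno == EMFILE) {
                close(reserve_fd); /* free one slot for the next accept() */
                out_of_fd=1;
            }
            continue;
        }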

    }

    return 0;
}

If you're multithreaded, then I imagine you'd need a lock around fd-producing functions, held while you close the extra fd and accept the final connection, in order to prevent another thread from filling the spare slot.

All this only makes sense if 1) the listening socket isn't shared with other processes (which might not have hit their EMFILE limit yet) and 2) the server deals with persistent connections (because if it doesn't, then you're bound to close some existing connection very soon, freeing up an fd slot for your next attempt at accept).

Reconvert answered 8/11, 2017 at 14:55 Comment(0)
Z
2

Problem

You cannot accept client connections once the maximum number of file descriptors is reached. This can be a per-process limit (errno EMFILE) or a system-wide limit (errno ENFILE). The client does not notice this situation immediately; to the client, the connection appears to have been accepted by the server. If too many such connections pile up on the socket (when the backlog is full), the server will stop sending SYN-ACK packets and the connection request will time out at the client (which can be quite an annoying delay).

Number of file descriptors

It is of course possible to raise both limits when they are hit. For the process-wide limit, use setrlimit(RLIMIT_NOFILE, ...); for the system-wide limit on Linux, adjust fs.file-max with the sysctl command (or by writing to /proc/sys/fs/file-max). Both may require root privileges, the first one only to raise the hard limit.
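A minimal sketch of the process-wide case, assuming a POSIX system: it raises the soft limit to the current hard limit, which needs no extra privileges. The function name raise_fd_limit is made up for this illustration, and some systems may reject the request when the hard limit is reported as unlimited.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE to the current hard limit.
   Returns 0 on success, -1 on failure (errno is set by
   getrlimit() or setrlimit()). */
int raise_fd_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* staying at or below the hard limit needs no root */
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Calling this once at startup, before listen(), only postpones EMFILE; the server still needs a strategy for when the raised limit is reached.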

However, there usually is a good reason for the file descriptor limit to prevent overusage of system resources, so this will not be a solution for all situations.

Recovering from EMFILE

One option is to call sleep(n) after EMFILE is received. One second should be enough to avoid the additional system load caused by calling accept() too often. This may be useful for handling short bursts of connections.

However, if the situation doesn't normalize soon, other measures should be taken (for example, if sleep() had to be called 5 times in a row or similar).
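The sleep-and-escalate idea could be sketched like this. emfile_backoff and MAX_SLEEPS are hypothetical names; the one-second delay and the limit of five are simply the figures mentioned above, not tuned values.

```c
#include <errno.h>

#define MAX_SLEEPS 5   /* assumption: escalate after 5 sleeps in a row */

/* Decide what to do after accept() failed with accept_errno.
   *sleeps counts consecutive back-offs and is updated in place.
   Returns the number of seconds to sleep before retrying,
   0 if the error is not a descriptor shortage (the caller handles
   it itself), or -1 if the shortage has lasted too long. */
int emfile_backoff(int accept_errno, int *sleeps)
{
    if (accept_errno != EMFILE && accept_errno != ENFILE) {
        *sleeps = 0;            /* unrelated error: reset the streak */
        return 0;
    }
    if (*sleeps >= MAX_SLEEPS)
        return -1;              /* time for stronger measures */
    ++*sleeps;
    return 1;                   /* sleep one second, then retry */
}
```

In the accept loop, the counter would be reset to 0 after every successful accept(), a positive return value passed to sleep(), and -1 treated as the signal to escalate.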

In this case it is advisable to close the server socket. All pending client connections will be terminated immediately (by receiving a RST packet) and the clients can use another server if applicable. Furthermore, new connection attempts are rejected immediately (connection refused) instead of timing out, as might happen while the socket is held open.

After the contention ends, the server socket can be opened again. For the EMFILE case it is only necessary to track the number of open client connections and re-open the server socket when it falls below some threshold. For the system-wide case there is no general answer; maybe just retry after some time, or use the /proc filesystem or system tools like lsof to find out when the contention ceases.
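A rough sketch of the close-and-reopen logic for the EMFILE case, assuming an IPv4 listener like the one in the question. open_listener, manage_listener and the REOPEN_BELOW threshold are hypothetical; the caller is assumed to track open_clients itself.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define REOPEN_BELOW 8   /* assumption: reopen once fewer clients are open */

/* Create a listening TCP socket on the given port; returns fd or -1. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* allow a quick re-bind after the listener was closed */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 10) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Close the listener while overloaded; reopen it once the number of
   open client connections has dropped below the threshold.
   Returns the current listening fd, or -1 while it is closed. */
int manage_listener(int listener, unsigned short port,
                    int open_clients, int overloaded)
{
    if (listener >= 0 && overloaded) {
        close(listener);        /* pending clients get a RST */
        return -1;
    }
    if (listener < 0 && open_clients < REOPEN_BELOW)
        return open_listener(port);
    return listener;
}
```

Reopening needs a free descriptor itself, which is why the threshold should sit comfortably below the limit.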

Zelaya answered 8/11, 2017 at 14:3 Comment(2)
Note that the sleep solution requires that your program is multi-threaded, i.e. you are relying on another thread eventually freeing a file descriptor.Skitter
@Skitter Yes, a call to sleep() is not necessarily the way to go, this was just meant to illustrate the concept... If the server is single threaded, you can keep the listening fd out of the select()/poll()-readfd-set for some time for exampleZelaya
S
2

One solution I've read about is to keep a "spare" file descriptor handy that you can use to accept and immediately close new connections when you're over fd capacity. For example:

int sparefd = open("/dev/null", O_RDONLY);

Then, when accept returns with EMFILE, you can:

close(sparefd); // create an available file descriptor
int newfd = accept(...);  // accept a new connection
close(newfd);   // immediately close the connection
sparefd = open("/dev/null", O_RDONLY);  // re-create spare

It's not exactly elegant, but it's probably a little better than closing the listening socket in some circumstances. Be wary that if your program is multi-threaded then another thread might "claim" the spare fd as soon as you release it; there's no easy way to solve that (the "hard" way is to put a mutex around every operation that might consume a file descriptor).
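The steps above could be wrapped in one helper, sketched here for a single-threaded server. shed_connection is an illustrative name; the sketch assumes it is called right after accept() failed with EMFILE, so a connection is still pending (on a blocking listener it would otherwise wait for one).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/socket.h>

/* Use the spare descriptor slot to accept one pending connection and
   close it at once, so the client sees the connection end.
   *sparefd is closed and then re-created (it is -1 afterwards if the
   re-open failed). Returns 0 if a connection was shed, -1 otherwise. */
int shed_connection(int listenfd, int *sparefd)
{
    int newfd;

    close(*sparefd);                         /* free one descriptor slot */
    newfd = accept(listenfd, NULL, NULL);    /* take the pending connection */
    if (newfd >= 0)
        close(newfd);                        /* and drop it immediately */
    *sparefd = open("/dev/null", O_RDONLY);  /* re-create the spare */
    return newfd >= 0 ? 0 : -1;
}
```

In the accept loop this would be called only when errno is EMFILE; other errors leave the spare untouched.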

Skitter answered 8/11, 2017 at 14:53 Comment(6)
Where exactly do you see the advantage to closing the listening socket?Zelaya
@Zelaya well, especially in a multi-threaded program, if you close the listening socket then you run the risk of not being able to open it again. It's tenuous though, I'll grant.Skitter
(perhaps the advantage is that you can tell from the client end what is wrong. I.e. if the connection is refused then the server program may have crashed; if the connection is accepted but immediately closed you know that it's a response to high load).Skitter
Or that the server keeps constantly segfaulting... ;)Zelaya
@Zelaya True. :) Another case: you have a multi-process server with each process listening to the same socket. Closing it achieves nothing in that case and it would be particularly difficult to re-open.Skitter
(I don't actually believe that multi-process servers are the right solution for any problem in the modern age, but some people do. It's true that they're often a little easier to write than multi-threaded servers).Skitter
L
0

One solution is to split your server into two processes: one that actually serves the connections, and another that just accepts them and transfers them to the first over a UNIX domain socket. The serving process should signal the accepting process when it is ready to take more connections. If the accepting process knows that the serving process is not currently taking connections, it should dispose of each new connection quickly, for example by sending back an error and closing it.
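Handing an accepted connection from one process to the other is usually done with an SCM_RIGHTS control message on the UNIX domain socket. A minimal sketch, assuming Linux or another system that provides the CMSG_* macros; send_fd and recv_fd are illustrative names, not a library API.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send descriptor fd over the UNIX domain socket chan.
   Returns 0 on success, -1 on failure. */
int send_fd(int chan, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    char byte = 0;      /* at least one byte of real data must be sent */
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent with send_fd().
   Returns the new descriptor, or -1 on failure. */
int recv_fd(int chan)
{
    struct msghdr msg;
    struct iovec iov;
    char byte;
    union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof msg);
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(chan, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The accepting process would call send_fd() and then close its own copy of the descriptor, so it never holds more than a few at a time; the receiving side still consumes one descriptor per connection.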

Lonergan answered 2/8 at 15:55 Comment(0)
