A few points I'd like to touch on:
1) According to the TCP Keepalive HOWTO (linked at the end of this point), here is what's necessary for using keepalive in Linux:
Linux has built-in support for keepalive. You need to enable TCP/IP
networking in order to use it. You also need procfs support and sysctl
support to be able to configure the kernel parameters at runtime.
The procedures involving keepalive use three user-driven variables:
tcp_keepalive_time
> the interval between the last data packet sent (simple ACKs are not
considered data) and the first keepalive probe; after the connection
is marked to need keepalive, this counter is not used any further
tcp_keepalive_intvl
> the interval between subsequential keepalive probes, regardless of
what the connection has exchanged in the meantime
tcp_keepalive_probes
> the number of unacknowledged probes to send before considering the
connection dead and notifying the application layer
Remember that keepalive support, even if configured in the kernel, is
not the default behavior in Linux. Programs must request keepalive
control for their sockets using the setsockopt interface. There are
relatively few programs implementing keepalive, but you can easily add
keepalive support for most of them following the instructions
explained later in this document.
Check the current values of these variables on your system and make sure they are correct, or at least make sense. The bold highlight is mine, and it seems you are already doing that.
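Note that the two interval variables (tcp_keepalive_time and tcp_keepalive_intvl) are expressed in seconds, not milliseconds, and tcp_keepalive_probes is a plain count. The usual Linux defaults are 7200, 75 and 9, but double-check what your system actually uses.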
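If you would rather check those values from code than from a shell, something like the following should work. It is only a sketch: it assumes procfs is mounted at /proc, and the helper names read_keepalive_sysctl and keepalive_detection_seconds are mine, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer setting from
 * /proc/sys/net/ipv4/<name>. Returns -1 if the file cannot be
 * opened or does not contain a number. */
static long read_keepalive_sysctl(const char *name)
{
    char path[128];
    long value = -1;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* Rough upper bound, in seconds, between the last data packet and the
 * moment the kernel gives up: the idle time before the first probe,
 * plus one interval per unacknowledged probe. */
static long keepalive_detection_seconds(long time_s, long intvl_s, long probes)
{
    return time_s + intvl_s * probes;
}
```

Call read_keepalive_sysctl("tcp_keepalive_time"), and likewise for tcp_keepalive_intvl and tcp_keepalive_probes, then feed the three results to keepalive_detection_seconds(). With the usual defaults that comes to 7200 + 75 * 9 = 7875 seconds, a bit over two hours, which would easily look like epoll_wait() never returning.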
tcp_keepalive_time
I would expect a value meaning something like 'send the first probe as soon as possible after the last data packet was sent'.
tcp_keepalive_intvl
I guess the value for this variable should be less than the time TCP takes by default to shut down the connection.
tcp_keepalive_probes
This could be the "magic value" that makes or breaks your application: if the number of unacknowledged probes is too high, the kernel takes a long time to consider the connection dead and notify the application, and that could be why epoll_wait() never exits.
The document discusses the Linux implementation of TCP keepalive in Linux kernel releases (2.4.x, 2.6.x) as well as how to write TCP keepalive-enabled applications in the C language.
http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/
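Rather than changing the system-wide values, you can also override them for a single socket. Here is a minimal sketch of that, assuming Linux, where TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the per-socket counterparts of the three variables above. The function name enable_keepalive is just something I made up for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn keepalive on for one TCP socket and override
 * the system-wide timings for that socket only. idle_s and intvl_s are in
 * seconds, probes is a count. Returns 0 on success, or -1 as soon as one
 * setsockopt() call fails (errno is left set by that call). */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int probes)
{
    int on = 1;

    /* Keepalive is off by default, so it has to be requested explicitly. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    /* Per-socket equivalent of tcp_keepalive_time. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) < 0)
        return -1;
    /* Per-socket equivalent of tcp_keepalive_intvl. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof(intvl_s)) < 0)
        return -1;
    /* Per-socket equivalent of tcp_keepalive_probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) < 0)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 10, 5, 3) would ask for the first probe after 10 idle seconds and give up after 3 unanswered probes sent 5 seconds apart. Those numbers are only an illustration, pick ones that suit your application.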
2) Make sure you are not passing -1 as the timeout argument to epoll_wait(), because that causes epoll_wait() to block indefinitely.
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
The timeout argument specifies the minimum number of milliseconds that
epoll_wait() will block. (This interval will be rounded up to the
system clock granularity, and kernel scheduling delays mean that the
blocking interval may overrun by a small amount.) Specifying a timeout
of -1 causes epoll_wait() to block indefinitely, while specifying a
timeout equal to zero causes epoll_wait() to return immediately, even
if no events are available.
From the manual page http://linux.die.net/man/2/epoll_wait
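To make the timeout point concrete, here is a rough sketch of waiting with a finite timeout. The names wait_for_events, connection_failed and MAX_EVENTS are my own, and I am assuming that a connection dropped by keepalive shows up as EPOLLERR and/or EPOLLHUP on the descriptor, so verify that against your own setup.

```c
#include <errno.h>
#include <sys/epoll.h>

#define MAX_EVENTS 16   /* arbitrary size for this sketch */

/* Hypothetical helper: wait for events with a finite timeout.
 * events must have room for MAX_EVENTS entries.
 * Returns the number of ready descriptors, 0 if the timeout expired,
 * or -1 on error. A wait interrupted by a signal is simply retried,
 * which restarts the full timeout. */
static int wait_for_events(int epfd, struct epoll_event *events, int timeout_ms)
{
    int n;

    do {
        n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    } while (n < 0 && errno == EINTR);
    return n;
}

/* Nonzero if the event reports an error or a hang-up on its descriptor. */
static int connection_failed(const struct epoll_event *ev)
{
    return (ev->events & (EPOLLERR | EPOLLHUP)) != 0;
}
```

With a timeout of, say, 5000 ms, a return value of 0 tells you nothing happened in the last five seconds, so your loop gets a chance to do its own housekeeping instead of sitting in epoll_wait() forever.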