Async connect and disconnect with epoll (Linux)

I need an async connect and disconnect for a TCP client using epoll on Linux. Windows has extension functions for this, such as ConnectEx, DisconnectEx, and AcceptEx. For the TCP server, the standard accept function works. However, the TCP client doesn't connect and disconnect correctly. All sockets are non-blocking. How do I get this working?

Platitudinous answered 17/4, 2012 at 8:0 Comment(5)
This could help you: #2875502 - Pluton
As a possible alternative to the suggestions on the linked DJB page, I'd like to suggest trying to dup and close the descriptor (and use the duplicate). Not tested, but it should work, in my understanding. The docs state that it is a serious programming error not to check the return value of close, because it may return a previous error. That's just what you want (if close gives an error, connect failed). Though of course if you use epoll then you're guaranteed to have an OS where getsockopt(SO_ERROR) will just work... - Inspired
If viable, the simplest option is to wait until connect() returns before you set O_NONBLOCK. - Schrimsher
@goldilocks: +1 Not asynchronous unless you use a worker thread for that, but I agree the simplicity is tempting. Plus, DNS resolution -- which you likely need -- will need a worker thread anyway unless you want to block on that (getaddrinfo_a does just that internally, too). So while you block in the worker anyway, you may as well block on connect, too... - Inspired
I have one worker thread for all my needs (TCP server/client, UDP socket, timerfd). In this thread I use epoll for async work, so I wait in epoll_wait(...) and then do what I need. For example, if the socket is a listening socket, I call accept, create a new client with the accepted socket, and add it to the epoll queue. But for the TCP client, I can't add it to epoll before the connect is done... And if I do, the client connects several times (3-4)... - Platitudinous

To do a non-blocking connect(), assuming the socket has already been made non-blocking:

int res = connect(fd, ...);
if (res < 0 && errno != EINPROGRESS) {
    // error, fail somehow, close socket
    return;
}

if (res == 0) {
    // connection has succeeded immediately
} else {
    // connection attempt is in progress
}

For the second case, where connect() failed with EINPROGRESS (and only in this case), you have to wait for the socket to be writable, e.g. for epoll specify that you're waiting for EPOLLOUT on this socket. Once you get notified that it's writable (with epoll, also expect to get an EPOLLERR or EPOLLHUP event), check the result of the connection attempt:

int result;
socklen_t result_len = sizeof(result);
if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &result, &result_len) < 0) {
    // error, fail somehow, close socket
    return;
}

if (result != 0) {
    // connection failed; error code is in 'result'
    return;
}

// socket is ready for read()/write()

In my experience, on Linux, connect() never immediately succeeds and you always have to wait for writability. However, for example, on FreeBSD, I've seen non-blocking connect() to localhost succeeding right away.
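
The two steps above can be combined into a single helper. The following is a minimal sketch under stated assumptions, not a definitive implementation: the names connect_nonblocking and fail_with and the timeout_ms parameter are hypothetical, IPv4 is assumed, and a private epoll instance is created per call for brevity, whereas an event-driven program would normally reuse the instance it already has.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd, set errno to err, and report failure. */
static int fail_with(int fd, int err)
{
    close(fd);
    errno = err;
    return -1;
}

/* Hypothetical helper: returns a connected non-blocking TCP socket,
 * or -1 with errno set. */
int connect_nonblocking(const struct sockaddr_in *addr, int timeout_ms)
{
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;

    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
        return fd;                      /* succeeded immediately */
    if (errno != EINPROGRESS)
        return fail_with(fd, errno);    /* hard failure */

    /* Attempt is in progress: wait until the socket is writable. */
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return fail_with(fd, errno);

    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = fd };
    struct epoll_event got;
    int n = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        do {                            /* a retry restarts the full timeout */
            n = epoll_wait(epfd, &got, 1, timeout_ms);
        } while (n < 0 && errno == EINTR);
    }
    int wait_err = (n < 0) ? errno : (n == 0) ? ETIMEDOUT : 0;
    close(epfd);
    if (wait_err != 0)
        return fail_with(fd, wait_err);

    /* Whatever event was reported, SO_ERROR holds the real outcome. */
    int so_error = 0;
    socklen_t len = sizeof so_error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        return fail_with(fd, errno);
    if (so_error != 0)
        return fail_with(fd, so_error);

    return fd;                          /* ready for read()/write() */
}
```

A caller would fill in a struct sockaddr_in and call connect_nonblocking(&addr, 2000). The function still blocks the calling thread for up to the timeout, so it illustrates the sequence of checks rather than a fully asynchronous design.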

Lattice answered 17/4, 2012 at 16:7 Comment(3)
@Matt I know it does, you're probably doing something wrong here. What exactly are you trying, where is it failing? Have you put the socket into non-blocking mode using fcntl? - Lattice
Oops, never mind. A bug on my part. Yeah, I had non-blocking sockets. I had a bug when checking the result of connect! Classic C mistake. - Kindhearted
Just as an FYI: If a non-blocking connect fails, Windows will notify via an exceptional condition rather than socket writability. msdn.microsoft.com/en-us/library/windows/desktop/ms740141.aspx - Delimitate

In my experience, detecting the completion of a non-blocking connect with epoll is a little different from doing so with select and poll.

With epoll:

After the connect() call is made, check the return code.

If the connection cannot be completed immediately, register for the EPOLLOUT event with epoll.

Call epoll_wait().

If the connection failed, the returned events will include EPOLLERR or EPOLLHUP; otherwise EPOLLOUT will be reported.
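
As an illustration of the last step, here is a rough sketch of how the reported events might be turned into a result. The names connect_outcome and try_connect are hypothetical, IPv4 is assumed, and the sketch reads SO_ERROR whichever of the three events arrived, since the event mask alone does not say why an attempt failed.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: map the events reported for a socket with a pending
 * connect() to 0 (connected) or an errno-style code. */
int connect_outcome(int fd, uint32_t events)
{
    if (!(events & (EPOLLOUT | EPOLLERR | EPOLLHUP)))
        return EINPROGRESS;             /* nothing conclusive yet */

    int so_error = 0;
    socklen_t len = sizeof so_error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        return errno;
    return so_error;                    /* 0 on success, e.g. ECONNREFUSED */
}

/* Hypothetical driver: attempt one connection and report how it ended.
 * The socket is closed again; only the outcome is returned. */
int try_connect(const struct sockaddr_in *addr, int timeout_ms)
{
    int result;
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return errno;
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        result = errno;
        close(fd);
        return result;
    }

    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0) {
        result = 0;                     /* completed immediately */
    } else if (errno != EINPROGRESS) {
        result = errno;                 /* failed immediately */
    } else {
        /* EPOLLERR and EPOLLHUP are always reported; no need to request them. */
        struct epoll_event ev = { .events = EPOLLOUT, .data.fd = fd };
        struct epoll_event got;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
            result = errno;
        } else {
            int n = epoll_wait(epfd, &got, 1, timeout_ms);
            if (n < 0)
                result = errno;
            else if (n == 0)
                result = ETIMEDOUT;
            else
                result = connect_outcome(fd, got.events);
        }
    }
    close(epfd);
    close(fd);
    return result;
}
```

Calling try_connect() against a port with no listener should yield ECONNREFUSED, while a listening port should yield 0.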

Vaunt answered 15/8, 2013 at 9:14 Comment(1)
Yeah, I forgot to mention in my answer that epoll can return EPOLLERR or EPOLLHUP in addition to EPOLLOUT. Thanks for mentioning it; it's corrected. - Lattice

I have a "complete" answer here in case anyone else is looking for this:

#include <sys/epoll.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
....
....
int retVal = -1;
socklen_t retValLen = sizeof (retVal);

int status = connect(socketFD, ...);
if (status == 0)
{
    // OK -- socket is ready for IO
}
else if (errno == EINPROGRESS)
{
    struct epoll_event newPeerConnectionEvent;
    int epollFD = -1;
    struct epoll_event processableEvents;
    int numEvents = -1;   // signed: epoll_wait() returns -1 on error

    if ((epollFD = epoll_create (1)) == -1)
    {
        printf ("Could not create the epoll FD list. Aborting!\n");
        exit (2);
    }

    newPeerConnectionEvent.data.fd = socketFD;
    newPeerConnectionEvent.events = EPOLLOUT | EPOLLIN | EPOLLERR;

    if (epoll_ctl (epollFD, EPOLL_CTL_ADD, socketFD, &newPeerConnectionEvent) == -1)
    {
        printf ("Could not add the socket FD to the epoll FD list. Aborting!\n");
        exit (2);
    }

    // A timeout of -1 blocks until the socket reports an event
    numEvents = epoll_wait (epollFD, &processableEvents, 1, -1);

    if (numEvents < 0)
    {
        printf ("Serious error in epoll setup: epoll_wait () returned < 0 status!\n");
        exit (2);
    }

    close (epollFD);   // the temporary epoll instance is no longer needed

    // Always check SO_ERROR, whichever event was reported
    if (getsockopt (socketFD, SOL_SOCKET, SO_ERROR, &retVal, &retValLen) < 0)
    {
        // ERROR, fail somehow, close socket
    }
    else if (retVal != 0)
    {
        // ERROR: connect did not "go through"; error code is in retVal
    }
}
else
{
    // ERROR: connect did not "go through" for other non-recoverable reasons.
    switch (errno)
    {
      ...
    }
}
Fatuous answered 5/6, 2012 at 21:19 Comment(3)
I believe your error check after the epoll_wait() is incorrect - you should always check the result of the connection attempt via getsockopt(SO_ERROR), even if you didn't get EPOLLERR. See EINPROGRESS in the man page linux.die.net/man/2/connect Also, assert() is the wrong way to handle critical errors - it would mean that you have proven that it can never happen. Use exit() instead, which will terminate the program even when NDEBUG is defined. - Lattice
Just added the edits suggested. The unedited version seems to work for me. - Fatuous
With -1 as the timeout, shouldn't epoll_wait block indefinitely in the above program? - Nellenelli

I tried Sonny's solution, and epoll_ctl returned "invalid argument". So I think the right way to do this may be as follows:

1. Create socketfd and epollfd.

2. Use epoll_ctl to associate socketfd with epollfd, specifying the epoll events.

3. Call connect(socketfd, ...).

4. Check the return value and errno.

5. If errno == EINPROGRESS, call epoll_wait.
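
The ordering above might look like the following in a program that already runs one shared epoll loop. This is only a sketch under assumptions: start_connect and finish_connect are hypothetical names, IPv4 is assumed, and switching the interest set to EPOLLIN after success is a design choice added here (level-triggered EPOLLOUT would otherwise keep firing while the socket stays writable), not something the answer prescribes.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: register a new non-blocking socket with an existing
 * epoll instance, then start connecting. Returns the fd, or -1 with errno set. */
int start_connect(int epfd, const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;

    /* Step 2: associate the socket with the epoll instance first. */
    struct epoll_event ev = { .events = EPOLLOUT, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        int err = errno;
        close(fd);
        errno = err;
        return -1;
    }

    /* Steps 3 and 4: connect, then treat only EINPROGRESS as "pending". */
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) < 0 &&
        errno != EINPROGRESS) {
        int err = errno;
        close(fd);                      /* closing also removes it from epoll */
        errno = err;
        return -1;
    }
    /* Immediate success also shows up as EPOLLOUT in the loop. */
    return fd;
}

/* Hypothetical helper: call once when the loop reports EPOLLOUT, EPOLLERR or
 * EPOLLHUP for fd. Returns 0 if connected, else an errno-style code. */
int finish_connect(int epfd, int fd)
{
    int so_error = 0;
    socklen_t len = sizeof so_error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        return errno;
    if (so_error != 0)
        return so_error;

    /* Connected: stop watching for writability, start watching for input. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) < 0)
        return errno;
    return 0;
}
```

In the event loop, the first event reported for the fd returned by start_connect() would be passed to finish_connect(); on a nonzero result the caller closes the socket.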

Waring answered 3/5, 2014 at 7:46 Comment(0)
