Linux, sockets, non-blocking connect

I want to create a non-blocking connect. Like this:

socket.connect(); // returns immediately

To do this, I use another thread, an infinite loop, and Linux epoll. Like this (pseudocode):

// in another thread
{
  create_non_block_socket();
  connect();

  epoll_create();
  epoll_ctl(); // subscribe socket to all events
  while (true)
  {
    epoll_wait(); // wait a short time (~100 ms)
    check_socket(); // check for the EPOLLOUT event
  }
}

If I start the server first and then the client, everything works. If I start the client first, wait a short time, and then start the server, the client doesn't connect.

What am I doing wrong? Maybe it can be done differently?

Renowned answered 21/7, 2013 at 7:14 Comment(3)
If you are starting another thread to perform the connect, why are you doing it asynchronously? Also, you may as well put the rest of the comms in there.Genovese
Well, how would I do it without epoll and non-blocking mode? If I just call connect(), it will block and wait for the connection (am I right?). But then, if I want to join the connecting thread from the main thread, I can't, because the connecting thread will be blocked. Sorry if I am wrong.Renowned
This is not 'async'. This is non-blocking.Shalna
I
55

You should use the following steps for an async connect:

  • create the socket with socket(domain, type | SOCK_NONBLOCK, protocol)
  • start the connection with connect(fd, ...)
  • if connect() returns -1 and errno is not EINPROGRESS, abort with an error (a return value of 0 means the connection completed immediately)
  • wait until fd is signalled as ready for output
  • check status of socket with getsockopt(fd, SOL_SOCKET, SO_ERROR, ...)
  • done

No loops - unless you want to handle EINTR.
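
The steps above can be sketched in C roughly as follows. This is only a minimal sketch under some assumptions: it uses poll() for the "wait until writable" step, IPv4 only, and the names connect_nonblocking and fail_close are made up for illustration, not part of any API. A poll() timeout is reported as ETIMEDOUT by the sketch itself.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd without losing the errno that describes the failure. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Hypothetical helper (not from the answer): connect to an IPv4 address
 * without blocking in connect(), waiting at most timeout_ms for the result.
 * Returns a connected, still non-blocking fd, or -1 with errno set. */
int connect_nonblocking(const char *ip, uint16_t port, int timeout_ms)
{
    assert(ip != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    /* Step 1: SOCK_NONBLOCK is OR-ed into the type argument. */
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;

    /* Steps 2-3: 0 means connected already; EINPROGRESS means "wait". */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;
    if (errno != EINPROGRESS)
        return fail_close(fd);

    /* Step 4: wait for writability. The only loop is the EINTR retry
     * (which, in this sketch, restarts the full timeout). */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return fail_close(fd);
    if (n == 0) {
        errno = ETIMEDOUT;  /* the sketch's own timeout, not the kernel's */
        return fail_close(fd);
    }

    /* Step 5: writable does not mean connected; fetch the pending error. */
    int err = 0;
    socklen_t errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return fail_close(fd);
    if (err != 0) {
        errno = err;
        return fail_close(fd);
    }
    return fd;
}
```

A caller would use it as something like `int fd = connect_nonblocking("127.0.0.1", 8080, 1000);` and check errno when it returns -1.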

If the client is started first, you should see the error ECONNREFUSED in the getsockopt() step. If this happens, close the socket and start again from the beginning.

It is difficult to tell what's wrong with your code without seeing more details. I suspect that you do not abort on errors in your check_socket operation.
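
For the epoll setup from the question, a check_socket that does abort on errors might look roughly like this. It is a sketch only: start_connect, check_socket and wait_connect_epoll are hypothetical names, the real check_socket from the question is unknown, and the code assumes IPv4 and a single socket per epoll instance.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a non-blocking socket and start connecting.
 * Returns the fd, or -1 with errno set if connect() fails outright. */
int start_connect(const char *ip, uint16_t port)
{
    assert(ip != NULL);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0 &&
        errno != EINPROGRESS) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Stand-in for the question's check_socket(): returns 0 if connected,
 * EINPROGRESS if nothing relevant was reported, else the connect error. */
int check_socket(int fd, uint32_t events)
{
    if (!(events & (EPOLLOUT | EPOLLERR | EPOLLHUP)))
        return EINPROGRESS;
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;  /* e.g. ECONNREFUSED: close fd and start over */
}

/* Wait once (no polling loop) for the connect on fd to finish.
 * Returns 0 on success, otherwise an errno-style code. */
int wait_connect_epoll(int fd, int timeout_ms)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return errno;

    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLOUT;  /* EPOLLERR and EPOLLHUP are always reported */
    ev.data.fd = fd;

    int result;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        result = errno;
    } else {
        struct epoll_event out;
        int n;
        do {
            n = epoll_wait(ep, &out, 1, timeout_ms);
        } while (n < 0 && errno == EINTR);
        if (n < 0)
            result = errno;
        else if (n == 0)
            result = ETIMEDOUT;  /* the sketch's own timeout */
        else
            result = check_socket(fd, out.events);
    }
    close(ep);
    return result;
}
```

If wait_connect_epoll reports ECONNREFUSED, the socket cannot be reused: close it and call start_connect again, which would explain why a client started before the server never connects if the error is ignored.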

Ineptitude answered 21/7, 2013 at 8:39 Comment(13)
I know this is an old comment, but I just wanted to note that I had to wait for read in order to catch ETIMEDOUT. This occurred when the SYN response was not returned. If I only waited for write then the socket would disappear from netstat (from SYN_SENT state) but I'd get no notification that the socket was writable to call getsockopt and find ETIMEDOUT. I also added a call immediately after connect to getsockopt to see if there were any immediate errors available before polling.Kerakerala
@DreamWarrior: That's weird. Take a look at connect(2) and connect(3) and search for poll. Both man pages state that you should wait for an indication that the socket is writable. Can you provide a minimal example that shows the unexpected behavior?Ineptitude
the man page states "It is possible to select(2) or poll(2) for completion by selecting the socket for writing". My guess is the key word is "completion". Since it was never completed, as it never received a SYN-ACK (or RST which completes the handshake, but results in failure), it never became writable.Kerakerala
I was testing this by performing a non-blocking connect to port 10000 on 1.1.1.1. However, my code was using the Xt scheduler (via XtAppAddInput w/ XtInputWriteMask) to perform the select/poll, so I'm not sure which it used, I just know the write event never "fired". A read event, added with XtInputReadMask, did fire when the TCP stack timed out waiting for the SYN-ACK. In this case, getsockopt returned ETIMEDOUT. I do wonder if there are other errors that would only be sent to the read event, but I don't know how to provoke them; I can only test ECONNREFUSED and ETIMEDOUT.Kerakerala
@DreamWarrior: I can't reproduce the problem you have described. I have written a minimal test program, and it correctly reports ETIMEDOUT using POLLOUT.Ineptitude
Interesting, your test program works as expected. So, the only thing I can figure is that the Xt scheduler that I am forced to use to schedule the I/O into my legacy application is not firing the events properly. Super odd -- I wish I had more time to investigate.Kerakerala
The extra getsockopt for SO_ERROR is critical and not well documented (or shown in any example I had seen). Poll will report the socket as writable even though ECONNREFUSED was hit and the socket isn't writable.Smock
This is not an 'async connect'. This is a non-blocking connect. Given that the program is doing exactly nothing except waiting for success or failure, the approach is completely futile. It would be more to the point to do the connect in blocking mode and then revert to non-blocking for whatever follows, if anything.Shalna
When getsockopt(fd, SOL_SOCKET, SO_ERROR, ...) returns 0, with 0 in so_error, this does not mean that the socket is connected. It means that no error has occurred so far. In this specific case, you need to call getpeername(), and if getpeername() returns 0, the socket is connected. If the socket is not connected, getpeername() returns -1 with ENOTCONN in errno. getsockopt(fd, SOL_SOCKET, SO_ERROR, ...) can inform you about a refused connection, but not about a connected socket. You need to use getpeername() or other means to be sure the socket is connected.Selectivity
I feel dumb asking this, but could you elaborate on how to go about this step: "wait until fd is signalled as ready for output"? Would that be done using select?Conveyancer
@Conveyancer That is correct.Shalna
@AlexandreFenyo getpeername() can also set errno to EINVAL ("socket has been shut down") instead of ENOTCONN if a socket previously existed with the same fd number and was closed. Just a little heads up if anyone is checking getpeername errno values in code.Abbey
@ShayanShahsiah Yes, but to be more accurate, it is not relative to the fd number, but to the underlying Linux kernel socket representation (a vnode on a BSD/Unix system): if you have a fd numbered 5 associated to a socket, you may create another fd numbered 6 by calling dup2(5, 6) and you can close fd numbered 5 and see that getpeername on fd number 6 will return an error of type ENOTCONN.Selectivity
P
9

There are a few ways to test if a nonblocking connect succeeds.

  1. Call getpeername() first. If it fails with ENOTCONN, the connection failed; then call getsockopt() with SO_ERROR to fetch the pending error on the socket.
  2. Call read() with a length of 0. If the read fails, the connection failed, and the errno from read() indicates why; read() returns 0 if the connection succeeded.
  3. Call connect() again. If it fails with errno EISCONN, the socket is already connected and the first connect succeeded.
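
As a rough illustration, methods 1 and 3 could be written like this. The function names are hypothetical, and both functions assume the socket has already been reported as writable (or that the caller otherwise knows the connect attempt has finished). Method 2 is left out of the sketch because a zero-length read() is not guaranteed to report errors.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>  /* sockaddr_in, for callers that build the address */
#include <sys/socket.h>
#include <unistd.h>

/* Method 1 (hypothetical name): getpeername() first, then SO_ERROR.
 * Returns 0 if connected, otherwise an errno-style code. */
int status_via_getpeername(int fd)
{
    struct sockaddr_storage peer;
    socklen_t plen = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &plen) == 0)
        return 0;                    /* a peer exists: connected */
    if (errno != ENOTCONN)
        return errno;                /* e.g. EBADF, EINVAL */

    int err = 0;
    socklen_t elen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) < 0)
        return errno;
    /* No pending error recorded: all we know is "not connected". */
    return err != 0 ? err : ENOTCONN;
}

/* Method 3 (hypothetical name): call connect() again with the same address.
 * A return of 0 or an EISCONN failure both mean the socket is connected. */
int status_via_reconnect(int fd, const struct sockaddr *addr, socklen_t len)
{
    assert(addr != NULL);
    if (connect(fd, addr, len) == 0)
        return 0;
    return errno == EISCONN ? 0 : errno;
}
```

Both functions return 0 for a connected socket and an errno-style code otherwise, so a caller can simply test the result against 0.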

Ref: UNIX Network Programming V1

Pangenesis answered 12/4, 2019 at 9:40 Comment(1)
Please, note: the read() man page says: "If count is zero, read() may detect the errors described below. In the absence of any errors, or if read() does not check for errors, a read() with a count of 0 returns zero and has no other effects." So, it MAY detect the errors.Heraclid
G
3

D. J. Bernstein gathered together various methods for checking whether an asynchronous connect() call succeeded. Many of these methods have drawbacks on certain systems, so writing portable code for this is unexpectedly hard. If anyone wants to read about all the possible methods and their drawbacks, check out this document.

For those who just want the tl;dr version, the most portable way is the following:

Once the system signals the socket as writable, first call getpeername() to see whether it connected. If that call succeeds, the socket is connected and you can start using it. If that call fails with ENOTCONN, the connection failed. To find out why it failed, try to read one byte from the socket with read(fd, &ch, 1), which will also fail, but the error you get is the error you would have gotten from connect() if it had been blocking.
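
A minimal sketch of that sequence, assuming poll() is used to wait for writability. The name connect_outcome is invented for this example, and the ETIMEDOUT returned when poll() times out comes from the sketch itself, not from the kernel.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>  /* sockaddr_in, for callers that build the address */
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fd is a non-blocking socket on which connect() has
 * already been started. Returns 0 if it connected, otherwise the
 * errno-style reason (or ETIMEDOUT if nothing happened in timeout_ms). */
int connect_outcome(int fd, int timeout_ms)
{
    assert(fd >= 0);

    /* Wait until the system signals the socket as writable. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    if (n < 0)
        return errno;
    if (n == 0)
        return ETIMEDOUT;

    /* Connected sockets have a peer address. */
    struct sockaddr_storage peer;
    socklen_t plen = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &plen) == 0)
        return 0;
    if (errno != ENOTCONN)
        return errno;

    /* Not connected: a one-byte read is expected to fail with the error
     * that a blocking connect() would have returned. */
    char ch;
    if (read(fd, &ch, 1) < 0)
        return errno;
    return ENOTCONN;  /* read did not fail; no more specific reason known */
}
```

The caller still creates the socket and calls connect() itself; this function only covers what happens after connect() has returned EINPROGRESS.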

Gerhardt answered 14/7, 2020 at 9:24 Comment(0)
