Unblock recvfrom when socket is closed

4

10

Let's say I start a thread to receive on a port. The socket call will block on recvfrom. Then, somehow in another thread, I close the socket.

On Windows, this will unblock recvfrom and my thread execution will terminate.

On Linux, this does not unblock recvfrom, and as a result, my thread is sitting doing nothing forever, and the thread execution does not terminate.

Can anyone help me understand what's happening on Linux? When the socket is closed, I want recvfrom to unblock.

I keep reading about using select(), but I don't know how to use it for my specific case.

Poky answered 17/6, 2011 at 18:12 Comment(0)
21

Call shutdown(sock, SHUT_RDWR) on the socket, then wait for the thread to exit (e.g. with pthread_join).

You would think that close() would unblock the recvfrom(), but on Linux it doesn't.
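A minimal sketch of that shutdown-then-join sequence, assuming a Linux system. The names `receiver`, `receiver_main`, `stop_receiver` and `demo_shutdown_unblocks` are made up for illustration, and the driver uses an unbound UDP socket only so the thread has something to block on.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical receiver state shared between the two threads. */
struct receiver {
    int fd;              /* socket the thread blocks on */
    ssize_t last_result; /* final recvfrom() return value */
};

/* Thread body: block in recvfrom() until it reports shutdown or an error. */
static void *receiver_main(void *arg)
{
    struct receiver *r = arg;
    char buf[1024];

    for (;;) {
        ssize_t n = recvfrom(r->fd, buf, sizeof(buf), 0, NULL, NULL);
        if (n <= 0) {
            /* 0 after shutdown(). An empty UDP datagram also yields 0,
             * so real code needs its own stop flag to tell them apart. */
            r->last_result = n;
            return NULL;
        }
        /* ...handle the n received bytes here... */
    }
}

/* Stop sequence: shutdown, join, and only then close.
 * Returns 0 if the receiver woke up and recvfrom() reported 0. */
static int stop_receiver(pthread_t thread, struct receiver *r)
{
    /* On Linux this can fail with ENOTCONN for an unconnected UDP socket
     * and still wake the blocked thread, so the result is ignored here. */
    (void) shutdown(r->fd, SHUT_RDWR);

    if (pthread_join(thread, NULL) != 0)
        return -1;

    close(r->fd); /* safe now: no other thread is using the descriptor */
    return r->last_result == 0 ? 0 : -1;
}

/* Small driver: start a receiver on an unbound UDP socket, then stop it. */
static int demo_shutdown_unblocks(void)
{
    struct receiver r = { .fd = socket(AF_INET, SOCK_DGRAM, 0), .last_result = -1 };
    pthread_t thread;

    if (r.fd < 0)
        return -1;
    if (pthread_create(&thread, NULL, receiver_main, &r) != 0) {
        close(r.fd);
        return -1;
    }
    sleep(1); /* crude: give the thread time to block in recvfrom() */
    return stop_receiver(thread, &r);
}
```

Note that close() happens only after pthread_join() returns, so the descriptor is never released while the receiver might still be using it.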

Yirinec answered 1/11, 2011 at 21:19 Comment(3)
SHUT_RD is sufficient. - Tenorrhaphy
It isn't, and cannot possibly be, guaranteed to unblock recvfrom on any system. There is an inherent race that can only be solved by synchronization in user space. - Interventionist
shutdown(sock, SHUT_RDWR) worked perfectly on Linux. Thanks. - Hornblende
6

Here's a sketch of a simple way to use select() to deal with this problem:

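// Note: untested code, may contain typos or bugs
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

// atomic rather than volatile: volatile alone does not synchronize threads
static atomic_bool _threadGoAway = false;
static pthread_t _thread;  // filled in by pthread_create(&_thread, NULL, MyThread, &fd)

void *MyThread(void *arg)
{
   int fd = *(int *)arg;  // your socket fd, passed in via pthread_create()
   while(1)
   {
      struct timeval timeout = {1, 0};  // make select() return once per second

      fd_set readSet;
      FD_ZERO(&readSet);
      FD_SET(fd, &readSet);

      if (select(fd+1, &readSet, NULL, NULL, &timeout) >= 0)
      {
         if (_threadGoAway)
         {
            printf("MyThread:  main thread wants me to scram, bye bye!\n");
            return NULL;
         }
         else if (FD_ISSET(fd, &readSet))
         {
            char buf[1024];
            // NULL, NULL: the sender's address is not needed here
            ssize_t numBytes = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
            if (numBytes < 0) perror("recvfrom");
            // ...handle the received bytes here...
         }
      }
      else perror("select");
   }
}

// To be called by the main thread at shutdown time
void MakeTheReadThreadGoAway()
{
   _threadGoAway = true;
   (void) pthread_join(_thread, NULL);   // may block for up to one second
}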

A more elegant method would be to avoid select()'s timeout altogether. Instead, create a socket pair (using socketpair()) and have the main thread send a byte on its end when it wants the I/O thread to go away; the I/O thread exits when it receives a byte on its end of the socketpair. I'll leave that as an exercise for the reader though. :)
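One possible shape of the socketpair() approach, as a rough sketch rather than a finished implementation. The struct `io_thread_args` and the functions `io_thread_main` and `demo_socketpair_wakeup` are hypothetical names, and the data socket here is just an unbound UDP socket standing in for the real one.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state handed to the I/O thread. */
struct io_thread_args {
    int data_fd; /* the socket we actually receive on */
    int wake_fd; /* the I/O thread's end of the socketpair */
};

static void *io_thread_main(void *arg)
{
    struct io_thread_args *a = arg;
    int max_fd = a->data_fd > a->wake_fd ? a->data_fd : a->wake_fd;

    for (;;) {
        fd_set read_set;
        FD_ZERO(&read_set);
        FD_SET(a->data_fd, &read_set);
        FD_SET(a->wake_fd, &read_set);

        /* No timeout: sleep until either descriptor becomes readable. */
        if (select(max_fd + 1, &read_set, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, just retry */
            return NULL;    /* any other error: give up */
        }

        if (FD_ISSET(a->wake_fd, &read_set))
            return NULL;    /* main thread asked us to exit */

        if (FD_ISSET(a->data_fd, &read_set)) {
            char buf[1024];
            ssize_t n = recvfrom(a->data_fd, buf, sizeof(buf), 0, NULL, NULL);
            (void) n;       /* ...handle the received bytes here... */
        }
    }
}

/* Driver: start the I/O thread, wake it with one byte, join, clean up.
 * Returns 0 if the thread exited in response to the wake-up byte. */
static int demo_socketpair_wakeup(void)
{
    int pair[2];
    struct io_thread_args args;
    pthread_t thread;
    char byte = 'x';

    args.data_fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (args.data_fd < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) != 0) {
        close(args.data_fd);
        return -1;
    }
    args.wake_fd = pair[1];

    if (pthread_create(&thread, NULL, io_thread_main, &args) != 0) {
        close(pair[0]);
        close(pair[1]);
        close(args.data_fd);
        return -1;
    }

    int wrote = write(pair[0], &byte, 1) == 1;
    close(pair[0]); /* EOF also makes wake_fd readable, so the thread exits either way */
    int joined = pthread_join(thread, NULL) == 0;

    close(pair[1]);      /* only closed after the thread is gone */
    close(args.data_fd);
    return (wrote && joined) ? 0 : -1;
}
```

Compared with the one-second timeout, the thread wakes immediately on request and does no periodic polling while idle.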

It's also often a good idea to set the socket to non-blocking mode, to avoid the (small but non-zero) chance that the recvfrom() call might block even after select() indicated the socket is ready-to-read, as described here. But blocking mode might be "good enough" for your purpose.
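For reference, a small sketch of what non-blocking mode could look like with fcntl(). The helper names `set_nonblocking`, `recv_if_ready` and `demo_nonblocking` are invented for this example, and the -2 return value is an arbitrary convention chosen here, not something the sockets API defines.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Try to receive one datagram without ever blocking.
 * Returns the byte count, -1 on a real error, or -2 if nothing was ready. */
static ssize_t recv_if_ready(int fd, char *buf, size_t len)
{
    ssize_t n = recvfrom(fd, buf, len, 0, NULL, NULL);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2; /* nothing to read after all: go back to select() */
    return n;
}

/* Driver: an idle non-blocking UDP socket reports "nothing ready"
 * instead of blocking. Returns 0 if that is what happened. */
static int demo_nonblocking(void)
{
    char buf[16];
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    int ok = set_nonblocking(fd) == 0 &&
             recv_if_ready(fd, buf, sizeof(buf)) == -2;
    close(fd);
    return ok ? 0 : -1;
}
```

In the select() loop above, a -2 result would simply mean "loop around and call select() again".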

Tinderbox answered 18/6, 2011 at 0:37 Comment(8)
And I remind you again that the "accepted" answer for that question is simply wrong. Every Unix -- from the original BSD and SYS V onward -- has guaranteed that read() will never block after select() says the socket is ready. And so does the POSIX spec. If Linux behaves differently, that is a bug in Linux. You should not encourage people to pollute their code with garbage to cater to broken systems. - Camphorate
Hi Nemo, if you are saying that the bug in Linux is fixed now and therefore my caveat is misleading, that is one thing. If OTOH you are saying the bug is still present, but no one must ever speak of it, then you are only setting people up to fail. Keeping people ignorant of Linux's (mis)behavior won't prevent the problem from biting them, it will only prevent them from being able to plan in advance for how they want to deal with the issue. - Tinderbox
A fair point. I think there is a non-zero chance this bug has been fixed, because to violate POSIX so blatantly for no good reason is stupid, and the Linux developers are not stupid. I will try asking on the linux-kernel mailing list. And I apologize if my tone was overly hostile; I had a bad day... - Camphorate
@Camphorate Where can I find this guarantee? And what clause of POSIX do you think it violates? That sounds insane to me. That would mean, for example, that a network protocol that allows you to "cancel" unread data can't be supported by POSIX, which sounds like an extraordinary claim. My understanding is that select is supposed to be protocol neutral and that you couldn't get a guarantee that a read wouldn't block without considering the rules of the protocol. For example, why can't a UDP implementation drop a datagram after it triggered a read hit from select? What POSIX rule is violated? - Interventionist
@Camphorate Where does this scenario violate POSIX? 1. UDP checksums are disabled. 2. A datagram with an invalid checksum is received on a connected socket. 3. A select call unblocks because a read would not have blocked. 4. Checksum enforcement is enabled. 5. A read call now blocks because no datagram with a valid checksum has been received. - Interventionist
@DavidSchwartz: Obviously if you do something to the socket between the select and the read (e.g. enable checksum enforcement, or heck just another call to read) you might change its "ready for reading" property. What we are talking about is select saying the socket is ready for reading and then an immediate read blocking. No Unix has ever allowed this, but Linux does... And it clearly violates POSIX (see pubs.opengroup.org/onlinepubs/9699919799/functions/select.html and search for "considered ready"). - Camphorate
@Camphorate Enabling or disabling UDP checksum enforcement is not an operation on the socket, it's an operation on the connection. And there is no way you can possibly know whether or not something else did something to the connection since the connection is shared. So something you have no control over (an operation on the shared connection) can change the ready for reading property. And you are misreading POSIX. It's not saying an actual future read won't block, it's saying a hypothetical concurrent read would not block. - Interventionist
@Camphorate It's no different from a "get free space" function saying that free space means a write would not return a "no space" error. It doesn't mean there's a guarantee that an actual future write won't return a "no space" error because the implementation does not have complete control over the disk space and does not freeze the world for you. Same with a connection. - Interventionist
4

Not an answer, but the Linux close man page contains this interesting quote:

It is probably unwise to close file descriptors while they may be in use by system calls in other threads in the same process. Since a file descriptor may be reused, there are some obscure race conditions that may cause unintended side effects.

Inkling answered 17/6, 2011 at 18:25 Comment(2)
That's why I said "somehow in another thread, I close the socket". This case needs to be handled, and I don't want the possibility of my program hanging forever because of a thread blocked on a socket waiting on recvfrom when it's closed. - Poky
@Poky This case does not need to be handled because code that is otherwise correct cannot possibly trigger this case. - Interventionist
4

You are asking for the impossible. There is simply no possible way for the thread that calls close to know that the other thread is blocked in recvfrom. Try to write code that guarantees that this happens, and you will find that it is impossible.

No matter what you do, it will always be possible for the call to close to race with the call to recvfrom. The call to close changes what the socket descriptor refers to, so it can change the semantic meaning of the call to recvfrom.

There is no way for the thread that enters recvfrom to somehow signal to the thread that calls close that it is blocked (as opposed to being about to block or just entering the system call). So there is literally no possible way to ensure the behavior of close and recvfrom are predictable.

Consider the following:

  1. Thread A is about to call recvfrom, but it gets pre-empted by other things the system needs to do.
  2. Meanwhile, thread B calls close on the socket.
  3. A thread started by the system's I/O library calls socket and gets the same descriptor as the one you closed.
  4. Finally, thread A calls recvfrom, and now it's receiving from the socket the library opened.

Oops.
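The descriptor reuse in step 3 is easy to observe. Here is a small single-threaded sketch (the function name `descriptor_is_reused` is made up for this example). POSIX allocates the lowest-numbered free descriptor, so when nothing else in the process opens a descriptor in between, the second socket() call gets the number that was just closed.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if a socket opened right after close() reuses the old
 * descriptor number, 0 if not, -1 on error. Single-threaded on purpose,
 * so nothing else can grab the freed number in between. */
static int descriptor_is_reused(void)
{
    int first = socket(AF_INET, SOCK_DGRAM, 0);
    if (first < 0)
        return -1;
    close(first);

    /* The number held by `first` is free again and can be handed out. */
    int second = socket(AF_INET, SOCK_DGRAM, 0);
    if (second < 0)
        return -1;

    int reused = (second == first);
    close(second);
    return reused;
}
```

In a multi-threaded program the same thing happens, except that the second socket() call may come from a completely unrelated thread, which is exactly the race described above.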

Don't ever do anything even remotely like this. A resource must not be released while another thread is, or might be, using it. Period.

Interventionist answered 13/6, 2017 at 16:15 Comment(0)
