How do I use EPOLLHUP

Could you provide good sample code that uses EPOLLHUP for dead-peer handling? I know it is an event that signals a user disconnection, but I am not sure how to use it in code. Thanks in advance.

Hindrance answered 22/6, 2011 at 9:43 Comment(1)
(took the liberty to add epoll and linux tags, since the Q relates to that) - Dished

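To detect peer shutdown, use EPOLLRDHUP rather than EPOLLHUP. EPOLLHUP signals a hang-up of the socket as a whole (usually an unexpected close or an internal error), and epoll_wait always reports it, so you never need to request it explicitly.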

Using it is really simple: just OR the flag with the other flags you pass to epoll_ctl. For example, instead of EPOLLIN, write EPOLLIN|EPOLLRDHUP.

After epoll_wait returns, test if(my_event.events & EPOLLRDHUP) and then do whatever you need to do when the other side has closed the connection (you will probably want to close the socket).
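As a rough illustration of the two steps above, here is a minimal sketch in C. The helper names watch_for_peer_shutdown and peer_has_shut_down are made up for this example and are not part of any API; only epoll_ctl, epoll_wait, and the EPOLL* flags come from Linux itself.

```c
#include <sys/epoll.h>
#include <sys/socket.h>   /* socket calls made by the surrounding program */
#include <unistd.h>       /* close(), for the caller */

/* Hypothetical helper: register fd for readability plus peer-shutdown
   notification. Returns 0 on success, -1 on error (errno set by epoll_ctl). */
static int watch_for_peer_shutdown(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLRDHUP;   /* OR the flag in with the others */
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: wait up to timeout_ms for one event on epfd.
   Returns 1 if the event carries EPOLLRDHUP, 0 if not (or on timeout),
   and -1 if epoll_wait itself failed. */
static int peer_has_shut_down(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;   /* timed out, nothing happened */
    return (ev.events & EPOLLRDHUP) ? 1 : 0;
}
```

A caller would typically close the socket once peer_has_shut_down returns 1. A real event loop would ask for more than one event per epoll_wait call and would also look at EPOLLIN, EPOLLHUP, and EPOLLERR. This sketch only isolates the EPOLLRDHUP check.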

Note that a read that returns zero bytes also means the other end has shut down the connection, so you should always check for that too. This avoids nasty surprises: the FIN might arrive after you have woken up from EPOLLIN but before you call read, and in edge-triggered (ET) mode you will not get another notification.
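One way to apply the zero-byte check is to treat it as a separate outcome of the read loop. This is only a sketch: drain_socket and the drain_result enum are names invented for this example, and it assumes the descriptor is non-blocking, as is usual in ET mode.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum drain_result { DRAIN_AGAIN, DRAIN_PEER_CLOSED, DRAIN_ERROR };

/* Hypothetical helper: read from fd until it would block, the peer closes,
   or a real error occurs. *total receives the number of bytes consumed.
   The data itself is discarded here; a real program would process buf. */
static enum drain_result drain_socket(int fd, size_t *total)
{
    char buf[4096];
    *total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            *total += (size_t)n;
            continue;                    /* keep reading, more may be queued */
        }
        if (n == 0)
            return DRAIN_PEER_CLOSED;    /* zero bytes read: peer shut down */
        if (errno == EINTR)
            continue;                    /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_AGAIN;          /* drained; wait for the next event */
        return DRAIN_ERROR;              /* genuine error, errno says which */
    }
}
```

With this shape, the caller closes the socket on DRAIN_PEER_CLOSED whether or not EPOLLRDHUP was reported for that wakeup.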

Dished answered 22/6, 2011 at 10:9 Comment(12)
Thanks for your kind answer. It was extremely hard for me to find good references on advanced networking. All the tutorials and examples online cover only the basics and skip error handling. Could you please point me to any good references or tutorials on advanced networking, including books? Thanks in advance. - Hindrance
Also, is there a good way to handle data corruption? For example, let's say a client sent 100 bytes of data but the server received only 90 bytes. The server is going to wait forever for the last 10 bytes, which won't arrive. Right now I am using select with a timeout to handle this, but I was wondering whether there is a way to handle it using epoll. Thanks again. - Hindrance
I don't know any good tutorials on error handling; I check the relevant return codes listed in the man pages. What's relevant depends on the situation. For example, if the socket call fails, what else is there to do but output a message and exit? On the other hand, read can (and often will) fail, which is perfectly normal if the error is EAGAIN. Just read the "RETURN VALUE" and "ERRORS" sections of the man pages. - Dished
About data loss, which you refer to as "corruption": this will not happen. If you use TCP, the data will eventually arrive, and if you use UDP, you will not receive the 90 bytes either. TCP simulates a stream with in-order guaranteed delivery on top of the packet-based IP protocol. Because the data arrives in packets, you will often not get all of it at once, but you will get it eventually (keep reading). UDP offers an unreliable whole-message-or-nothing service. See also: https://mcmap.net/q/582502/-epoll-exception-handle/… - Dished
Damon, could you please expand on the "read() returning zero" case? [I'm referring to the last paragraph.] Having subscribed to EPOLLRDHUP, this flag will always be set in epoll_wait() when the peer closes the connection. Well, that's exactly its purpose! If EPOLLRDHUP weren't set in case of a closure, that's clearly a kernel bug. Thus, it's safe to say that either EPOLLRDHUP or EPOLLHUP or EPOLLERR is set when read() would return 0. In my tests I couldn't reproduce the case you describe and, frankly speaking, it's the first time I've ever heard of it. Do you have any additional information or at least ... - Chirr
In theory, you are right. In theory, when read would return "zero bytes read", EPOLLRDHUP must already be set. In practice, I've seen strange things happen in edge-triggered mode (level-triggered is 100% reliable in every way from what I can tell) which do not entirely conform to what the docs say either. Therefore, to be on the safe side in edge-triggered mode, I would check whether read returns "zero bytes read", just to be sure. It doesn't really cost anything, and it puts you in a failsafe position. - Dished
To give an example of "strange behaviour" which I think is not correct: several epolls waiting on the same descriptor should all be notified, and several threads blocking on the same epoll should all wake up if something happens on the descriptor. That's what the Q+A in the section 7 man page says. I thought this would be a cool way to synchronize several threads on the same machine in a timely manner. Except that, in ET, it will wake up one thread and that's it. The behaviour would be OK if a thread entered the wait after the event happened (because the event is already consumed by then), but it happens in every case. - Dished
Thanks. I think I'll add assert(ret != 0) because this really sounds like a bug to me. I'm wondering, though, why level-triggered is more "reliable" than edge-triggered. - Chirr
The same thing works perfectly well in LT mode, except that you must read the eventfd before you can pulse it again, which is an annoyance. You don't know from which thread to do it, so you must do it from all threads, and since that would block all but one, you must set the descriptor to non-blocking, and meh... that's why I tried to implement this in ET mode in the first place. Long story short, ET might not do 100% what you think in all cases, so it does not hurt to do some extra checks. - Dished
Interesting idea. Well, the manual also states that this is "not recommended" (whatever that's supposed to mean!) but it also says that "it would be reported to both." (Q2/A2). So, this is clearly a bug (unless you have made a mistake somewhere). - Chirr
As for your problem, I'd probably have put all the epoll waiting code in one thread and deferred the respective processing tasks to extra threads. That way, you could still use non-blocking sockets in edge-triggered mode without the issues you've mentioned. - Chirr
FYI, EPOLLRDHUP might not be set together with the EPOLLIN that corresponds to a zero-sized read, even for level-triggered epoll. I see this often on my Linux system. However, if you continue to poll, you will get it (together with an EPOLLIN and a zero-sized read). I also know that the reverse may happen, i.e., you may get an EPOLLRDHUP together with an EPOLLIN and a non-zero read, if the peer sent something just before closing the connection. - Danit
