What is the purpose of epoll's edge triggered option?
C

3

74

From epoll's man page:

epoll is a variant of poll(2) that can be used either as an edge-triggered
or a level-triggered interface

When would one use the edge triggered option? The man page gives an example that uses it, but I don't see why it is necessary in the example.

Celie answered 6/2, 2012 at 15:42 Comment(0)
C
126

When an FD becomes read or write ready, you might not necessarily want to read (or write) all the data immediately.

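Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered tells you once when the FD becomes ready and won't bother you again until new activity arrives on it, so you should only go back to waiting on that FD after read(2) or write(2) has returned EAGAIN (it's more complicated to code around, but can be more efficient depending on what you need to do).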
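To make the difference concrete, here is a minimal sketch using a pipe as the FD. The helper name `count_reports` is made up for illustration; it leaves one unread byte in the pipe and counts how many consecutive non-blocking `epoll_wait` calls report it.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: one byte is written to a pipe and never read.
 * Returns how many of `rounds` consecutive epoll_wait() calls report the
 * read end as ready, or -1 if setup fails.
 * mode_flags: 0 for level-triggered, EPOLLET for edge-triggered. */
int count_reports(uint32_t mode_flags, int rounds)
{
    assert(rounds >= 0);

    int p[2];
    if (pipe(p) != 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN | mode_flags, .data.fd = p[0] };
    int reports = -1;

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        reports = 0;
        for (int i = 0; i < rounds; i++) {
            struct epoll_event out;
            /* Timeout 0: just check, never block. */
            if (epoll_wait(epfd, &out, 1, 0) == 1)
                reports++;
        }
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return reports;
}
```

Called with `0` the FD should be reported on every round, because the byte is still sitting there; called with `EPOLLET` it should be reported only on the first round.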

Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing. If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.

If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can. Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.
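The "write as much as you can, stop on EAGAIN" part might look roughly like this. It is only a sketch: `set_nonblocking` and `write_until_eagain` are names invented here, and the FD is assumed to be in non-blocking mode (otherwise write(2) would block instead of returning EAGAIN).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: write until everything is sent or the FD would block.
 * Returns the number of bytes written (possibly 0 or less than len),
 * or -1 on a real error. */
ssize_t write_until_eagain(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                  /* FD is full: wait for the next EPOLLOUT edge */
        return -1;                  /* real error (or an unexpected 0-byte write) */
    }
    return (ssize_t)done;
}
```

If the return value is less than `len`, you keep the unsent tail around and go back to `epoll_wait`; the next edge-triggered EPOLLOUT tells you when to resume.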

The same applies for reading, because you might not want to pull all the data into user-space before you're ready to do whatever you want to do with it (thus having to buffer it, etc etc). With edge-triggered epoll you get told when it's ready to read, and then can remember that and do the actual reading "as and when".
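One way to "remember that and do the actual reading as and when" is to keep a readiness flag per connection. The sketch below assumes a made-up `struct conn` with helpers `conn_init` and `read_some`; none of these names come from epoll itself. Your event loop would set `readable = true` when it sees EPOLLIN for the FD.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-connection state. */
struct conn {
    int  fd;
    bool readable;   /* set by the event loop on EPOLLIN, cleared on EAGAIN */
};

/* Puts fd into non-blocking mode and starts with "not readable". */
int conn_init(struct conn *c, int fd)
{
    assert(c != NULL);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    c->fd = fd;
    c->readable = false;
    return 0;
}

/* Reads at most `budget` bytes, whenever the caller is ready for them.
 * Returns >0 for bytes read, 0 if nothing is available right now
 * (the flag is cleared, so wait for the next edge), -1 on EOF or error. */
ssize_t read_some(struct conn *c, char *buf, size_t budget)
{
    assert(c != NULL && buf != NULL && budget > 0);

    if (!c->readable)
        return 0;

    for (;;) {
        ssize_t n = read(c->fd, buf, budget);
        if (n > 0)
            return n;
        if (n < 0 && errno == EINTR)
            continue;
        c->readable = false;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;
        return -1;   /* n == 0 (EOF) or a real error */
    }
}
```

The point of the design is that the flag stays set until a read actually hits EAGAIN, so you can consume the data in pieces at your own pace without needing another notification.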

Crown answered 6/2, 2012 at 15:48 Comment(12)
Is this edge-triggered behavior safe against race conditions, e.g. if data becomes available after read fails with EAGAIN but before epoll is called? - Mayle
Sure. epoll simply returns immediately if the FD is already ready and you haven't yet been notified. - Crown
ET is also particularly nice with a multithreaded server on a multicore machine. You can run one thread per core and have all of them call epoll_wait on the same epfd. When data comes in on an fd, exactly one thread will be woken to handle it. - Enoch
@ChrisDodd - Does this not work with level-triggered epoll as well? Why not? - Scenarist
@windfinder Correct me if I'm wrong, but in LT mode multiple threads might be woken up for the same FD/SD in parallel as long as data is there. With ET, only one notification is generated for the FD/SD when data arrives, so only one thread gets it; other threads might get a notification for the same FD/SD, but only after the original thread has read/written all the data for that notification. As you can imagine, it's a lot easier to write MT epoll processes with ET. Hope this helps. - Sailesh
@Sailesh - Confirmed, ET guarantees that only one thread wakes up. - Scenarist
@Sailesh You're probably WRONG. man 7 epoll: "Since even with edge-triggered epoll, multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receipt of an event with epoll_wait(2). When the EPOLLONESHOT flag is specified, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD." - Affectional
I've found that rearming the fd with EPOLLONESHOT in ET mode will cause epoll_wait() to return immediately with EPOLLIN if the read buffer wasn't read completely since the last EPOLLIN. Not sure if this is intended behavior, but it could be used to prevent starvation if so. - Thursby
@JamesMcLaughlin Does Windows IOCP have concepts similar to edge-triggered and level-triggered? - Boothman
@Affectional "If multiple threads (or processes, if child processes have inherited the epoll file descriptor across fork(2)) are blocked in epoll_wait(2) waiting on the same epoll file descriptor and a file descriptor in the interest list that is marked for edge-triggered (EPOLLET) notification becomes ready, just one of the threads (or processes) is awoken from epoll_wait(2). This provides a useful optimization for avoiding "thundering herd" wake-ups in some scenarios." - Rotz
@Affectional According to the man page I referenced above, if you are using the same epoll instance in threads/child processes, then only one should wake up. - Rotz
@JiaHaoXu This paragraph is present in an online epoll man page, but is missing from the Ubuntu 18.04.3 LTS man 7 epoll page. It seems to depend on the epoll (glib?) version. - Dandle
B
11

In my experiments, ET doesn't guarantee that only one thread wakes up, although it often wakes up only one. The EPOLLONESHOT flag is for this purpose.
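For reference, arming and rearming with EPOLLONESHOT can be sketched like this. The helper names `arm_oneshot` and `poll_once` are made up; the flags and the EPOLL_CTL_MOD rearm are the ones described in man 7 epoll.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register (EPOLL_CTL_ADD) or rearm (EPOLL_CTL_MOD)
 * fd for a single edge-triggered EPOLLIN event. After that one event is
 * delivered, epoll disables the fd until it is rearmed. */
int arm_oneshot(int epfd, int fd, int op)
{
    assert(op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD);
    struct epoll_event ev = {
        .events  = EPOLLIN | EPOLLET | EPOLLONESHOT,
        .data.fd = fd,
    };
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Hypothetical helper: non-blocking check.
 * Returns 1 if an event was delivered, 0 if none is pending, -1 on error. */
int poll_once(int epfd)
{
    struct epoll_event out;
    return epoll_wait(epfd, &out, 1, 0);
}
```

A worker thread that got the event would do its reads and then call `arm_oneshot(epfd, fd, EPOLL_CTL_MOD)`; until it does, no other thread can be woken for that fd, even if more data arrives.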

Brose answered 27/2, 2014 at 2:1 Comment(4)
man 7 epoll: "Since even with edge-triggered epoll, multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receipt of an event with epoll_wait(2). When the EPOLLONESHOT flag is specified, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD." - Affectional
Exactly, you get notified once per rising edge. If you add stdin to an epoll set as EPOLLET, each press of the enter key will generate an event. This is why EPOLLONESHOT is needed. - Sidetrack
Did you have different epoll FDs or just one shared between threads? My understanding is that all epoll FDs should wake up but maybe only one thread for a shared FD. The new EPOLLEXCLUSIVE fixes the thundering herd problem for multiple epoll FDs. - Grimy
I mean multiple threads were waiting on a single FD. Sometimes several threads woke up if the EPOLLONESHOT flag was not set, while only one woke up if the flag was set. - Brose
P
3
  • Level triggered

    Use level-triggered mode when you can't consume all the data on the FD at once and want epoll to keep notifying you while data is available.

    For example, if you are receiving a large file from an FD and cannot consume all of it in one go, level-triggered mode keeps reporting the FD as ready, so you can pick up the next chunk on the next wake-up.

    • Disadvantages

      • thundering herd
        • The EPOLLEXCLUSIVE flag is meant to prevent the thundering herd phenomenon
      • lower efficiency
        • When a read/write event occurs on the monitored file descriptor, epoll_wait() notifies the handler to read or write. If you don't read or write all the data at once (e.g., the read/write buffer is too small), then the next time epoll_wait() is called it will notify you again about the file descriptor you didn't finish with; and if you never read or write, it will keep notifying you.
        • If the system has a large number of ready file descriptors that you don't need to read or write, and they are returned every time, this can greatly reduce the efficiency with which the handler retrieves the ready file descriptors it actually cares about.
    • Use cases

      • Redis epoll: since the I/O thread of Redis is single-threaded, level-triggered mode is used.
  • Edge triggered

    Use edge-triggered mode when you can make sure that all available data is buffered and will eventually be handled.

    As Chris Dodd mentioned in the comments:

    ET is also particularly nice with a multithreaded server on a multicore machine. You can run one thread per core and have all of them call epoll_wait on the same epfd. When data comes in on an fd, exactly one thread will be woken to handle it.
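Regarding the EPOLLEXCLUSIVE bullet above: the flag (available since Linux 4.5) is set at registration time and matters when several epoll instances watch the same target FD. A minimal sketch, with `add_exclusive` being a made-up helper name:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: register fd for EPOLLIN with EPOLLEXCLUSIVE.
 * The flag is only accepted with EPOLL_CTL_ADD. When several epoll
 * instances have the same fd registered this way, the kernel delivers an
 * event to one or more of them rather than necessarily to all. */
int add_exclusive(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    struct epoll_event ev = {
        .events  = EPOLLIN | EPOLLEXCLUSIVE,
        .data.fd = fd,
    };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```

A typical setup would be one epoll instance per worker thread, each calling `add_exclusive` on the shared listening socket. Note that the only guarantee is "at least one" instance gets the event, so handlers still need to cope with EAGAIN.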

Philosophy answered 30/8, 2022 at 9:50 Comment(0)
