Why is epoll faster than select?

I have seen a lot of comparisons saying that select has to walk through the fd list, and that this is slow. But why doesn't epoll have to do the same?

Surfeit answered 28/6, 2013 at 0:54 Comment(0)

There's a lot of misinformation about this, but the real reason is this:

A typical server might be dealing with, say, 200 connections. It will service every connection that needs to have data written or read and then it will need to wait until there's more work to do. While it's waiting, it needs to be interrupted if data is received on any of those 200 connections.

With select, the kernel has to add the process to 200 wait lists, one for each connection. To do this, it needs a "thunk" to attach the process to the wait list. When the process finally does wake up, it needs to be removed from all 200 wait lists and all those thunks need to be freed.

By contrast, with epoll, the epoll socket itself has a wait list. The process needs to be put on only that one wait list using only one thunk. When the process wakes up, it needs to be removed from only one wait list and only one thunk needs to be freed.

To be clear, with epoll, the epoll socket itself has to be attached to each of those 200 connections. But this is done once, for each connection, when it is accepted in the first place. And this is torn down once, for each connection, when it is removed. By contrast, each call to select that blocks must add the process to every wait queue for every socket being monitored.

Ironically, with select, the largest cost comes from checking if sockets that have had no activity have had any activity. With epoll, there is no need to check sockets that have had no activity because if they did have activity, they would have informed the epoll socket when that activity happened. In a sense, select polls each socket each time you call select to see if there's any activity while epoll rigs it so that the socket activity itself notifies the process.

Pace answered 28/6, 2013 at 1:8 Comment(7)
+1, but this is an implementation detail. The OS could cache the thunk registrations and only update the thunks based on the diff from the registrations of the previous call. I have found the real killer is what you mentioned in the last paragraph. – Binetta
@Binetta That's true. You could make an implementation of select that does exactly this, calling epoll under the hood. You would still have the cost of caching the previous call's registrations, comparing them, and so on. (You have to be careful because the same file descriptor could refer to a different underlying socket. But the kernel could easily tell that.) – Pace
@jxh: except that determining the difference between two sets is O(n), so you don't actually save anything by caching the registrations at the OS level. You could cache them at the application level, which has ways to detect insertions into and deletions from the set, and then tell the OS only the differences. This is exactly how epoll differs from select/poll. – Paryavi
@LieRyan The difference would come from having something that stayed on the wait queues between calls to select and therefore not having to poll all the sockets in common to successive calls. – Pace
@DavidSchwartz is right. But it does not mean epoll is always faster than select/poll. With select and poll, all fds are assembled in user space, and the whole set is copied to kernel space and back. With epoll, the whole set is maintained in kernel space, so a system call (epoll_ctl) is needed to add a new file descriptor to that list. System calls are expensive, and in cases with many short-lived active connections, epoll could be slower than select due to the system call overhead. Refer to: kernel.org/doc/ols/2004/ols2004v1-pages-215-226.pdf – Skutchan
Thanks for this answer. It helped me decide that epoll was the way to go with my application. I'm making something that streams audio. I was using select to block on a non-blocking fd; otherwise I would have to loop on EAGAIN, which hammers my CPU. Select was causing audio to stutter. Switching to epoll has fixed that. – Mabel
This is a great article that explains this in detail, for people who are still having some trouble understanding: copyconstruct.medium.com/… – Frederique

The main difference between epoll and select is that in select() the list of file descriptors to wait on only exists for the duration of a single select() call, and the calling task only stays on the sockets' wait queues for the duration of a single call. In epoll, on the other hand, you create a single file descriptor that aggregates events from multiple other file descriptors you want to wait on, so the list of monitored fd's is long-lasting, and tasks stay on socket wait queues across multiple system calls.

Furthermore, since an epoll fd can be shared across multiple tasks, it is no longer a single task on the wait queue, but a structure that itself contains another wait queue, containing all processes currently waiting on the epoll fd. (In terms of implementation, this is abstracted over by the sockets' wait queues holding a function pointer and a void* data pointer to pass to that function.)

So, to explain the mechanics a little more:

  1. An epoll file descriptor has a private struct eventpoll that keeps track of which fd's are attached to this fd. struct eventpoll also has a wait queue that keeps track of all processes that are currently epoll_waiting on this fd. struct eventpoll also has a list of all file descriptors that are currently available for reading or writing.
  2. When you add a file descriptor to an epoll fd using epoll_ctl(), epoll adds the struct eventpoll to that fd's wait queue. It also checks if the fd is currently ready for processing and adds it to the ready list, if so.
  3. When you wait on an epoll fd using epoll_wait, the kernel first checks the ready list, and returns immediately if any file descriptors are already ready. If not, it adds itself to the single wait queue inside struct eventpoll, and goes to sleep.
  4. When an event occurs on a socket that is being monitored by epoll, the kernel calls the epoll callback, which adds the file descriptor to the ready list, and also wakes up any waiters that are currently waiting on that struct eventpoll.

Obviously, a lot of careful locking is needed on struct eventpoll and the various lists and wait queues, but that's an implementation detail.

The important thing to note is that at no point above did I describe a step that loops over all file descriptors of interest. By being entirely event-based and by using a long-lasting set of fd's and a ready list, epoll avoids ever taking O(n) time for an operation, where n is the number of file descriptors being monitored.

Gunnysack answered 21/4, 2014 at 13:29 Comment(0)
