Caveats of select/poll vs. epoll reactors in Twisted

Everything I've read and experienced (Tornado-based apps) leads me to believe that epoll is a natural replacement for select- and poll-based networking, especially with Twisted. Which makes me paranoid; it's pretty rare for a better technique or methodology not to come with a price.

Reading a couple dozen comparisons between epoll and the alternatives shows that epoll is clearly the champion for speed and scalability, specifically that it scales in a linear fashion, which is fantastic. That said, what about processor and memory utilization? Is epoll still the champ?

Metalepsis answered 9/1, 2010 at 6:39 Comment(0)

For very small numbers of sockets (varies depending on your hardware, of course, but we're talking about something on the order of 10 or fewer), select can beat epoll in memory usage and runtime speed. Of course, for such small numbers of sockets, both mechanisms are so fast that you don't really care about this difference in the vast majority of cases.

One clarification, though. Both select and epoll scale linearly. A big difference is that the userspace-facing APIs have complexities that are based on different things. The cost of a select call goes roughly with the value of the highest-numbered file descriptor you pass it. If you select on a single fd whose value is 100, that's roughly twice as expensive as selecting on a single fd whose value is 50. Adding more fds below the highest isn't quite free, so it's a little more complicated than this in practice, but this is a good first approximation for most implementations.
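
To make that concrete, here is a minimal sketch of a select()-based loop in C (client_fds, nclients, and handle_ready are hypothetical names, and it assumes every fd value is below FD_SETSIZE):

    #include <stddef.h>
    #include <sys/select.h>

    static void handle_ready(int fd) { (void)fd; /* application-specific */ }

    void select_loop(const int *client_fds, int nclients)
    {
        for (;;) {
            fd_set readfds;
            FD_ZERO(&readfds);
            int maxfd = -1;
            for (int i = 0; i < nclients; i++) {
                FD_SET(client_fds[i], &readfds);  /* fd must be < FD_SETSIZE */
                if (client_fds[i] > maxfd)
                    maxfd = client_fds[i];
            }
            /* Kernel-side cost tracks maxfd + 1: every bit from 0 up to
             * maxfd gets scanned, however few of them are actually set. */
            int n = select(maxfd + 1, &readfds, NULL, NULL, NULL);
            if (n < 0)
                break;
            /* Userspace cost tracks nclients: every watched fd must be
             * re-checked on every wakeup, ready or not. */
            for (int i = 0; i < nclients; i++)
                if (FD_ISSET(client_fds[i], &readfds))
                    handle_ready(client_fds[i]);
        }
    }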

The cost of epoll is closer to the number of file descriptors that actually have events on them. If you're monitoring 200 file descriptors, but only 100 of them have events on them, then you're (very roughly) only paying for those 100 active file descriptors. This is where epoll tends to offer one of its major advantages over select. If you have a thousand clients that are mostly idle, then when you use select you're still paying for all one thousand of them. However, with epoll, it's like you've only got a few - you're only paying for the ones that are active at any given time.
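
For contrast, a rough sketch of the corresponding epoll loop (Linux-specific; epoll_loop and handle_ready are again hypothetical names). It assumes each client fd was registered once, ahead of time, with epoll_ctl(epfd, EPOLL_CTL_ADD, ...); after that, the recurring cost lives in epoll_wait, which returns only the descriptors that actually have events:

    #include <stdio.h>
    #include <sys/epoll.h>

    #define MAX_EVENTS 64

    static void handle_ready(int fd) { (void)fd; /* application-specific */ }

    void epoll_loop(int epfd)
    {
        struct epoll_event events[MAX_EVENTS];
        for (;;) {
            /* Blocks until something is ready, then returns ONLY the
             * ready descriptors. */
            int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
            if (n < 0) {
                perror("epoll_wait");
                break;
            }
            /* Cost here tracks n, the number of active descriptors,
             * not the total number registered: a thousand idle
             * clients add nothing to this loop. */
            for (int i = 0; i < n; i++)
                handle_ready(events[i].data.fd);
        }
    }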

All this means that epoll will lead to less CPU usage for most workloads. As far as memory usage goes, it's a bit of a toss-up. select does manage to represent all the necessary information in a highly compact way (one bit per file descriptor). And the FD_SETSIZE (typically 1024) limitation on how many file descriptors you can use with select means that you'll never spend more than 128 bytes for each of the three fd sets you can use with select (read, write, exception). Compared to those 384 bytes max, epoll is sort of a pig. Each file descriptor is represented by a multi-byte structure. However, in absolute terms, it's still not going to use much memory. You can represent a huge number of file descriptors in a few dozen kilobytes (roughly 20k per 1000 file descriptors, I think). You can also throw in the fact that you have to spend all 384 of those bytes with select if you only want to monitor one file descriptor but its value happens to be 1023, whereas with epoll you'd only spend 20 bytes. Still, all these numbers are pretty small, so it doesn't make much difference.
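
These sizes are easy to check on your own system; a tiny sketch (exact numbers vary by platform, and this only measures the userspace structures, not the kernel's per-fd bookkeeping):

    #include <stdio.h>
    #include <sys/epoll.h>
    #include <sys/select.h>

    int main(void)
    {
        /* select: a fixed-size bitmask per set, FD_SETSIZE bits each. */
        printf("fd_set:                 %zu bytes\n", sizeof(fd_set));
        printf("3 fd_sets (r/w/except): %zu bytes\n", 3 * sizeof(fd_set));

        /* epoll: one small struct per registration/event in userspace
         * (the kernel keeps its own per-fd bookkeeping on top). */
        printf("struct epoll_event:     %zu bytes\n", sizeof(struct epoll_event));
        printf("1000 epoll_events:      %zu bytes\n",
               1000 * sizeof(struct epoll_event));
        return 0;
    }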

And there's also that other benefit of epoll, which perhaps you're already aware of, that it is not limited to FD_SETSIZE file descriptors. You can use it to monitor as many file descriptors as you have. And if you only have one file descriptor, but its value is greater than FD_SETSIZE, epoll works with that too, but select does not.
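
If you want to see that limit in action, here is one way to sketch it (you may need to raise the fd limit first, e.g. ulimit -n 4096, or the dup2 call fails with EBADF):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/epoll.h>
    #include <sys/select.h>

    int main(void)
    {
        int p[2];
        if (pipe(p) < 0) { perror("pipe"); return 1; }

        /* Move the read end to a descriptor number above FD_SETSIZE.
         * Requires RLIMIT_NOFILE above that value (see ulimit -n). */
        int high = FD_SETSIZE + 1000;
        if (dup2(p[0], high) < 0) { perror("dup2"); return 1; }

        /* FD_SET(high, &set) would write past the end of an fd_set
         * (undefined behaviour), so select() simply cannot watch it. */

        int epfd = epoll_create1(0);
        if (epfd < 0) { perror("epoll_create1"); return 1; }
        struct epoll_event ev = { .events = EPOLLIN };
        ev.data.fd = high;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, high, &ev) < 0) {
            perror("epoll_ctl");
            return 1;
        }
        printf("epoll happily watches fd %d\n", high);
        return 0;
    }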

Randomly, I've also recently discovered one slight drawback to epoll as compared to select or poll. While none of these three APIs supports normal files (i.e., files on a file system), select and poll present this lack of support as reporting such descriptors as always readable and always writable. This makes them unsuitable for any meaningful kind of non-blocking filesystem I/O, but a program which uses select or poll and happens to encounter a file descriptor from the filesystem will at least continue to operate (or if it fails, it won't be because of select or poll), albeit perhaps not with the best performance.

On the other hand, epoll will fail fast with an error (EPERM, apparently) when asked to monitor such a file descriptor. Strictly speaking, this is hardly incorrect. It's merely signalling its lack of support in an explicit way. Normally I would applaud explicit failure conditions, but this one is undocumented (as far as I can tell) and results in a completely broken application, rather than one which merely operates with potentially degraded performance.

In practice, the only place I've seen this come up is when interacting with stdio. A user might redirect stdin or stdout from/to a normal file. Whereas previously stdin and stdout would have been a pipe -- supported by epoll just fine -- it then becomes a normal file and epoll fails loudly, breaking the application.
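
This failure mode is easy to reproduce. A small sketch, assuming Linux; run it as ./a.out < some-regular-file versus echo hi | ./a.out:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/epoll.h>

    int main(void)
    {
        int epfd = epoll_create1(0);
        if (epfd < 0) { perror("epoll_create1"); return 1; }

        struct epoll_event ev = { .events = EPOLLIN };
        ev.data.fd = STDIN_FILENO;

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &ev) < 0) {
            if (errno == EPERM) {
                /* stdin is a regular file; epoll refuses to monitor
                 * it. A defensive reactor can fall back to treating
                 * the fd as always ready, as select()/poll() would
                 * have reported it. */
                fprintf(stderr, "EPERM: regular file, falling back\n");
                return 0;
            }
            perror("epoll_ctl");
            return 1;
        }
        printf("stdin registered fine (tty, pipe, or socket)\n");
        close(epfd);
        return 0;
    }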

Bidentate answered 9/1, 2010 at 15:11 Comment(6)
Very nice answer. Consider being explicit about the behavior of poll for completeness?Turkey
My two cents on the behavior of reading from ordinary files: I generally prefer outright failure to performance degradation. The reason is that it's much more likely to be detected during development, and thus worked around properly (say by having an alternative method of doing the I/O for actual files). YMMV of course: there may not be noticeable slowdown in which case failure isn't better. But dramatic slowdown that happens only in special cases can be very hard to catch during development, leaving it as a time bomb when actually deployed.Turkey
Just got to completely read your edit. In a sense I do agree that it's probably not right for epoll not to mimic its predecessors, but then again I can imagine the dev who implemented the EPERM error thought, "Just because it's always been broken doesn't make it right to break mine as well." And yet another counter-argument: I am a defensive programmer; anything past 1+1 is suspect, and I code in such a way as to allow graceful failures. Having the kernel fire an out-of-expectation error isn't nice or considerate.Metalepsis
@Jean-Paul could you add some explanation about kqueue as well?Gorden
Putting aside performance, is there an issue resulting from this (from man select)? "The Linux kernel imposes no fixed limit, but the glibc implementation makes fd_set a fixed-size type, with FD_SETSIZE defined as 1024, and the FD_*() macros operating according to that limit. To monitor file descriptors greater than 1023, use poll(2) instead." On CentOS 7 I've already seen issues where my own code has failed a select() because the kernel returned a file handle >1023, and I'm currently looking at a problem that smells like it may be Twisted hitting the same issue.Ralfston
Maaaybe? I'm not sure I completely understand the question. It's true that select() fails on high-numbered file descriptors. That can be one reason to use one of the other mechanisms instead. If that's not a sufficient answer, opening a new question might be a good idea.Bidentate

In tests at my company, one issue with epoll() came up: a single cost it carries compared to select.

When attempting to read from the network with a timeout, creating an epoll_fd (instead of an fd_set) and adding the fd to the epoll_fd is much more expensive than creating an fd_set (which is a simple malloc).

As per the previous answer, the cost of select() grows as the number of FDs in the process becomes large, but in our testing, even with fd values in the 10,000s, select was still the winner. These are cases where only one fd is being waited on by a thread, simply to overcome the fact that network reads and network writes don't time out under a blocking thread model. Of course, blocking thread models are low-performance compared to non-blocking reactor systems, but there are occasions when integrating with a particular legacy code base requires them.
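
A sketch of that single-fd, read-with-timeout pattern using select (read_with_timeout is a hypothetical name; note the FD_SETSIZE guard, which matters given the fd-value caveat raised in the comments below):

    #include <errno.h>
    #include <unistd.h>
    #include <sys/select.h>

    /* Read with a timeout on a single fd. The fd_set lives on the
     * stack, so there is no setup syscall at all; a per-call epoll
     * version would pay for epoll_create1 + epoll_ctl + epoll_wait
     * + close on every read. */
    ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_sec)
    {
        if (fd >= FD_SETSIZE) {        /* select() cannot watch this fd */
            errno = EINVAL;
            return -1;
        }
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        struct timeval tv = { .tv_sec = timeout_sec };

        int n = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (n < 0)
            return -1;                 /* select() itself failed */
        if (n == 0) {
            errno = ETIMEDOUT;         /* timed out */
            return -1;
        }
        return read(fd, buf, len);
    }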

This kind of use case is rare in high performance applications, because a reactor model doesn't need to create a new epoll_fd every time. For the model where an epoll_fd is long-lived --- which is clearly preferred for any high performance server design --- epoll is the clear winner in every way.

Emera answered 30/4, 2014 at 7:33 Comment(3)
But you can't even use select() if you have file descriptor values in the 10k+ range - unless you recompile half your system to change FD_SETSIZE - so I wonder how this strategy worked at all. For the scenario you described, I'd probably look at poll() which is much more like select() than it is like epoll() - but removes the FD_SETSIZE limitation.Bidentate
You can use select() if you have file descriptor values in the 10K range, because you can malloc() an FD_SET. In fact, since FD_SETSIZE is set at compile time and the actual fd limit is set at runtime, the ONLY safe use of FD_SET checks the number of the file descriptor against the size of the FD_SET, and does a malloc (or moral equivalent) if the FD_SET is too small. I was shocked when I saw this in production with a customer. After programming sockets for 20 years, all of the code I'd ever written - and most of the tutorials on the web - are unsafe.Emera
This isn't true, as far as I know, on any popular platforms. FD_SETSIZE is a compile time constant set when your C library is compiled. If you define it to a different value when you build your application then your application and the C library will disagree and things will go poorly. If you have references claiming it is safe to redefine FD_SETSIZE I'd be interested to see them.Bidentate
