Why are NON-BLOCKING sockets recommended with epoll? [duplicate]



I'm trying to learn how to use epoll() for a TCP server application, because I'm expecting many connections.

I checked samples and tutorials, and they always recommend setting the sockets that are added to epoll() to NON-BLOCKING mode. Why is that?


Univocal answered 9/10, 2014 at 2:41 Comment(5)
You can’t do multiple blocking reads at the same time on one thread. – Sonni
Did you read the man page? man7.org/linux/man-pages/man7/epoll.7.html – Andrien
With blocking I/O, all it takes is one misbehaving client to cause a denial of service to all clients. For example, if someone connects with a client that sends half of a command but never sends the second half (but keeps the TCP connection open indefinitely), and the server blocks inside recv() waiting for the second half of the command that never arrives, then the server is hung for an indefinite amount of time and no other clients will get their expected responses. – Renvoi
@JeremyFriesner Unless the server uses threads or processes … And you haven’t answered the question. – Masqat
@Masqat with N threads or processes, the problem is reduced but not eliminated; now it takes N misbehaving clients to cause a denial of service. And comments are not expected to answer the question; if I wanted to provide an answer I would have done so as an Answer. – Renvoi

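For level-triggered epoll, nonblocking sockets can help minimize epoll_wait() calls: after one notification you can keep calling read() or write() until the call fails with EAGAIN, instead of going back to epoll_wait() after every single operation. It's an optimization issue.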
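To illustrate the level-triggered case, here is a minimal Linux-only sketch; count_wakeups() is a hypothetical helper, and an AF_UNIX socketpair stands in for a TCP connection. It queues 4096 bytes and consumes them with 1024-byte recv() calls, either one recv() per wakeup (all you can safely do on a blocking socket, because a second recv() could block) or draining until EAGAIN on a nonblocking socket.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: queue `total` bytes on one end of a socketpair, then
 * consume them through a LEVEL-triggered epoll set with 1024-byte recv()s.
 *   drain == 0: one recv() per wakeup (all that is safe on a blocking fd)
 *   drain != 0: nonblocking fd, recv() until it fails with EAGAIN
 * Returns the number of epoll_wait() wakeups needed, or -1 on error. */
static int count_wakeups(int total, int drain)
{
    int sv[2];
    char data[8192], buf[1024];
    int wakeups = 0, consumed = 0;

    if (total < 0 || total > (int)sizeof(data))
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (drain)
        fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);
    memset(data, 'x', sizeof(data));

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sv[0] }; /* no EPOLLET */
    int ok = ep >= 0 && epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
             write(sv[1], data, (size_t)total) == (ssize_t)total;

    while (ok && consumed < total) {
        struct epoll_event ready;
        if (epoll_wait(ep, &ready, 1, 1000) != 1) {   /* 1 s guard, not expected to fire */
            ok = 0;
            break;
        }
        wakeups++;
        ssize_t n;
        do {
            n = recv(sv[0], buf, sizeof(buf), 0);
            if (n > 0)
                consumed += (int)n;
        } while (drain && n > 0);                     /* nonblocking: ends at -1/EAGAIN */
    }

    if (ep >= 0)
        close(ep);
    close(sv[0]);
    close(sv[1]);
    return ok ? wakeups : -1;
}
```

With 4096 bytes queued, the one-read-per-wakeup version should need four epoll_wait() wakeups, while the draining version needs only one.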

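For edge-triggered epoll, you MUST use nonblocking sockets AND call read() or write() until they fail with errno set to EAGAIN (EWOULDBLOCK). If you don't, you can miss kernel notifications: data left in the buffer does not generate a new event, so the connection can stall until the peer sends more.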
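The missed notification can be reproduced with a minimal Linux-only sketch; et_demo() and drain_until_eagain() are hypothetical names, and a socketpair stands in for TCP. After 4096 bytes arrive, EPOLLET reports the socket once. If the handler reads only 1024 bytes, a second epoll_wait() reports nothing even though 3072 bytes are still queued, and no further event comes until the peer sends more data. Draining until EAGAIN avoids this, and it is only safe because the socket is nonblocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Read until the kernel receive buffer is empty (EAGAIN/EWOULDBLOCK) or EOF.
 * The fd MUST be nonblocking, otherwise the last recv() blocks.
 * Returns the number of bytes read, or -1 on a real error. */
static ssize_t drain_until_eagain(int fd)
{
    char buf[1024];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;                 /* peer closed the connection */
        else if (errno == EINTR)
            continue;                     /* interrupted by a signal: retry */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                 /* drained: back to epoll_wait() */
        else
            return -1;
    }
}

/* Hypothetical demo: 4096 bytes arrive on an EPOLLET-registered socket; the
 * handler either reads once (1024 bytes) or drains. Returns how many events a
 * second epoll_wait() (timeout 0) reports, or -1 on error; *consumed gets the
 * number of bytes the handler read. */
static int et_demo(int use_drain, ssize_t *consumed)
{
    int sv[2], result = -1;
    char data[4096], buf[1024];
    struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
    struct epoll_event got;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK); /* required for ET */
    ev.data.fd = sv[0];
    memset(data, 'x', sizeof(data));

    int ep = epoll_create1(0);
    if (ep >= 0 && epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0 &&
        write(sv[1], data, sizeof(data)) == (ssize_t)sizeof(data) &&
        epoll_wait(ep, &got, 1, 1000) == 1) {          /* the single edge */
        if (use_drain)
            *consumed = drain_until_eagain(sv[0]);
        else
            *consumed = recv(sv[0], buf, sizeof(buf), 0); /* 1024 of 4096 */
        result = epoll_wait(ep, &got, 1, 0);            /* no new data, no new edge */
    }
    if (ep >= 0)
        close(ep);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Both variants should see zero events on the second epoll_wait(), but only the draining one has actually consumed all 4096 bytes.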

You can find a detailed answer here: https://eklitzke.org/blocking-io-nonblocking-io-and-epoll

Pongee answered 18/7, 2020 at 10:42 Comment(0)

It's a good question and not a duplicate. Recently I also found a tutorial that uses nonblocking sockets with select() (which is level-triggered only), and it made me think.

The question is:

Why use nonblocking I/O (i.e., set the fd to nonblocking) with level-triggered epoll, select, or other similar interfaces?

There are in fact very solid reasons for this case.

Quoting the book The Linux Programming Interface:

63.1.2 Employing Nonblocking I/O with Alternative I/O Models

Nonblocking I/O (the O_NONBLOCK flag) is often used in conjunction with the I/O models described in this chapter. Some examples of why this can be useful are the following:

  • As explained in the previous section, nonblocking I/O is usually employed in conjunction with I/O models that provide edge-triggered notification of I/O events.
  • If multiple processes (or threads) are performing I/O on the same open file descriptions, then, from a particular process’s point of view, a descriptor’s readiness may change between the time the descriptor was notified as being ready and the time of the subsequent I/O call. Consequently, a blocking I/O call could block, thus preventing the process from monitoring other file descriptors. (This can occur for all of the I/O models that we describe in this chapter, regardless of whether they employ level-triggered or edge-triggered notification.)
  • Even after a level-triggered API such as select() or poll() informs us that a file descriptor for a stream socket is ready for writing, if we write a large enough block of data in a single write() or send(), then the call will nevertheless block.
  • In rare cases, level-triggered APIs such as select() and poll() can return spurious readiness notifications—they can falsely inform us that a file descriptor is ready. This could be caused by a kernel bug or be expected behavior in an uncommon scenario.
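Bullets 2 and 4 come down to the same thing: a readiness report can be stale by the time you act on it. The sketch below assumes Linux/POSIX and uses a hypothetical helper, stale_readiness_demo(), with a socketpair in place of a real connection. poll() reports the socket readable, then the data is consumed before our own recv(); a plain recv() in the same thread simulates another thread or process sharing the open file description. Because the fd is nonblocking, the late recv() fails with EAGAIN and the event loop can carry on, whereas on a blocking fd it would hang.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo of stale readiness. Returns 1 if our late recv() failed
 * with EAGAIN/EWOULDBLOCK (the safe outcome), 0 otherwise, -1 on setup error. */
static int stale_readiness_demo(void)
{
    int sv[2];
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };
    ssize_t sent = write(sv[1], "hello", 5);
    int ready = poll(&pfd, 1, 1000);             /* reports sv[0] readable */

    /* Simulate another thread/process that shares this open file description
     * and reads the data between the notification and our own recv(). */
    ssize_t stolen = recv(sv[0], buf, sizeof(buf), 0);

    /* Our late recv(): on a blocking fd this call would hang here. */
    ssize_t n = recv(sv[0], buf, sizeof(buf), 0);
    int result = sent == 5 && ready == 1 && (pfd.revents & POLLIN) &&
                 stolen == 5 && n == -1 &&
                 (errno == EAGAIN || errno == EWOULDBLOCK);

    close(sv[0]);
    close(sv[1]);
    return result;
}
```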

First, let's check case #2: "If multiple processes (or threads) are performing I/O on the same open file descriptions...".

Read this code from the libevent introduction, http://www.wangafu.net/~nickm/libevent-book/01_intro.html .

/* For sockaddr_in */
#include <netinet/in.h>
/* For socket functions */
#include <sys/socket.h>
/* For fcntl */
#include <fcntl.h>
/* for select */
#include <sys/select.h>

#include <assert.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

#define MAX_LINE 16384

char
rot13_char(char c)
{
    /* We don't want to use isalpha here; setting the locale would change
     * which characters are considered alphabetical. */
    if ((c >= 'a' && c <= 'm') || (c >= 'A' && c <= 'M'))
        return c + 13;
    else if ((c >= 'n' && c <= 'z') || (c >= 'N' && c <= 'Z'))
        return c - 13;
    else
        return c;
}

/* Per-connection state: data read so far and how much of it is written back. */
struct fd_state {
    char buffer[MAX_LINE];
    size_t buffer_used;

    int writing;
    size_t n_written;
    size_t write_upto;
};

struct fd_state *
alloc_fd_state(void)
{
    struct fd_state *state = malloc(sizeof(struct fd_state));
    if (!state)
        return NULL;
    state->buffer_used = state->n_written = state->writing =
        state->write_upto = 0;
    return state;
}

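void
free_fd_state(struct fd_state *state)
{
    free(state);
}

void
make_nonblocking(int fd)
{
    /* Keep the existing file status flags and add O_NONBLOCK. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags >= 0)
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}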

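int
do_read(int fd, struct fd_state *state)
{
    char buf[1024];
    int i;
    ssize_t result;
    while (1) {
        /* Read 1024-byte chunks until recv() reports EOF or an error. On a
         * nonblocking socket "no more data" is -1 with errno EAGAIN. */
        result = recv(fd, buf, sizeof(buf), 0);
        if (result <= 0)
            break;

        for (i=0; i < result; ++i)  {
            if (state->buffer_used < sizeof(state->buffer))
                state->buffer[state->buffer_used++] = rot13_char(buf[i]);
            if (buf[i] == '\n') {
                state->writing = 1;
                state->write_upto = state->buffer_used;
            }
        }
    }

    if (result == 0) {
        return 1;               /* peer closed the connection */
    } else if (result < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;           /* drained: wait for the next select() */
        return -1;              /* real error */
    }

    return 0;
}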

int
do_write(int fd, struct fd_state *state)
{
    while (state->n_written < state->write_upto) {
        ssize_t result = send(fd, state->buffer + state->n_written,
                              state->write_upto - state->n_written, 0);
        if (result < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;       /* send buffer full: retry when writable */
            return -1;
        }
        assert(result != 0);

        state->n_written += result;
    }

    if (state->n_written == state->buffer_used)
        state->n_written = state->write_upto = state->buffer_used = 0;

    state->writing = 0;

    return 0;
}

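void
run(void)
{
    int listener;
    struct fd_state *state[FD_SETSIZE];
    struct sockaddr_in sin;
    int i, maxfd;
    fd_set readset, writeset, exset;

    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = 0;
    sin.sin_port = htons(40713);

    for (i = 0; i < FD_SETSIZE; ++i)
        state[i] = NULL;

    listener = socket(AF_INET, SOCK_STREAM, 0);
    /* Nonblocking listener: a connection select() reported may already be
     * gone by the time accept() runs, and accept() must not block then. */
    make_nonblocking(listener);

#ifndef WIN32
    {
        int one = 1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    }
#endif

    if (bind(listener, (struct sockaddr*)&sin, sizeof(sin)) < 0) {
        perror("bind");
        return;
    }

    if (listen(listener, 16)<0) {
        perror("listen");
        return;
    }

    while (1) {
        maxfd = listener;

        /* select() modifies the sets, so rebuild them on every iteration. */
        FD_ZERO(&readset);
        FD_ZERO(&writeset);
        FD_ZERO(&exset);

        FD_SET(listener, &readset);

        for (i=0; i < FD_SETSIZE; ++i) {
            if (state[i]) {
                if (i > maxfd)
                    maxfd = i;
                FD_SET(i, &readset);
                if (state[i]->writing) {
                    FD_SET(i, &writeset);
                }
            }
        }

        if (select(maxfd+1, &readset, &writeset, &exset, NULL) < 0) {
            perror("select");
            return;
        }

        if (FD_ISSET(listener, &readset)) {
            struct sockaddr_storage ss;
            socklen_t slen = sizeof(ss);
            int fd = accept(listener, (struct sockaddr*)&ss, &slen);
            if (fd < 0) {
                perror("accept");
            } else if (fd >= FD_SETSIZE) {
                close(fd);      /* state[] only has FD_SETSIZE slots */
            } else {
                make_nonblocking(fd);
                state[fd] = alloc_fd_state();
                if (!state[fd])
                    close(fd);
            }
        }

        for (i=0; i < maxfd+1; ++i) {
            int r = 0;
            if (i == listener)
                continue;

            if (FD_ISSET(i, &readset)) {
                r = do_read(i, state[i]);
            }
            if (r == 0 && FD_ISSET(i, &writeset)) {
                r = do_write(i, state[i]);
            }
            if (r) {
                /* EOF or error: drop the connection. */
                free_fd_state(state[i]);
                state[i] = NULL;
                close(i);
            }
        }
    }
}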

int
main(int c, char **v)
{
    setvbuf(stdout, NULL, _IONBF, 0);

    run();
    return 0;
}

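This is not an example of multiple processes (or threads) performing I/O on the same open file descriptions, but it demonstrates the same idea: a descriptor that select() reported as ready stops being ready partway through the I/O you do on it.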

The do_read function calls recv() inside a while (1) loop to read as many bytes as are available, at most 1024 bytes per recv() call. This is a typical pattern.

So you need a nonblocking socket here; otherwise, once the kernel receive buffer is empty, the next recv() blocks instead of returning EAGAIN, and the loop never gets back to select().
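Here is a minimal sketch of that failure, assuming Linux/POSIX; blocking_drain_ms() is a hypothetical helper and a socketpair stands in for the client connection. It runs the same "recv() until it fails" loop as do_read() on a blocking socket. To keep the demo from hanging, SO_RCVTIMEO is set to an arbitrary 100 ms, so once the queued bytes are consumed the final recv() sits for the whole timeout before failing with EAGAIN. Without that timeout it would wait indefinitely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: run do_read()'s "recv() until it fails" loop on a
 * BLOCKING socket holding `pending` bytes. SO_RCVTIMEO (100 ms, arbitrary)
 * stands in for "forever" so the demo terminates. Returns the milliseconds
 * the loop took, or -1 on error; *bytes_out gets the bytes read. */
static long blocking_drain_ms(size_t pending, size_t *bytes_out)
{
    int sv[2];
    char data[4096], buf[1024];
    struct timeval tv = { .tv_sec = 0, .tv_usec = 100 * 1000 };
    struct timespec t0, t1;
    size_t total = 0;
    ssize_t n;

    if (pending > sizeof(data) || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    memset(data, 'x', sizeof(data));
    if (setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        write(sv[1], data, pending) != (ssize_t)pending) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = recv(sv[0], buf, sizeof(buf), 0)) > 0)  /* same shape as do_read() */
        total += (size_t)n;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* The loop only ended because the timeout expired: the last recv() blocked. */
    int timed_out = n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
    close(sv[0]);
    close(sv[1]);
    *bytes_out = total;
    if (!timed_out)
        return -1;
    return (long)(t1.tv_sec - t0.tv_sec) * 1000 + (t1.tv_nsec - t0.tv_nsec) / 1000000;
}
```

Calling blocking_drain_ms(2048, &got) should read all 2048 bytes and then spend roughly 100 ms blocked in the last recv(), time during which a single-threaded server would serve nobody else.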

For #3: if you write more data to a blocking socket than fits in the kernel send buffer, send() blocks until all of it has been copied into the buffer, which only happens as the peer acknowledges data and space frees up. If the peer reads slowly (or not at all), that can take an arbitrarily long time. For more details, see https://mcmap.net/q/339479/-does-send-always-send-whole-buffer .
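With a nonblocking socket, the same oversized write returns early instead. A minimal sketch, assuming Linux and a default socket send buffer far smaller than the 16 MiB used below; nonblocking_big_send() is a hypothetical helper, and a socketpair whose peer never reads stands in for a slow TCP client. The first send() accepts only part of the data and returns that count; an immediate second send() fails with EAGAIN, which is the cue to keep the rest in a user-space buffer and wait for writability, much like do_write() does above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: send `len` bytes in ONE send() on a nonblocking socket
 * whose peer never reads. Returns what that send() accepted (a partial count
 * when len exceeds the send buffer), or -1 on error; *next_errno gets errno
 * from an immediate second send() (0 if it unexpectedly succeeded). */
static ssize_t nonblocking_big_send(size_t len, int *next_errno)
{
    int sv[2];
    char *data = malloc(len);

    if (data == NULL)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        free(data);
        return -1;
    }
    memset(data, 'x', len);
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    /* A blocking send() would sleep here until the peer had read enough for
     * the rest to fit; since this peer never reads, it would sleep forever. */
    ssize_t first = send(sv[0], data, len, 0);
    ssize_t second = send(sv[0], data, len, 0);  /* send buffer is now full */
    *next_errno = second < 0 ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    free(data);
    return first;
}
```

For example, nonblocking_big_send((size_t)16 << 20, &e) should return a positive count well below 16 MiB and leave e set to EAGAIN.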

Coster answered 16/11, 2022 at 14:25 Comment(0)
