It's a good question, and not a duplicate. I recently came across a tutorial that uses a nonblocking socket with select (select is level-triggered only), which got me thinking.
The question is:
Why use nonblocking I/O, or set an fd to nonblocking, with level-triggered epoll, select, or other similar interfaces?
There are in fact very solid reasons for this case.
Quoting from the book The Linux Programming Interface:
63.1.2 Employing Nonblocking I/O with Alternative I/O Models
Nonblocking I/O (the O_NONBLOCK flag) is often used in conjunction
with the I/O models described in this chapter. Some examples of why
this can be useful are the following:
- As explained in the previous section, nonblocking I/O is usually employed in conjunction with I/O models that provide edge-triggered
notification of I/O events.
- If multiple processes (or threads) are performing I/O on the same open file descriptions, then, from a particular process’s point of
view, a descriptor’s readiness may change between the time the
descriptor was notified as being ready and the time of the subsequent
I/O call. Consequently, a blocking I/O call could block, thus
preventing the process from monitoring other file descriptors. (This
can occur for all of the I/O models that we describe in this chapter,
regardless of whether they employ level-triggered or edge-triggered
notification.)
- Even after a level-triggered API such as select() or poll() informs us
that a file descriptor for a stream socket is ready for writing, if
we write a large enough block of data in a single write() or send(),
then the call will nevertheless block.
- In rare cases, level-triggered APIs such as select() and poll()
can return spurious readiness notifications—they can falsely inform us
that a file descriptor is ready. This could be caused by a kernel bug
or be expected behavior in an uncommon scenario.
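The second point in the list above can be reproduced inside a single process. The following is only a minimal sketch, not code from the book: `set_nonblocking` and `stale_readiness_errno` are hypothetical names, and a `dup()`ed descriptor stands in for "another process or thread", since it shares the same open file description. `select` reports the socket as readable, the rival descriptor consumes the byte, and the later `recv` fails with `EAGAIN` instead of blocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK while keeping the other flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Returns the errno left by recv() on a descriptor that select() reported
 * as readable, after a rival descriptor has already consumed the data.
 * Returns -1 if the setup fails, 0 if recv() unexpectedly succeeds.
 */
int stale_readiness_errno(void)
{
    int sv[2];
    char c = 'x';

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* The duplicate shares sv[0]'s open file description. */
    int rival = dup(sv[0]);
    if (rival < 0 || set_nonblocking(sv[0]) < 0 || write(sv[1], &c, 1) != 1) {
        if (rival >= 0)
            close(rival);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Level-triggered notification: one byte is waiting, so this returns. */
    fd_set readset;
    FD_ZERO(&readset);
    FD_SET(sv[0], &readset);
    int ready = select(sv[0] + 1, &readset, NULL, NULL, NULL);
    assert(ready == 1 && FD_ISSET(sv[0], &readset));

    /* The rival reads the byte between the notification and our recv(). */
    ssize_t stolen = read(rival, &c, 1);
    assert(stolen == 1);

    /* With a blocking socket this call would hang; here it fails at once. */
    ssize_t n = recv(sv[0], &c, 1, 0);
    int err = (n < 0) ? errno : 0;

    close(rival);
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

If `set_nonblocking` were left out, the final `recv` would wait for data that no peer is going to send, which is the hang the book warns about.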
First, let's check case #2: "If multiple processes (or threads) are performing I/O on the same open file descriptions...".
Read this code from the libevent introduction, http://www.wangafu.net/~nickm/libevent-book/01_intro.html :
/* For sockaddr_in */
#include <netinet/in.h>
/* For socket functions */
#include <sys/socket.h>
/* For fcntl */
#include <fcntl.h>
/* for select */
#include <sys/select.h>
#include <assert.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#define MAX_LINE 16384
char
rot13_char(char c)
{
    /* We don't want to use isalpha here; setting the locale would change
     * which characters are considered alphabetical. */
    if ((c >= 'a' && c <= 'm') || (c >= 'A' && c <= 'M'))
        return c + 13;
    else if ((c >= 'n' && c <= 'z') || (c >= 'N' && c <= 'Z'))
        return c - 13;
    else
        return c;
}
struct fd_state {
    char buffer[MAX_LINE];
    size_t buffer_used;
    int writing;
    size_t n_written;
    size_t write_upto;
};
struct fd_state *
alloc_fd_state(void)
{
    struct fd_state *state = malloc(sizeof(struct fd_state));
    if (!state)
        return NULL;
    state->buffer_used = state->n_written = state->writing =
        state->write_upto = 0;
    return state;
}
void
free_fd_state(struct fd_state *state)
{
    free(state);
}
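void
make_nonblocking(int fd)
{
    /* Keep the existing file status flags and only add O_NONBLOCK. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags != -1)
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}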
int
do_read(int fd, struct fd_state *state)
{
    char buf[1024];
    int i;
    ssize_t result;
    while (1) {
        result = recv(fd, buf, sizeof(buf), 0);
        if (result <= 0)
            break;
        for (i=0; i < result; ++i) {
            if (state->buffer_used < sizeof(state->buffer))
                state->buffer[state->buffer_used++] = rot13_char(buf[i]);
            if (buf[i] == '\n') {
                state->writing = 1;
                state->write_upto = state->buffer_used;
            }
        }
    }
    if (result == 0) {
        return 1;
    } else if (result < 0) {
        if (errno == EAGAIN)
            return 0;
        return -1;
    }
    return 0;
}
int
do_write(int fd, struct fd_state *state)
{
    while (state->n_written < state->write_upto) {
        ssize_t result = send(fd, state->buffer + state->n_written,
                              state->write_upto - state->n_written, 0);
        if (result < 0) {
            if (errno == EAGAIN)
                return 0;
            return -1;
        }
        assert(result != 0);
        state->n_written += result;
    }
    if (state->n_written == state->buffer_used)
        state->n_written = state->write_upto = state->buffer_used = 0;
    state->writing = 0;
    return 0;
}
void
run(void)
{
    int listener;
    struct fd_state *state[FD_SETSIZE];
    struct sockaddr_in sin;
    int i, maxfd;
    fd_set readset, writeset, exset;
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = 0;
    sin.sin_port = htons(40713);
    for (i = 0; i < FD_SETSIZE; ++i)
        state[i] = NULL;
    listener = socket(AF_INET, SOCK_STREAM, 0);
    make_nonblocking(listener);
#ifndef WIN32
    {
        int one = 1;
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    }
#endif
    if (bind(listener, (struct sockaddr*)&sin, sizeof(sin)) < 0) {
        perror("bind");
        return;
    }
    if (listen(listener, 16)<0) {
        perror("listen");
        return;
    }
    FD_ZERO(&readset);
    FD_ZERO(&writeset);
    FD_ZERO(&exset);
    while (1) {
        maxfd = listener;
        FD_ZERO(&readset);
        FD_ZERO(&writeset);
        FD_ZERO(&exset);
        FD_SET(listener, &readset);
        for (i=0; i < FD_SETSIZE; ++i) {
            if (state[i]) {
                if (i > maxfd)
                    maxfd = i;
                FD_SET(i, &readset);
                if (state[i]->writing) {
                    FD_SET(i, &writeset);
                }
            }
        }
        if (select(maxfd+1, &readset, &writeset, &exset, NULL) < 0) {
            perror("select");
            return;
        }
        if (FD_ISSET(listener, &readset)) {
            struct sockaddr_storage ss;
            socklen_t slen = sizeof(ss);
            int fd = accept(listener, (struct sockaddr*)&ss, &slen);
            if (fd < 0) {
                perror("accept");
            } else if (fd >= FD_SETSIZE) {
                /* Valid indexes are 0..FD_SETSIZE-1, so FD_SETSIZE itself
                 * must be rejected too. */
                close(fd);
            } else {
                make_nonblocking(fd);
                state[fd] = alloc_fd_state();
                assert(state[fd]);/*XXX*/
            }
        }
        for (i=0; i < maxfd+1; ++i) {
            int r = 0;
            if (i == listener)
                continue;
            if (FD_ISSET(i, &readset)) {
                r = do_read(i, state[i]);
            }
            if (r == 0 && FD_ISSET(i, &writeset)) {
                r = do_write(i, state[i]);
            }
            if (r) {
                free_fd_state(state[i]);
                state[i] = NULL;
                close(i);
            }
        }
    }
}
int
main(int c, char **v)
{
    setvbuf(stdout, NULL, _IONBF, 0);
    run();
    return 0;
}
This is not an example of multiple processes (or threads) performing I/O on the same open file descriptions, but it demonstrates the same idea.
In the do_read function, it calls recv inside a while(1) loop to read as many bytes as possible, at most 1024 bytes per recv call. I guess this is a typical pattern.
So you need a nonblocking socket here; otherwise recv will eventually block once there is no more data in the network input buffer.
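To see that read-until-empty pattern in isolation, here is a minimal sketch using a socketpair instead of a real network connection. The names `set_nonblocking`, `drain` and `drain_demo` are made up for this illustration and are not part of libevent. The loop stops cleanly on `EAGAIN`, which is only possible because the descriptor is nonblocking.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK while keeping the other flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Read everything currently buffered on a nonblocking socket.
 * Returns the number of bytes read, or -1 on a real error.
 * *eof is set to 1 if the peer closed the connection.
 */
ssize_t drain(int fd, int *eof)
{
    char buf[1024];
    ssize_t total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            total += n;               /* more data may still be waiting */
        } else if (n == 0) {
            *eof = 1;                 /* orderly shutdown by the peer */
            return total;
        } else if (errno == EINTR) {
            continue;                 /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;             /* buffer is empty: stop, do not block */
        } else {
            return -1;
        }
    }
}

/* Push 3000 bytes through a socketpair and drain them on the other end. */
ssize_t drain_demo(void)
{
    int sv[2], eof;
    char payload[3000];
    ssize_t total = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    memset(payload, 'a', sizeof(payload));
    if (set_nonblocking(sv[0]) == 0 &&
        write(sv[1], payload, sizeof(payload)) == (ssize_t)sizeof(payload))
        total = drain(sv[0], &eof);

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

In this sketch `drain` needs three successful `recv` calls (1024 + 1024 + 952 bytes) and a fourth one that fails with `EAGAIN`. On a blocking socket that fourth call would never return, because the writer end stays open and sends nothing more.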
For #3, if you write too much data to a blocking socket and there is not enough buffer space, send will block until all the data has been sent. It can block for a long time if space in the send buffer does not free up. For more details, see https://mcmap.net/q/339479/-does-send-always-send-whole-buffer .
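The nonblocking counterpart of that situation can be sketched as follows. Again this is only an illustration with hypothetical names (`set_nonblocking`, `fill_send_buffer`, `send_demo`), using a socketpair whose other end never reads. Once the buffer is full, `send` reports `EAGAIN` and the caller can go back to its event loop instead of hanging.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK while keeping the other flags. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Keep sending on a nonblocking socket until the kernel refuses more data.
 * Returns the number of bytes accepted before EAGAIN, or -1 on a real error.
 */
long fill_send_buffer(int fd)
{
    static char chunk[65536];
    long total = 0;

    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), 0);
        if (n > 0) {
            total += n;               /* possibly a partial write */
        } else if (n < 0 && errno == EINTR) {
            continue;                 /* interrupted by a signal: retry */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return total;             /* send buffer is full: do not block */
        } else {
            return -1;
        }
    }
}

/* Fill the send buffer of a socketpair whose other end never reads. */
long send_demo(void)
{
    int sv[2];
    long total = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (set_nonblocking(sv[0]) == 0)
        total = fill_send_buffer(sv[0]);

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

How many bytes get accepted depends on the system's socket buffer sizes, so the sketch makes no claim about the exact number. In a real server the unsent remainder would be kept in a per-connection buffer and retried when select reports the socket writable again, which is what do_write in the libevent example does.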
... recv() waiting for the second half of the command that never arrives, then the server is hung for an indefinite amount of time and no other clients will get their expected responses. – Renvoi