How to iterate through a fd_set

C

8

17

I'm wondering if there's an easy way to iterate through a fd_set? The reason I want to do this is to not having to loop through all connected sockets, since select() alters these fd_sets to only include the ones I'm interested about. I also know that using an implementation of a type that is not meant to be directly accessed is generally a bad idea since it may vary across different systems. However, I need some way to do this, and I'm running out of ideas. So, my question is:

How do I iterate through an fd_set? If this is a really bad practice, are there any other ways to solve my "problem" except from looping through all connected sockets?

Chance answered 7/9, 2010 at 17:56 Comment(16)

To emphasize what I mean. I do not want to use the FD_ISSET approach since it requires me to loop through all connected sockets. But since, by definition, select() removes non-relevant file descriptors from the set, I want to loop through the set. – Chance 7/9, 2010 at 18:26

It doesn't necessarily mean "all connected". You can pass a subset of your connected sockets to select and then use FD_ISSET on only that subset after select returns. Also, is there an actual problem with looping over all of them? Unless you're dealing with many thousands of connected sockets, the loop will likely take an inconsequential amount of time. – Concertino 7/9, 2010 at 18:34

Agree with Rakis. This is one of those things that seems to be inefficient but in most cases really isn't The time to go through the loop will be dwarfed by the time it takes to service just one of the set FDs. – Decolorize 7/9, 2010 at 18:50

There are going to be a substantial amount of connected sockets, therefore it just seems bad to loop through all of them. I can't find it atm but I'm pretty sure I've read select() will remove from the fd_set I pass in the sockets which are not ready to read from when returning. That is the functionality I'm looking for. (Perhaps I'm miss remembering, since I can't find it atm.) – Chance 7/9, 2010 at 18:52

Sure but let's say you have 10,000 connections and 5000 have been set after select(). What's the cost of 5k (unset) bit tests versus what it going to take to read/write/whatever the 5000 that were set? – Decolorize 7/9, 2010 at 19:0

Yes, I do see your point. BUt let's say we have 10 000 open connetions, and select() returns after only 20 or so needs to be processed. Then we have an overlay of alot of connections we need to loop through. Surely there must be a more efficient way? – Chance 7/9, 2010 at 19:3

Please see my answer once.According to me, any alternative method will be even worse with respect to efficiency. – Ebner 7/9, 2010 at 19:8

@Andreas: 10000 open connections and all transactions are handled serially? – Maisey 7/9, 2010 at 19:8

iterate.. equals looping, right? So you're asking "How do I loop, without looping?". What is the goal here? I do not understand this, because -- You don't want to use FD_ISSET .. but since select removes .. file descriptors .. you want to loop through the set? -- Also, I agree with Rakis and Duck here. – Baguette 7/9, 2010 at 19:12

@Potatoswatter: The current model works like this: All connected sockets are waited serially by the same select(), and then their requests rather than their connections get passed onto different threads. Perhaps not the best model, nonetheless I still feel like what I want to do should be doable? – Chance 7/9, 2010 at 19:15

Michael: Indeed iterating and looping refers to the same thing. I'm not asking how to loop without looping. I'm asking how I can loop through the fd_set that is returned from select as an out parameter, rather than looping through my std::set of connected sockets and calling FD_ISSET() to see if they are indeed in there. – Chance 7/9, 2010 at 19:17

@Andreas: What you have here is called a bottleneck. There are other things that can go wrong besides FD_ISSET being to slow, to say the least. Just divide connections among several dispatcher threads. – Maisey 7/9, 2010 at 19:21

@Andreas: you really shouldn't be using select() - but either poll() or a wrapper library like libevent. Otherwise, man ffs - fd_set is a plain bitmap on all systems (except Windows). – Paola 7/9, 2010 at 22:15

@Andreas: aah! Now I understand :) Although, I still think you are looking at the wrong end.. I believe that you can run your program through a profiler too find bottlenecks. I do not think this loop will show up as one of the big ones (but do correct me if I'm wrong :) : #68054 – Baguette 8/9, 2010 at 9:44

#2589596 might be useful, to skip useless loops. Uses __builtin_clz which might have a corresponding ASM-Ccall on your machine. – Footpace 22/4, 2013 at 10:35

TL;DR If you need to do this, don't use select(). If you are forced to use select(), write benchmark first to estimate how much real time is spent iterating the bits in an optimized build in a realistic use case, before trying to optimise it. – Dividivi 22/4 at 23:52

E

7

Select sets the bit corresponding to the file descriptor in the set, so, you need-not iterate through all the fds if you are interested in only a few (and can ignore others) just test only those file-descriptors for which you are interested.

if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
   perror("select");
   exit(4);
}

if(FD_ISSET(fd0, &read_fds))
{
   //do things
}

if(FD_ISSET(fd1, &read_fds))
{
   //do more things
}

EDIT
Here is the fd_set struct:

typedef struct fd_set {
        u_int   fd_count;               /* how many are SET? */
        SOCKET  fd_array[FD_SETSIZE];   /* an array of SOCKETs */
} fd_set;

Where, fd_count is the number of sockets set (so, you can add an optimization using this) and fd_array is a bit-vector (of the size FD_SETSIZE * sizeof(int) which is machine dependent). In my machine, it is 64 * 64 = 4096.

So, your question is essentially: what is the most efficient way to find the bit positions of 1s in a bit-vector (of size around 4096 bits)?

I want to clear one thing here:
"looping through all the connected sockets" doesn't mean that you are actually reading/doing stuff to a connection. FD_ISSET() only checks weather the bit in the fd_set positioned at the connection's assigned file_descriptor number is set or not. If efficiency is your aim, then isn't this the most efficient? using heuristics?

Please tell us what's wrong with this method, and what are you trying to achieve using the alternate method.

Ebner answered 7/9, 2010 at 18:18 Comment(6)

Thanks to you too. But please see my comment, perhaps I was not explaining explicitly enough this is the approach I do not wish to take. – Chance 7/9, 2010 at 18:31

If this is not the answer that [is correct/you want], why has it been marked as the answer? – Leenaleeper 7/9, 2010 at 20:11

For two reasons. a) the edit provided the information I was looking for b) I changed my mind and therefore the answer became relevant. – Chance 14/9, 2010 at 14:34

The definition of fd_set itself is dependent on the operating system. Linux's fd_set does not have an fd_count member. – Holozoic 31/7, 2011 at 19:14

If performance > portability, the x86_64 instruction set has some instructions for REALLY REALLY fast scanning for bits in a machine word. So just construct a simple assembly function to do the scanning. – Stansbury 19/8, 2014 at 8:51

FD_SETSIZE * sizeof(int) doesn't make any sense. As far as I know, FD_SETSIZE is the number of file descriptors in the bit map and there's no reason to multiply it by magic constants. Also the pseudo-C defining the bit array IMO only creates more confusion. – Extraterritorial 9/2, 2016 at 8:52

B

12

You have to fill in an fd_set struct before calling select(), you cannot pass in your original std::set of sockets directly. select() then modifies the fd_set accordingly, removing any sockets that are not "set", and returns how many sockets are remaining. You have to loop through the resulting fd_set, not your std::set. There is no need to call FD_ISSET() because the resulting fd_set only contains "set" sockets that are ready, eg:

fd_set read_fds;
FD_ZERO(&read_fds);

int max_fd = 0;

read_fds.fd_count = connected_sockets.size();
for( int i = 0; i < read_fds.fd_count; ++i ) 
{
    read_fds.fd_array[i] = connected_sockets[i];
    if (read_fds.fd_array[i] > max_fd)
      max_fd = read_fds.fd_array[i];
}

if (select(max_fd+1, &read_fds, NULL, NULL, NULL) > 0)
{ 
    for( int i = 0; i < read_fds.fd_count; ++i ) 
        do_socket_operation( read_fds.fd_array[i] ); 
}

Where FD_ISSET() comes into play more often is when using error checking with select(), eg:

fd_set read_fds;
FD_ZERO(&read_fds);

fd_set error_fds;
FD_ZERO(&error_fds);

int max_fd = 0;

read_fds.fd_count = connected_sockets.size();
for( int i = 0; i < read_fds.fd_count; ++i ) 
{
    read_fds.fd_array[i] = connected_sockets[i];
    if (read_fds.fd_array[i] > max_fd)
      max_fd = read_fds.fd_array[i];
}

error_fds.fd_count = read_fds.fd_count;
for( int i = 0; i < read_fds.fd_count; ++i ) 
{
    error_fds.fd_array[i] = read_fds.fd_array[i];
}

if (select(max_fd+1, &read_fds, NULL, &error_fds, NULL) > 0)
{ 
    for( int i = 0; i < read_fds.fd_count; ++i ) 
    {
        if( !FD_ISSET(read_fds.fd_array[i], &error_fds) )
            do_socket_operation( read_fds.fd_array[i] ); 
    }

    for( int i = 0; i < error_fds.fd_count; ++i ) 
    {
        do_socket_error( error_fds.fd_array[i] ); 
    }
}

Blest answered 7/9, 2010 at 19:36 Comment(5)

The select manpage says: nfds is the highest-numbered file descriptor in any of the three sets, plus 1. Use the highest-numbered, not the count! – Footpace 22/4, 2013 at 12:37

@RemyLebeau, if you set error_fds.fd_count = read_fds.fd_count; then wouldn't it be better if you use only 1 for loop in the if(select...) statement, and I guess there are some errors with checking FD_ISSET for read_fds and error_fds. (i.e. FD_ISSET(read_fds) is not checked at all) – Twirl 1/12, 2014 at 16:27

Checking FD_ISSET(read_fds.fd_array[i], &read_fds) would be redundant in this example because the loop is already iterating through the items of read_fds that were "set". The loop is checking if each "readible" item is not also in error_fds before processing it. To use a single loop for both fd_set at the same time would require different loop logic: for (int fd = 0; fd <= max_fd; ++fd) { if (FD_ISSET(fd, &error_fds) {...} else if (FD_ISSET(fd, &read_fds)) {...} }, but that is not portable to all platforms (Windows in particular). – Blest 1/12, 2014 at 17:5

Of course, looping through fd_count directly isn't really portable, either. fd_set is meant to be opaque, you really should not be accessing its elements directly. The FD_...() macros are meant to hide away those details. – Blest 1/12, 2014 at 17:48

Instead of using for (int fd = 0; fd <= max_fd; ++fd), I would probably loop through connected_sockets, passing each one to FD_ISSET(). – Blest 1/12, 2014 at 17:50

E

7

Select sets the bit corresponding to the file descriptor in the set, so, you need-not iterate through all the fds if you are interested in only a few (and can ignore others) just test only those file-descriptors for which you are interested.

if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
   perror("select");
   exit(4);
}

if(FD_ISSET(fd0, &read_fds))
{
   //do things
}

if(FD_ISSET(fd1, &read_fds))
{
   //do more things
}

EDIT
Here is the fd_set struct:

typedef struct fd_set {
        u_int   fd_count;               /* how many are SET? */
        SOCKET  fd_array[FD_SETSIZE];   /* an array of SOCKETs */
} fd_set;

Where, fd_count is the number of sockets set (so, you can add an optimization using this) and fd_array is a bit-vector (of the size FD_SETSIZE * sizeof(int) which is machine dependent). In my machine, it is 64 * 64 = 4096.

So, your question is essentially: what is the most efficient way to find the bit positions of 1s in a bit-vector (of size around 4096 bits)?

I want to clear one thing here:
"looping through all the connected sockets" doesn't mean that you are actually reading/doing stuff to a connection. FD_ISSET() only checks weather the bit in the fd_set positioned at the connection's assigned file_descriptor number is set or not. If efficiency is your aim, then isn't this the most efficient? using heuristics?

Please tell us what's wrong with this method, and what are you trying to achieve using the alternate method.

Ebner answered 7/9, 2010 at 18:18 Comment(6)

Thanks to you too. But please see my comment, perhaps I was not explaining explicitly enough this is the approach I do not wish to take. – Chance 7/9, 2010 at 18:31

If this is not the answer that [is correct/you want], why has it been marked as the answer? – Leenaleeper 7/9, 2010 at 20:11

For two reasons. a) the edit provided the information I was looking for b) I changed my mind and therefore the answer became relevant. – Chance 14/9, 2010 at 14:34

The definition of fd_set itself is dependent on the operating system. Linux's fd_set does not have an fd_count member. – Holozoic 31/7, 2011 at 19:14

If performance > portability, the x86_64 instruction set has some instructions for REALLY REALLY fast scanning for bits in a machine word. So just construct a simple assembly function to do the scanning. – Stansbury 19/8, 2014 at 8:51

FD_SETSIZE * sizeof(int) doesn't make any sense. As far as I know, FD_SETSIZE is the number of file descriptors in the bit map and there's no reason to multiply it by magic constants. Also the pseudo-C defining the bit array IMO only creates more confusion. – Extraterritorial 9/2, 2016 at 8:52

C

4

It's fairly straight-forward:

for( int fd = 0; fd < max_fd; fd++ )
    if ( FD_ISSET(fd, &my_fd_set) )
        do_socket_operation( fd );

Concertino answered 7/9, 2010 at 18:17 Comment(1)

Thanks for the answer. Please see my comment for clarification of what I want to do. – Chance 7/9, 2010 at 18:31

W

4

This looping is a limitation of the select() interface. The underlying implementations of fd_set are usually a bit set, which obviously means that looking for a socket requires scanning over the bits.

It is for precisely this reason that several alternative interfaces have been created - unfortunately, they are all OS-specific. For example, Linux provides epoll, which returns a list of only the file descriptors that are active. FreeBSD and Mac OS X both provide kqueue, which accomplishes the same result.

Waylan answered 8/9, 2010 at 2:36 Comment(0)

V

1

See this section 7.2 of Beej's guide to networking - '7.2. select()—Synchronous I/O Multiplexing' by using FD_ISSET.

in short, you must iterate through an fd_set in order to determine whether the file descriptor is ready for reading/writing...

Vervain answered 7/9, 2010 at 18:14 Comment(1)

Thanks for the answer. I know this is the standard approach, however I'm looking to sway from it, please see my comment on my own post. – Chance 7/9, 2010 at 18:28

Z

1

I don't think you could do much using the select() call efficiently. The information at "The C10K problem" are still valid.

You will need some platform specific solutions:

Linux => epoll
FreeBSD => kqueue

Or you could use an event library to hide the platform detail for you libev

Zwieback answered 15/3, 2013 at 20:58 Comment(0)

V

0

I don't think what you are trying to do is a good idea.

Firstly its system dependent, but I believe you already know it.

Secondly, at the internal level these sets are stored as an array of integers and fds are stored as set bits. Now according to the man pages of select the FD_SETSIZE is 1024. Even if you wanted to iterate over and get your interested fd's you have to loop over that number along with the mess of bit manipulation. So unless you are waiting for more than FD_SETSIZE fd's on select which I don't think so is possible, its not a good idea.

Oh wait!!. In any case its not a good idea.

Vocative answered 7/9, 2010 at 18:31 Comment(0)

P

0

ffs() may be used on POSIX or 4.3BSD for bits iteration, though it expects int (long and long long versions are glibc extensions). Of course, you have to check, if ffs() optimized as good as e.g. strlen and strchr.

Pulverize answered 16/3, 2022 at 8:25 Comment(0)

Recommended topics

Hot tags