I/O completion ports are awesome. There's no better word to describe them. If anything in Windows was done right, it's completion ports.
You can create some number of threads (does not really matter how many) and make them all block on one completion port until an event (either one you post manually, or an event from a timer or asynchronous I/O, or whatever) arrives. Then the completion port will wake one thread to handle the event, up to the limit that you specified. If you didn't specify anything, it will assume "up to number of CPU cores", which is really nice.
If there are already more threads active than the maximum limit, it will wait until one of them is done and then hand the event to the thread as soon as it goes to wait state. Also, it will always wake threads in a LIFO order, so chances are that caches are still warm.
In other words, completion ports are a no-fuss "poll for events" as well as "fill CPU as much as you can" solution.
You can throw file reads and writes at a completion port, sockets, or anything else that's waitable. And, you can post your own events if you want. Each custom event has at least one integer and one pointer worth of data (if you use the default structure), but you are not really limited to that as the system will happily accept any other structure too.
Also, completion ports are fast, really really fast. Once upon a time, I needed to notify one thread from another. As it happened, that thread already had a completion port for file I/O, but it didn't pump messages. So, I wondered if I should just bite the bullet and use the completion port for simplicity, even though posting a thread message would obviously be much more efficient. I was undecided, so I benchmarked. Surprise, it turned out completion ports were about 3 times faster. So... faster and more flexible, the decision was not hard.