I assign the Linux FUTEX(2)
man page as required reading in operating systems classes, as a warning to students not to get complacent when designing synchronization primitives.
The futex()
system call is the API that Linux provides to allow user-level thread synchronization primitives to sleep and wake up when necessary. The man page describes the 5 different operations that can be invoked using the futex()
system call. The two fundamental operations are FUTEX_WAIT
(which a thread uses to put itself to sleep when it tries to acquire a synchronization object and someone is already holding it), and FUTEX_WAKE
(which a thread uses to wake up any waiting threads when it releases a synchronization object.)
The next three operations are where the fun starts. The man page description goes like this:
FUTEX_FD (present up to and including Linux 2.6.25)
[...]
Because it was inherently racy, FUTEX_FD has been removed
from Linux 2.6.26 onward.
The paper "Futexes are Tricky" by Ulrich Dreper, 2004 describes that race condition (it's a potential missed wakeup). But there's more:
FUTEX_REQUEUE (since Linux 2.5.70)
This operation was introduced in order to avoid a
"thundering herd" effect when FUTEX_WAKE is used and all
processes woken up need to acquire another futex. [...]
FUTEX_CMP_REQUEUE (since Linux 2.6.7)
There was a race in the intended use of FUTEX_REQUEUE, so
FUTEX_CMP_REQUEUE was introduced. [...]
What was the race in FUTEX_REQUEUE
? Ulrich's paper doesn't even mention it (the paper describes a function futex_requeue()
that is implemented using FUTEX_CMP_REQUEUE
, but not the FUTEX_REQUEUE
operation).