What was the `FUTEX_REQUEUE` bug?

Asked 27/8, 2014 at 2:52 Answered 24/1, 2021 at 3:52

I assign the Linux FUTEX(2) man page as required reading in operating systems classes, as a warning to students not to get complacent when designing synchronization primitives.

The futex() system call is the API that Linux provides to allow user-level thread synchronization primitives to sleep and wake up when necessary. The man page describes the 5 different operations that can be invoked using the futex() system call. The two fundamental operations are FUTEX_WAIT (which a thread uses to put itself to sleep when it tries to acquire a synchronization object and someone is already holding it), and FUTEX_WAKE (which a thread uses to wake up any waiting threads when it releases a synchronization object.)

The next three operations are where the fun starts. The man page description goes like this:

FUTEX_FD (present up to and including Linux 2.6.25)
       [...]
       Because it was inherently racy, FUTEX_FD has been removed
       from Linux 2.6.26 onward.

The paper "Futexes are Tricky" by Ulrich Dreper, 2004 describes that race condition (it's a potential missed wakeup). But there's more:

FUTEX_REQUEUE (since Linux 2.5.70)
       This operation was introduced in order to avoid a
       "thundering herd" effect when FUTEX_WAKE is used and all
       processes woken up need to acquire another futex. [...]

FUTEX_CMP_REQUEUE (since Linux 2.6.7)
       There was a race in the intended use of FUTEX_REQUEUE, so
       FUTEX_CMP_REQUEUE was introduced. [...]

What was the race in FUTEX_REQUEUE? Ulrich's paper doesn't even mention it (the paper describes a function futex_requeue() that is implemented using FUTEX_CMP_REQUEUE, but not the FUTEX_REQUEUE operation).

Microseism answered 27/8, 2014 at 2:52 Comment(0)

It looks like the race condition is due to the implementation of mutex's in glibc and their disparity with futexes. FUTEX_CMP_REQUEUE seems to be needed to support the more complicated glibc mutexes:

They are much more complex because they support many more features, such as testing for deadlock, and recursive locking. Due to this, they have an internal lock protecting the extra state. This extra lock means that they cannot use the FUTEX_REQUEUE multiplex function due to a possible race.

Source: http://locklessinc.com/articles/futex_cheat_sheet/

Susette answered 28/8, 2014 at 14:16 Comment(2)

This is the start of an answer, but what is the possible race? That document doesn't say. – Bondy 28/8, 2014 at 14:27

Yup, it's part of an answer... as much as I was able to dig up this morning. I figured it was worth posting in case someone else wanted to jump in. – Susette 28/8, 2014 at 14:30

The old requeue operation takes two addresses addr1 and addr2, first it unpark waiters on addr1, then parks them back on addr2.

The new requeue operation does all that after it verifies *addr1 == user_provided_val.

To find out the possible race condition, consider the following two threads:

wait(cv, mutex);
  lock(&cv.lock);
  cv.mutex_ref = &mutex;
  unlock(&mutex);
  let futexval = ++cv.futex;
  unlock(&cv.lock);
  FUTEX_WAIT(&cv.futex, futexval);   // --- (1)

lock(&mutex);
broadcast(cv);
  lock(&cv.lock);
  let futexval = cv.futex;
  unlock(&cv.lock);
  FUTEX_CMP_REQUEUE(&cv.futex,   // --- (2)
    1 /*wake*/,
    ALL /*queue*/,
    &cv.mutex_ref.lock,
    futexval);

Both syscall (1) and (2) are executed without lock, but it is required that they are in the same total order as the mutex lock, so that a signal doesn't appear missing to the user.

Therefore, in order to detect a wait operation reordering after the actual wake, the futexval acquired in lock is passed to kernel at (2).

Similarly, we pass futexval to the FUTEX_WAIT call at (1). This design is explicitly stated in futex man page:

When executing a futex operation that requests to block a thread, the kernel will block only if the futex word has the value that the calling thread supplied (as one of the arguments of the futex() call) as the expected value of the futex word. The loading of the futex word's value, the comparison of that value with the expected value, and the actual blocking will happen atomically and will be totally ordered with respect to concurrent operations performed by other threads on the same futex word. Thus, the futex word is used to connect the synchronization in user space with the implementation of blocking by the kernel. Analogously to an atomic compare-and-exchange operation that potentially changes shared memory, blocking via a futex is an atomic compare-and-block operation.

IMHO, the reason for calling (2) outside of lock is mainly performance. To calling wake while holding lock will lead to "hurry up and wait" situation where the waiter wakes up and unable to acquire lock.

It's also worth mentioning that the above answer is based on a history version of pthread implementation. The latest version of pthread_cond has removed the usage of REQUEUE. (check this patch for details).

Grandiloquent answered 24/1, 2021 at 3:52 Comment(0)

Recommended topics

Hot tags