Linux futex syscall spurious wakes with return value 0?

I've run into an issue with the Linux futex syscall (FUTEX_WAIT operation) sometimes returning early seemingly without cause. The documentation specifies certain conditions that may cause it to return early (without a FUTEX_WAKE) but these all involve non-zero return values: EAGAIN if the value at the futex address does not match, ETIMEDOUT for timed waits that timeout, EINTR when interrupted by a (non-restarting) signal, etc. But I'm seeing a return value of 0. What, other than FUTEX_WAKE or the termination of a thread whose set_tid_address pointer points to the futex, could cause FUTEX_WAIT to return with a return value of 0?

In case it's useful, the particular futex I was waiting on is the thread tid address (set by the clone syscall with CLONE_CHILD_CLEARTID), and the thread had not terminated. My (apparently incorrect) assumption that the FUTEX_WAIT operation returning 0 could only happen when the thread terminated led to serious errors in program logic, which I've since fixed by looping and retrying even if it returns 0, but now I'm curious as to why it happened.
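For illustration, the retry fix described above might look roughly like the following. This is a minimal sketch, not the original code: the helper name futex_wait_while is hypothetical, and it assumes GCC/Clang atomic builtins. It treats a 0 return from FUTEX_WAIT as nothing more than a hint to re-check the futex word.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: block for as long as *addr still holds `expected`.
 * A 0 return from FUTEX_WAIT is never trusted on its own; the loop
 * re-reads the word every time the syscall comes back.
 * Returns 0 once the value has changed, -1 on an unexpected error. */
static int futex_wait_while(int *addr, int expected)
{
        while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) == expected) {
                if (syscall(SYS_futex, addr, FUTEX_WAIT, expected,
                            NULL, NULL, 0) == -1
                    && errno != EAGAIN && errno != EINTR)
                        return -1; /* errno describes the failure */
        }
        return 0;
}
```

With the tid futex, a call such as futex_wait_while(&tid, x) would return only after the kernel has actually overwritten tid, no matter how many times the wait itself returns early.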

Here is a minimal test case:

#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/futex.h>
#include <signal.h>

static char stack[32768];
static int tid;

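static int foo(void *p)
{
        /* Two cheap syscalls so the thread does some work before exiting. */
        syscall(SYS_getpid);
        syscall(SYS_getpid);
        /* Exit only this thread; with CLONE_CHILD_CLEARTID the kernel
           then clears tid and wakes the futex. */
        syscall(SYS_exit, 0);
        return 0; /* not reached */
}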

int main()
{
        int pid = getpid();
        for (;;) {
                int x = clone(foo, stack+sizeof stack,
                        CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND
                        |CLONE_THREAD|CLONE_SYSVSEM //|CLONE_SETTLS
                        |CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
                        |CLONE_DETACHED,
                        0, &tid, 0, &tid);
                syscall(SYS_futex, &tid, FUTEX_WAIT, x, 0);
                /* Should fail... */
                syscall(SYS_tgkill, pid, tid, SIGKILL);
        }
}

Let it run for a while, and it should eventually terminate with Killed (SIGKILL), which is only possible if the thread still exists when the FUTEX_WAIT returns.

Before anyone goes assuming this is just the kernel waking the futex before it finishes destroying the thread (which might in fact be happening in my minimal test case here), please note that in my original code, I actually observed userspace code running in the thread well after FUTEX_WAIT returned.

Vergievergil answered 11/9, 2011 at 19:57 Comment(11)
I think we may need to see a minimal example; it's hard to come up with substantial advice, since so much is unknown (I'll post my one hunch as a temporary answer anyway, because it's too big for a comment).Indochina
Indeed, I'll see if I can put together a minimal example.Vergievergil
Hm, I think the man page is quite unclear. The conditions listed under the return value of FUTEX_WAIT describe the non-zero cases as error conditions, not only diagnostics. Then later it says "In the event of an error, all operations return -1, and set errno to indicate the error." On the other hand, the conditions here are not repeated in the ERRORS section.Shotton
And I just confirmed with strace that the "child thread" has not yet called _exit when FUTEX_WAIT returns.Vergievergil
It is probably worth asking this on the linux kernel mailing list.Xeniaxeno
If you do, please post the answer back here ... I'm curious to know as well ...Dialectics
@R.. Did you ever get any answers on this?Dialectics
The documentation states that EWOULDBLOCK is returned, not EAGAIN. On most systems these have the same numeric value, but not on SPARC.Subscription
@Jason: No, I didn't follow up much more..Vergievergil
Hi @R..GitHubSTOPHELPINGICE Is this issue still relevant?Kersey
@SomeName: I'm not sure.Vergievergil

Could you be dealing with a race condition between whether the parent or child operations complete first? You can probably investigate this theory by putting small sleeps at the beginning of your foo() or immediately after the clone() to determine if a forced sequencing of events masks the issue. I don't recommend fixing anything in that manner, but it could be helpful to investigate. Maybe the futex isn't ready to be waited upon until the child gets further through its initialization, but the parent's clone has enough to return to the caller?

Specifically, the CLONE_VFORK option's presence seems to imply this is a dangerous scenario. You may need a bi-directional signaling mechanism such that the child signals the parent that it has gotten far enough that it is safe to wait for the child.
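A bi-directional handshake of the kind suggested here could be sketched as follows. The names announce_ready and await_ready and the flag convention (0 = not ready, 1 = ready) are assumptions made for illustration only; nothing like this appears in the question's code. The sketch also assumes GCC/Clang atomic builtins.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Child side (hypothetical): publish readiness, then wake one waiter. */
static void announce_ready(int *flag)
{
        __atomic_store_n(flag, 1, __ATOMIC_RELEASE);
        syscall(SYS_futex, flag, FUTEX_WAKE, 1, NULL, NULL, 0);
}

/* Parent side (hypothetical): sleep until the flag becomes nonzero.
 * The flag is re-read after every return from FUTEX_WAIT, so an
 * early or spurious return cannot be mistaken for readiness. */
static void await_ready(int *flag)
{
        while (__atomic_load_n(flag, __ATOMIC_ACQUIRE) == 0)
                syscall(SYS_futex, flag, FUTEX_WAIT, 0, NULL, NULL, 0);
}
```

The child would call announce_ready on a shared int once it has passed its initialization, and the parent would call await_ready on the same int before waiting on anything else. For brevity the sketch ignores futex errors, so it is not suitable as-is for production use.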

Broomfield answered 14/9, 2011 at 20:13 Comment(1)
If tid had not already been written with the tid value at the time FUTEX_WAIT is called, the operation would return with EAGAIN rather than 0. (Anyway, the whole point of the CLONE_PARENT_SETTID flag to clone is to ensure that the value has been written before either thread is able to execute.) I don't see any possibility for a race here in userspace since nothing interesting is happening in userspace...Vergievergil
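
The mismatch case mentioned in that comment can be checked in isolation. A minimal sketch, with a hypothetical helper name futex_wait_errno: when the futex word does not hold the expected value, FUTEX_WAIT returns -1 immediately with errno set to EAGAIN (documented as EWOULDBLOCK, which has the same value on most architectures) and never blocks.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: perform one FUTEX_WAIT with no timeout and
 * return 0 on a normal wake-up, or the errno value on failure. */
static int futex_wait_errno(int *addr, int expected)
{
        if (syscall(SYS_futex, addr, FUTEX_WAIT, expected,
                    NULL, NULL, 0) == -1)
                return errno;
        return 0;
}
```

For example, with int word = 1234, calling futex_wait_errno(&word, 5678) should return EAGAIN straight away because the stored value differs from the expected one.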
