How are threads/processes parked and woken in Linux, prior to futex?

Before the futex system calls existed in Linux, what underlying system calls were used by threading libraries like pthreads to block/sleep a thread and to subsequently wake those threads from userland?

For example, if a thread tries to acquire a mutex, the userland implementation will block the thread (perhaps after a short spinning interval), but I can't find the syscalls that are used for this (other than futex, which is a relatively recent addition).

Foozle answered 18/8, 2017 at 20:10 Comment(0)

Before futex and the current pthreads implementation for Linux, NPTL (which requires kernel 2.6 or newer), there were two other threading libraries providing the POSIX Thread API on Linux: LinuxThreads and NGPT (which was based on GNU Pth). LinuxThreads was the only widely used libpthread for years (it can still be found in some obscure, unmaintained micro-libc variants targeting 2.4 kernels; other micro-libc variants have their own built-in pthread-like API implemented on top of futex+clone). GNU Pth itself is not a kernel-level thread library; it runs in a single process and does user-level "thread" switching.

You should know that there are several threading models, distinguished by whether the kernel knows about some, all, or none of the user threads (this determines how many CPU cores the program can use by adding threads, what a thread costs, and how many threads can be started). Models are named M:N, where M is the number of userspace threads and N is the number of entities schedulable by the OS kernel:

  • "1:1" ''kernel-level threading'' - every userspace thread is schedulable by OS kernel. This is implemented in Linuxthreads, NPTL and many modern OS.
  • "N:1" ''user-level threading'' - userspace threads are planned by the userspace, they all are invisible to the kernel, it only schedules one process (and it may use only 1 CPU core). Gnu Pth (GNU Portable Threads) is example of it, and there are many other implementations for some computer architectures.
  • "M:N" ''hybrid threading'' - there are some entities visible and schedulable by OS kernel, but there may be more user-space threads in them. And sometimes user-space threads will migrate between kernel-visible threads.

With the 1:1 model there are many classic Unix sleep mechanisms/APIs, such as select/poll, signals, and other IPC variants. As I recall, LinuxThreads used a separate process for every thread (with fully shared memory) and a special manager "thread" (process) to emulate some POSIX thread features. Wikipedia says that SIGUSR1/SIGUSR2 were used in LinuxThreads for internal communication between threads; IBM says the same: "The synchronization of primitives is achieved by means of signals. For example, threads block until awoken by signals." See also the project FAQ http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html#H.4, "With LinuxThreads, I can no longer use the signals SIGUSR1 and SIGUSR2 in my programs! Why?":

LinuxThreads needs two signals for its internal operation. One is used to suspend and restart threads blocked on mutex, condition or semaphore operations. The other is used for thread cancellation. On "old" kernels (2.0 and early 2.1 kernels), there are only 32 signals available and the kernel reserves all of them but two: SIGUSR1 and SIGUSR2. So, LinuxThreads has no choice but use those two signals.
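
For illustration, here is a heavily simplified, hypothetical sketch of that signal-based suspend/restart idea (not LinuxThreads' actual code): the dedicated signal is kept blocked while a thread runs, sigsuspend() atomically unblocks it and sleeps, and pthread_kill() wakes the target. Because the signal stays blocked until sigsuspend() unblocks it, a wake-up sent before the thread actually sleeps stays pending and is not lost; real code must additionally deal with cancellation and cleanup.

    /* Hypothetical sketch of signal-based suspend/restart, in the spirit of
     * LinuxThreads.  SIGUSR1 is assumed to be reserved for this purpose. */
    #include <pthread.h>
    #include <signal.h>
    #include <string.h>

    static void restart_handler(int sig) { (void)sig; /* only interrupts sigsuspend() */ }

    /* Each thread installs the handler and blocks SIGUSR1 in its normal mask. */
    static void thread_signal_setup(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = restart_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);

        sigset_t block;
        sigemptyset(&block);
        sigaddset(&block, SIGUSR1);
        pthread_sigmask(SIG_BLOCK, &block, NULL);
    }

    /* Park the calling thread until another thread sends it SIGUSR1. */
    static void suspend_self(void)
    {
        sigset_t wait_mask;
        pthread_sigmask(SIG_SETMASK, NULL, &wait_mask); /* current mask...            */
        sigdelset(&wait_mask, SIGUSR1);                 /* ...minus SIGUSR1           */
        sigsuspend(&wait_mask);                         /* unblock and sleep atomically */
    }

    /* Wake a parked thread. */
    static void restart_thread(pthread_t target)
    {
        pthread_kill(target, SIGUSR1);
    }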

With "N:1" model thread may call some blocking syscall and block everything (some libraries may convert some blocking syscalls into async, or use some SIGALRM or SIGVTALRM magic); or it may call some (very) special internal threading function which will do user-space thread switching by rewriting machine state register (like switch_to in linux kernel, save IP/SP and other regs, restore IP/SP and regs of other thread). So, kernel does not wake any user thread directly from userland, it just schedules whole process; and user space scheduler implement thread synchronization logic (or just calls sched_yield or select when there is no threads to work).

With the M:N model things get very complicated... I don't know much about NGPT... There is one paragraph about NGPT in "POSIX Threads and the Linux Kernel", Dave McCracken, OLS2002, 330, page 5:

There is a new pthread library under development called NGPT. This library is based on the GNU Pth library, which is an M:1 library. NGPT extends Pth by using multiple Linux tasks, thus creating an M:N library. It attempts to preserve Pth’s pthread compatibility while also using multiple Linux tasks for concurrency, but this effort is hampered by the underlying differences in the Linux threading model. The NGPT library at present uses non-blocking wrappers around blocking system calls to avoid blocking in the kernel.
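
The "non-blocking wrappers" idea can be sketched roughly as follows (hypothetical code, not NGPT's actual implementation): put the file descriptor into O_NONBLOCK mode, and on EAGAIN hand control to the user-space scheduler instead of letting the whole process sleep in the kernel. The userspace_sched_yield() helper below is a stand-in for the library's internal thread switch.

    /* Hypothetical sketch of a non-blocking wrapper around read(), in the
     * spirit of what the paper describes for NGPT.  userspace_sched_yield()
     * is a stand-in for the library's internal "switch to another user
     * thread"; here it simply yields the whole process. */
    #include <errno.h>
    #include <fcntl.h>
    #include <sched.h>
    #include <unistd.h>

    static void userspace_sched_yield(void) { sched_yield(); }

    static ssize_t wrapped_read(int fd, void *buf, size_t count)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);  /* never sleep in the kernel */

        for (;;) {
            ssize_t n = read(fd, buf, count);
            if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
                return n;                        /* data, EOF, or a real error */
            userspace_sched_yield();             /* let another user thread run */
        }
    }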

Some papers and posts: "POSIX Threads and the Linux Kernel", Dave McCracken, OLS2002, 330; the LWN post about NPTL 0.1:

The futex system call is used extensively in all synchronization primitives and other places which need some kind of synchronization. The futex mechanism is generic enough to support the standard POSIX synchronization mechanisms with very little effort. ... Futexes also allow the implementation of inter-process synchronization primitives, a sorely missed feature in the old LinuxThreads implementation (Hi jbj!).

The NPTL design PDF:

5.5 Synchronization Primitives

The implementation of the synchronization primitives such as mutexes, read-write locks, conditional variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option since threads can have different priorities (beside wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation. Threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability caused by spurious wakeups and derogation of the quality of the signal handling in the application. Fortunately some new functionality was added to the kernel to implement all kinds of synchronization primitives: futexes [Futex]. The underlying principle is simple but powerful enough to be adaptable to all kinds of uses. Callers can block in the kernel and be woken either explicitly, as a result of an interrupt, or after a timeout.
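
For comparison with the signal-based scheme above, here is a minimal sketch of futex-based parking and waking (futex has no glibc wrapper, so it is invoked via syscall(); error handling is kept to a minimum):

    /* Minimal sketch of futex-based parking and waking.  The kernel puts the
     * caller to sleep only if *flag still equals the expected value, which
     * closes the race between checking the flag and going to sleep. */
    #include <errno.h>
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static long futex(atomic_int *uaddr, int op, int val)
    {
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
    }

    /* Park the caller while *flag == 0. */
    static void futex_park(atomic_int *flag)
    {
        while (atomic_load(flag) == 0) {
            if (futex(flag, FUTEX_WAIT, 0) == -1 && errno != EAGAIN && errno != EINTR)
                break;                       /* unexpected error: give up */
        }
    }

    /* Set the flag and wake one parked thread. */
    static void futex_unpark(atomic_int *flag)
    {
        atomic_store(flag, 1);
        futex(flag, FUTEX_WAKE, 1);
    }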

Wesleywesleyan answered 18/8, 2017 at 23:52 Comment(3)
I could hardly ask for a better answer.Foozle
Hello, please, check this paper "Computer Performance Microscopy with SHIM" by Xi Yang 2015 (github.com/ShimProfiler/SHIM microsoft.com/en-us/research/publication/…), it may give you some ideas for your next question 45556708 (idea: some PMU counters are private for hyperthreads, and other are shared for two hyperthreads. By adding second hyperthread we can profile first one: "a continuous profiler that samples at resolutions as fine as 15 cycles; ... at a 61% overhead")Wesleywesleyan
I've read the paper. They take the opposite approach to the one I'm contemplating - they are essentially sampling the main thread from the helper, while I want the main thread to record "exact" traces with the help of a counter incremented by the helper. The problem seems to be memory speculation failures with tight sharing of counters in that way across hyperthreads.Foozle

Futex stands for "fast userspace mutex." It is an abstraction for building mutexes that is considered faster and more convenient than traditional mechanisms because it implements the wait/wake machinery for you. Both before and after futex(), threads were put to sleep and woken via a change in their process state. The process states are:

  • Running state
  • Sleeping state
  • Uninterruptible sleeping state (e.g. blocked in a syscall like read() or write())
  • Defunct/zombie state

When a thread is suspended, it is put into (interruptible) 'sleep' state. Later, it can be woken via the wake_up() function, which operates on its task structure within the kernel. As far as I can tell, wake_up is a kernel function, not a syscall. The kernel doesn't need a syscall to wake or sleep a task; it (or a process) simply changes the task structure to reflect the state of the process. When the Linux scheduler next deals with that process, it treats it according to its state (again, the states are listed above).

Short story: futex() implements a wait system for you. Without it, you need a data structure that is accessible from both the main thread and the sleeping thread in order to wake a sleeping thread. All of this is done with userland code. The only thing you might need from the kernel is a mutex, the specifics of which include locking mechanisms and mutex data structures but don't inherently wake or sleep the thread. The syscalls you're looking for don't exist. Essentially, most of what you're talking about can be achieved from userspace, without a syscall, by manually keeping track of data conditions that determine whether and when to sleep or wake a thread, as in the sketch below.
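
As an illustration of that claim only (a hypothetical sketch, not what real libraries do in production): a thread can "park" itself by polling a shared atomic flag and yielding the CPU until another thread sets the flag. No dedicated park/wake syscall is involved, but the cost is repeated wakeups and polling, which is exactly the busy-waiting problem the NPTL design quote above rules out.

    /* Hypothetical spin-and-yield "park/unpark" on a shared atomic flag.
     * No dedicated sleep/wake syscall is used; the price is repeated polling. */
    #include <sched.h>
    #include <stdatomic.h>

    static void spin_park(atomic_int *flag)
    {
        while (atomic_load_explicit(flag, memory_order_acquire) == 0)
            sched_yield();                    /* hand the CPU back to the scheduler */
    }

    static void spin_unpark(atomic_int *flag)
    {
        atomic_store_explicit(flag, 1, memory_order_release);
    }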

Recap answered 18/8, 2017 at 20:18 Comment(14)
And as for the pthread_* functions, they're also largely implemented without relying on very many system calls. Also see this answer.Recap
They must rely on system calls for blocking and waking of threads. Userspace cannot do that without the help of the kernel and scheduler. I'm asking how this happened before futex() was introduced.Foozle
Oh, and this oneRecap
The kernel has had mutexes of some kind for a long time. And most of this is done in userspace.Recap
In terms of waking and sleeping, it's "Only if contention was detected does a system call (called futex) and context switch into the kernel occurs that puts the calling process to sleep until the mutex is released."Recap
Blocking the thread is done with sleep() or wait() ... and then wake() can wake up the thread when a condition is met.Recap
I understand perfectly well how the userland mutex implementation works and how futex works. Because futex was introduced around Linux 2.6, and Linux was around for a decade before that, I'm asking what syscall was used before that.Foozle
man 2 sleep returns nothing. wait() doesn't seem applicable here - it waits for changes in the state of a child process. That doesn't seem helpful for arbitrary threads to wait or wake arbitrary other threads within a process. man 2 wake returns nothing.Foozle
As I said, the kernel has had mutexes since before futex. Wait() will cause the process to wait for a signal. Historically, you change the state of the thread to a waiting state until you acquire the mutex (in userland) by setting the state to TASK_INTERRUPTIBLE or some similar state (in kernel land). Then when you acquire the mutex, you do what you want with the resource, release the mutex and carry on.Recap
@Foozle wait is not outside the picture, "since Linux 2.4 a thread can, and by default will, wait on children of other threads in the same thread group."Chantay
@direprobs - that doesn't seem to help, you still need a parent child relationship to use wait, and it only triggers on certain events like the termination of a child. Basically zero of the building blocks needed to allow thread A to block and be woken by B, and then later B to block and be woken by A. If you have evidence that pthreads or other libraries somehow hacked wait() to work for blocking and unblocking, it would make a good answer though!Foozle
Here it describes the kernel process states. Again, the thread sets itself (potentially after being told to by the parent) to a wait state until the desired condition (a mutex lock is acquired, etc) is reached. The states as reported by ps are: ` R running or runnable (on run queue) D uninterruptible sleep (usually IO) S interruptible sleep (waiting for an event to complete) Z defunct/zombie, terminated but not reaped by its parent T stopped, either by a job control signal or it is being traced`Recap
Once the condition is met, a signal can be emitted, shared memory can be modified, etc. Since the kernel has had mutex functionality for a very long time, mutexes are usually how this condition-matching is achieved.Recap
P.S. the wake() function and its sisters are how you change the state of the process. Try googling it instead of checking your man pages; you may not have all of them anyway (I don't)Recap
