How is conditional_wait() implemented at the kernel and hardware/assembly level?

I understand that a thread waiting on a condition variable atomically releases the lock and goes to sleep until it is woken by a signal from another thread (sent when a particular condition is met). After waking up, it atomically re-acquires the lock (somehow magically), updates state as required, and unlocks the critical section.

It would be great if someone could explain how this conditional_wait() procedure is implemented at the kernel and the hardware/assembly level.

How are the lock release and re-acquisition done atomically? How does the kernel ensure this?

What does "sleep" here actually mean? Does it mean a context switch to another process/thread?

While the thread sleeps, how is waking it by signaling implemented at the kernel level, and is any hardware-specific support provided for these mechanisms?

Edit:

It seems "futex" is the mechanism managing this wait/signal business. To narrow down my question: how does the futex system call used for waiting on and notifying condition variables work at the low level?

Morganite answered 30/10, 2015 at 8:50 Comment(10)
On a Linux system the source, from the standard library level down to the kernel, is all open and freely available. It will take some time to sift through but it's not impossible.Vernalize
On Linux systems, the liaison between user space and kernel is usually done with a tool called futex. It combines atomic updates with kernel organized waiting. You should be able to find a lot of documentation about futexes.Ier
Where does it state that reacquiring the lock after waiting is atomic? The manpage certainly doesn't.Fredel
@EOF en.cppreference.com/w/cpp/thread/condition_variableMorganite
@Shyam: That's about c++, not c.Fredel
@Morganite Any introductory textbook on operating systems will cover this. I suggest Silberschatz's "Operating System Concepts". Older editions are available used nearly for free.Goebbels
@EOF : Okay, but lock release is atomic in C, I guess. I want to know how this release-and-sleep is made atomic.Morganite
@Shyam: On Linux, the API for condition variables is implemented in the glibc library under the nptl/ directory. It uses the futex system call for waiting on and notifying condition variables. That system call is implemented in the kernel's kernel/futex.c source file. Specify in your question which part you cannot understand: condition_variable in glibc, futex usage, or the futex system call implementation in the kernel. Otherwise your question is too broad for SO.Ape
@Ape Thanks for your reply. I didn't know it uses the futex system call. I want to know how the futex system call for waiting on and notifying condition variables works at the low level to accomplish this task. I will update my post.Morganite
There is a nice explanation of the futex_wait/futex_notify mechanism at the beginning of kernel/futex.c. Also, futex_wait contains comments describing every step. What exactly did you fail to understand? Note that in-kernel waiting on a condition is very similar to the wait_event() mechanism, which is described in many books about Linux kernel programming.Ape

On a high level (and since you are asking this question, a high level is what you need) it is not that complicated. First, you need to know the layers of responsibility. There are basically three layers:

  • Hardware level - usually something that can be coded in a single ASM instruction
  • Kernel level - something the OS kernel does
  • Application level - something the application does

Generally, those responsibilities do not overlap - the kernel cannot do what only hardware can do, and hardware cannot do what only the kernel can do. With this in mind, it is useful to remember that when it comes to locking, hardware knows very little about it. It pretty much boils down to:

  • Atomic arithmetic - hardware can lock a particular memory region (make sure no other thread accesses it), perform an arithmetic operation on it, and unlock the region. This only works for arithmetic natively supported by the chip (no square roots!) and on sizes natively supported by the hardware
  • Memory barriers or fences - that is, introduce a barrier within a flow of instructions, so that when the CPU reorders instructions or uses memory caches, reads and writes will not cross those fences and the cache stays fresh
  • Conditional setting (compare-and-set) - set a memory region to value A if it currently holds B, and report whether the operation succeeded

That's pretty much all a CPU can do. As you see, there are no futexes, mutexes or condition variables here. This stuff is built by the kernel using the CPU-supported operations at its disposal.

Let's look at a very high level at how a kernel might implement a futex call. Actually, futex is slightly more complicated, because it is a mixture of user-level and kernel-level operations as needed. Let's look instead at a 'pure' mutex, implemented solely in kernel space. At a high level, it demonstrates the idea well enough.

When a mutex is initially created, the kernel associates a memory region with it. This region holds the mutex's value: locked or unlocked. Later, when the kernel is asked to lock the mutex, it first instructs the CPU to issue a memory barrier. A mutex must serve as a barrier, so that everything read or written after the mutex is acquired (or released) is visible to the rest of the CPUs. Then it uses the CPU-supported compare-and-set instruction to set the memory region's value to 1 if it was 0. (There are more complicated reentrant mutexes, but let's not complicate the picture with them.) The CPU guarantees that even if more than one thread attempts this at the same time, only one will succeed. If the operation succeeds, we now 'hold the mutex'. When the kernel is asked to release the mutex, it sets the memory region to 0 (there is no need to do this conditionally, since we know we hold the mutex!) and issues another memory barrier. The kernel also updates the mutex status in its tables - see below.

If locking the mutex fails, the kernel adds the thread to its tables listing threads waiting for a particular mutex to be released. When the mutex is released, the kernel checks which thread(s) are waiting on it and 'schedules' (i.e. prepares for execution) one of them (if there is more than one, which one gets scheduled or woken depends on a multitude of factors; in the simplest case it is simply random). The scheduled thread begins executing, tries to lock the mutex again (at this point it can fail again!), and the cycle of life continues.

Hope it does make at least half-sense :)

Mccurdy answered 30/10, 2015 at 14:20 Comment(3)
Thanks so much for your detailed answer!! It makes things much clearer about locking. I have a few more clarifications: what does sleeping after unlock actually mean? Does it mean a switch to another process/thread by the kernel? How is this thread woken by signalling? What does signalling the sleeping thread mean? How is it implemented?Morganite
Sleep means that the thread issues a call to a variant of the sleep function (say, nanosleep) supported by the kernel. Upon seeing that call, the scheduler takes the thread that issued it off execution, and schedules its next execution to take place after the sleep timeout lapses. Note that unless you are talking about a true real-time system, the thread won't be executed EXACTLY at the timeout - it will just not be executed before the timeout! A signal is an entry in signal tables inside the kernel. Once the kernel sees it, it calls the signal handling function defined in the thread.Mccurdy
This answer addresses the strategy to implement a lock (mutex). It does not address what the original question was asking (condition variables).Wheelbarrow
