How is conditional_wait() implemented at the kernel and hardware/assembly level?

I understand that a thread waiting on a condition variable atomically releases the lock and goes to sleep until it is woken by a signal from another thread (sent when a particular condition is met). After waking up, it atomically re-acquires the lock (somehow magically), updates state as required, and unlocks the critical section.

It would be great if someone could explain how this conditional_wait() procedure is implemented at the kernel and the hardware/assembly level.

How are the lock release and re-acquisition done atomically? How does the kernel ensure this?

What does "sleep" here actually mean? Does it mean a context switch to another process/thread?

While the thread sleeps, how is waking it by signaling implemented at the kernel level, and is any hardware-specific support provided for these mechanisms?

Edit:

It seems "futex" is the mechanism managing this wait/signal business. To narrow down my question: how does the futex system call used for waiting on and notifying condition variables work at the low level?

Morganite answered 30/10, 2015 at 8:50 Comment(10)
On a Linux system the source, from the standard library level down to the kernel, is all open and freely available. It will take some time to sift through but it's not impossible.Vernalize
On Linux systems, the liaison between user space and kernel is usually done with a tool called futex. It combines atomic updates with kernel organized waiting. You should be able to find a lot of documentation about futexes.Ier
Where does it state that reacquiring the lock after waiting is atomic? The manpage certainly doesn't.Fredel
@EOF en.cppreference.com/w/cpp/thread/condition_variableMorganite
@Shyam: That's about c++, not c.Fredel
@Morganite Any introductory textbook on operating systems will cover this. I suggest Silberschatz's "Operating System Concepts". Older editions are available used nearly for free.Goebbels
@EOF : Okay, but lock release is atomic in C, I guess. I want to know how this release-and-sleep is made atomic.Morganite
@Shyam: On Linux, the API for condition variables is implemented in the glibc library under the nptl/ directory. It uses the futex system call for waiting on and notifying condition variables. That system call is implemented in the kernel's kernel/futex.c source file. Specify in your question which part you cannot understand: condition_variable in glibc, futex usage, or the futex system call implementation in the kernel. Otherwise your question is too broad for SO.Ape
@Ape Thanks for your reply. I didn't know it uses the futex system call. I want to know how the futex system call for waiting on and notifying condition variables works at the low level to accomplish this task. I will update my post.Morganite
There is a nice explanation of the futex_wait/futex_notify mechanism at the beginning of kernel/futex.c. Also, futex_wait contains comments describing every step. What exactly did you fail to understand? Note that in-kernel waiting on a condition is very similar to the wait_event() mechanism, which is described in many books about Linux kernel programming.Ape

On a high level (and since you are asking this question, a high level is what you need) it is not that complicated. First, you need to know the layers of responsibility. There are basically three layers:

  • Hardware level - usually something that can be coded in a single ASM instruction
  • Kernel level - something the OS kernel does
  • Application level - something the application does

Generally, those responsibilities do not overlap - the kernel cannot do what only hardware can do, and hardware cannot do what only the kernel can do. With this in mind, it is useful to remember that when it comes to locking, hardware knows very little about it. It pretty much boils down to:

  • Atomic arithmetic - hardware can lock a particular memory region (make sure no other thread accesses it), perform an arithmetic operation on it, and unlock the region. This only works for arithmetic natively supported by the chip (no square roots!) and on sizes natively supported by the hardware
  • Memory barriers or fences - that is, introduce a barrier within a flow of instructions, so that when the CPU reorders instructions or uses memory caches, reads and writes will not cross those fences and the cache stays fresh
  • Conditional setting (compare-and-set) - set a memory region to value A if it currently holds B, and report whether the operation succeeded

That's pretty much all a CPU can do. As you see, there are no futexes, mutexes or condition variables here. This stuff is built by the kernel using the CPU-supported operations at its disposal.

Let's look at a very high level at how a kernel might implement a futex call. Actually, futex is slightly more complicated, because it is a mixture of user-level and kernel-level operations as needed. Let's look instead at a 'pure' mutex, implemented solely in kernel space. At a high level, it demonstrates the idea well enough.

When a mutex is initially created, the kernel associates a memory region with it. This region holds the mutex's value: locked or unlocked. Later, when the kernel is asked to lock the mutex, it first instructs the CPU to issue a memory barrier. A mutex must serve as a barrier, so that everything read or written after the mutex is acquired (or released) is visible to the rest of the CPUs. Then it uses the CPU-supported compare-and-set instruction to set the memory region's value to 1 if it was 0. (There are more complicated reentrant mutexes, but let's not complicate the picture with them.) The CPU guarantees that even if more than one thread attempts this at the same time, only one will succeed. If the operation succeeds, we now 'hold the mutex'. When the kernel is asked to release the mutex, it sets the memory region to 0 (there is no need to do this conditionally, since we know we hold the mutex!) and issues another memory barrier. The kernel also updates the mutex status in its tables - see below.

If locking the mutex fails, the kernel adds the thread to its tables listing threads waiting for a particular mutex to be released. When the mutex is released, the kernel checks which thread(s) are waiting on it and 'schedules' (i.e. prepares for execution) one of them (if there is more than one, which one gets scheduled or woken depends on a multitude of factors; in the simplest case it is simply random). The scheduled thread begins executing, tries to lock the mutex again (at this point it can fail again!), and the cycle of life continues.

Hope it does make at least half-sense :)

Mccurdy answered 30/10, 2015 at 14:20 Comment(3)
Thanks so much for your detailed answer!! It makes things much clearer about locking. I have a few more clarifications: what does sleeping after unlock actually mean? Does it mean a switch to another process/thread by the kernel? How is this thread woken by signalling? What does signalling the sleeping thread mean? How is it implemented?Morganite
Sleep means that the thread issues a call to a variant of the sleep function (say, nanosleep) supported by the kernel. Upon seeing that call, the scheduler takes the thread that issued it off execution, and schedules its next execution to take place after the sleep timeout lapses. Note that unless you are talking about a true real-time system, the thread won't be executed EXACTLY at the timeout - it will just not be executed before the timeout! A signal is an entry in signal tables inside the kernel. Once the kernel sees it, it calls the signal handling function defined in the thread.Mccurdy
This answer addresses the strategy to implement a lock (mutex). It does not address what the original question was asking (condition variables).Wheelbarrow
