How are mutexes implemented?

Asked 28/9, 2009 at 8:1 Answered 2/1, 2019 at 9:6

Tags: language-agnostic, concurrency, mutex

Are some implementations better than others for specific applications? Is there anything to earn by rolling out your own?

Lendlease asked 28/9, 2009 at 8:1 Comment(8)

"Is there anything to earn by rolling out your own?" Knowledge? – Sadomasochism 28/9, 2009 at 8:4

"Is there anything to earn by rolling out your own?" - yeah, flawed code! ;) – Alluvial 28/9, 2009 at 8:7

@Mitch Wheat - Certainly, one shouldn't be using homebrew mutex libraries for production code, but lots of people like to learn by doing, and writing your own [application x] is very informative. – Sadomasochism 28/9, 2009 at 8:15

Most people just use/inherit the mutex code from the kernel in an object of their own. Not many developers are looking deeper inside the mutex, fearing it's way more complex than a simple boolean field. And they're right! – Claudetta 28/9, 2009 at 8:58

Atomic CAS has to be done in the hardware, so truly rolling your own is impossible. – Mufti 28/9, 2009 at 19:8

A lot of people are mentioning test-and-set and compare-and-swap, but many RISC architectures have something called load-link/store-conditional. It is a very interesting alternative way to implement atomic primitives on those CPUs: a special load instruction sets up a "reservation", and a store operation can "fail". In between, you do computations using ordinary opcodes. If the store fails, you can assume there was a race and retry. – Plumbaginaceous 14/10, 2009 at 17:14
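Portable C has no direct way to issue load-link/store-conditional, but C11's `atomic_compare_exchange_weak` is allowed to fail spuriously, which lets compilers map it onto an LL/SC pair on such CPUs (and onto `LOCK CMPXCHG` on x86). Below is a minimal sketch of the resulting "compute with ordinary code, try the store, retry on failure" loop, assuming a C11 compiler with `<stdatomic.h>`. `atomic_fetch_max_int` is a hypothetical helper, chosen because C11 has no built-in atomic maximum.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically raise *target to at least `value`; returns the value seen
   just before the update. C11 has no fetch_max, so build one from a
   compare-and-swap retry loop. */
int atomic_fetch_max_int(atomic_int *target, int value)
{
    int observed = atomic_load_explicit(target, memory_order_relaxed);
    while (observed < value &&
           !atomic_compare_exchange_weak(target, &observed, value)) {
        /* The conditional store failed: another thread changed *target,
           or (on LL/SC hardware) the reservation was lost spuriously.
           `observed` now holds the current value; re-check and retry. */
    }
    return observed;
}
```

For example, if `m` holds 5, `atomic_fetch_max_int(&m, 9)` returns 5 and leaves 9 in `m`, while `atomic_fetch_max_int(&m, 3)` returns 5 and stores nothing.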

@MitchWheat Or fine control and predictability? – Lockhart 3/7, 2019 at 19:42

@curiousguy: that's fine for writing 'easy' stuff. But there are areas such as mutexes, parallel code, writing your own database etc., that are best left to experts (for obvious reasons). It's often referred to as 'Not invented here' – Alluvial 3/7, 2019 at 23:35

Check out the description of the Test-and-set machine instruction on Wikipedia, which alludes to how atomic operations are achieved at the machine level. I can imagine most language-level mutex implementations rely on machine-level support such as Test-and-set.
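As a rough illustration of test-and-set in practice, here is a minimal spinlock in C11, assuming `<stdatomic.h>` and POSIX threads are available. `atomic_flag_test_and_set` atomically sets the flag and returns its previous value, which is exactly the test-and-set contract (on x86, compilers typically emit an `XCHG` for it). The names `tas_lock`, `tas_lock_acquire`, `tas_lock_release` and `run_counter_demo` are hypothetical, and a waiting thread simply burns CPU; there is no fallback to sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_flag held;   /* set = locked, clear = free */
} tas_lock;

void tas_init(tas_lock *l) { atomic_flag_clear(&l->held); }

void tas_lock_acquire(tas_lock *l)
{
    /* Returns the OLD value: true means another thread already held it. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait */
}

void tas_lock_release(tas_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Demo: several threads increment a plain (non-atomic) counter under the lock. */
static tas_lock demo_lock;
static long counter;

static void *worker(void *arg)
{
    int iterations = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        tas_lock_acquire(&demo_lock);
        counter++;
        tas_lock_release(&demo_lock);
    }
    return NULL;
}

long run_counter_demo(int threads, int iterations)
{
    pthread_t tid[16];
    assert(threads > 0 && threads <= 16);
    tas_init(&demo_lock);
    counter = 0;
    for (int t = 0; t < threads; t++) {
        int rc = pthread_create(&tid[t], NULL, worker, &iterations);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < threads; t++)
        pthread_join(tid[t], NULL);
    return counter;
}
```

`run_counter_demo(4, 20000)` should return 80000; with the acquire/release calls removed, concurrent increments can be lost.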

Bash answered 28/9, 2009 at 8:12 Comment(2)

On x86 for example, you can use an xchg instruction to atomically swap a register with memory. The store part is the "set", and the load part + branching on the register value is the "test" half of the test-and-set operation. And yes, this is more or less what you do in practice. See this minimal spinlock implementation in asm that does most of the important stuff except fall back to sleeping in a system call after spinning for a while without getting the lock. – Potter 1/11, 2017 at 16:32

Test-and-set alone only allows try-lock, not lock-or-wait. You need a syscall to actually suspend the thread without busy-waiting. – Plymouth 18/10, 2023 at 11:18

Building on Adamski's test-and-set suggestion, you should also look at the concept of "fast user-space mutexes" or futexes.

Futexes have the desirable property that they do not require a kernel system call in the common cases of locking or unlocking an uncontended mutex. In these cases, the user-mode code successfully uses an atomic compare and swap (CAS) operation to lock or unlock the mutex.

If the CAS fails, the mutex is contended, and a kernel system call (sys_futex under Linux) must be used either to wait for the mutex (in the lock case) or to wake other threads (in the unlock case).

If you're serious about implementing this yourself, make sure you also read Ulrich Drepper's paper.
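To give an idea of what that involves, here is a rough, Linux-only sketch of a futex-based mutex in C11, along the lines of the three-state design (0 = unlocked, 1 = locked with no waiters, 2 = locked with possible waiters) discussed in Drepper's paper. It assumes Linux with `<linux/futex.h>`; because glibc provides no `futex()` wrapper, it calls `syscall(SYS_futex, ...)` directly. The names `futex_mutex`, `fm_lock`, `fm_unlock` and `run_futex_demo` are hypothetical, and this is a teaching sketch, not a replacement for `pthread_mutex_t`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked with no waiters, 2 = locked, waiters possible */
typedef struct { atomic_int state; } futex_mutex;

/* glibc has no futex() wrapper, so issue the raw system call. */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void fm_lock(futex_mutex *m)
{
    int c = 0;
    /* Fast path: 0 -> 1 with one CAS and no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Contended: advertise "waiters possible" (2) before sleeping. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* The kernel puts us to sleep only if state is still 2. */
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void fm_unlock(futex_mutex *m)
{
    /* Fast path: 1 -> 0 means nobody was waiting, so no system call. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1);   /* wake one sleeper */
    }
}

/* Demo: threads increment a plain counter under the futex mutex. */
static futex_mutex demo_lock;
static long counter;

static void *worker(void *arg)
{
    int iterations = *(int *)arg;
    for (int i = 0; i < iterations; i++) {
        fm_lock(&demo_lock);
        counter++;
        fm_unlock(&demo_lock);
    }
    return NULL;
}

long run_futex_demo(int threads, int iterations)
{
    pthread_t tid[16];
    assert(threads > 0 && threads <= 16);
    atomic_store(&demo_lock.state, 0);
    counter = 0;
    for (int t = 0; t < threads; t++) {
        int rc = pthread_create(&tid[t], NULL, worker, &iterations);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < threads; t++)
        pthread_join(tid[t], NULL);
    return counter;
}
```

When the lock is uncontended, `fm_lock` and `fm_unlock` never enter the kernel; `FUTEX_WAIT` and `FUTEX_WAKE` are only reached once the state has been pushed to 2. `run_futex_demo(4, 20000)` should return 80000.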

Lampert answered 14/10, 2009 at 17:5 Comment(0)

A mutex preferably runs in the kernel of the operating system, with as little code around it as possible, so that it is not interrupted by a task switch to another process. The exact implementation is therefore a bit of a secret. It's not complex, though: it's basically an object with a boolean field that it gets and sets.

When a counter is used instead of a boolean, it becomes a semaphore.
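One common way to get that counter behaviour is to keep a mutex to protect the count and add a condition variable so waiters can sleep until the count is nonzero. A minimal sketch assuming POSIX threads; the `counting_sem` type and the `csem_*` functions are hypothetical (for real code, POSIX already provides `sem_t`).

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;      /* protects count */
    pthread_cond_t  nonzero;   /* signalled when count becomes > 0 */
    unsigned        count;
} counting_sem;

void csem_init(counting_sem *s, unsigned initial)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&s->nonzero, NULL);
    assert(rc == 0);
    (void)rc;
    s->count = initial;
}

void csem_wait(counting_sem *s)          /* "P": take one unit, sleeping if none */
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)                /* loop: wakeups can be spurious */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

int csem_trywait(counting_sem *s)        /* 1 on success, 0 if count was 0 */
{
    pthread_mutex_lock(&s->lock);
    int ok = s->count > 0;
    if (ok)
        s->count--;
    pthread_mutex_unlock(&s->lock);
    return ok;
}

void csem_post(counting_sem *s)          /* "V": return one unit, wake a waiter */
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

With an initial count of 1 this behaves like a (non-owner-checked) mutex; larger counts let that many threads in at once.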
A mutex is the starting point for a critical section, which uses the mutex internally to decide whether it can enter a section of code. If the mutex is free, the critical section sets it, executes the code, and releases the mutex when done. If it finds the mutex locked, it can wait for the mutex to be released.
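The enter/wait/release cycle described above maps directly onto `pthread_mutex_lock` and `pthread_mutex_unlock` in POSIX threads. A minimal sketch, assuming a POSIX system, in which several threads update a shared balance inside a critical section; `deposit`, `depositor` and `run_deposits` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Shared state guarded by one statically initialized default mutex. */
static pthread_mutex_t balance_lock = PTHREAD_MUTEX_INITIALIZER;
static long balance;

static void deposit(long amount)
{
    pthread_mutex_lock(&balance_lock);    /* enter: blocks while another thread holds it */
    balance += amount;                    /* critical section: read-modify-write */
    pthread_mutex_unlock(&balance_lock);  /* leave: a waiting thread may now proceed */
}

static void *depositor(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        deposit(1);
    return NULL;
}

long run_deposits(int threads, int per_thread)
{
    pthread_t tid[16];
    assert(threads > 0 && threads <= 16);
    balance = 0;
    for (int t = 0; t < threads; t++) {
        int rc = pthread_create(&tid[t], NULL, depositor, &per_thread);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < threads; t++)
        pthread_join(tid[t], NULL);
    return balance;
}
```

`run_deposits(4, 50000)` should return 200000; without the mutex, concurrent `balance += amount` updates can be lost.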

Around the basic mutex logic there are wrappers that wrap it in an object, then more wrapper objects that make it available outside the kernel, and then another wrapper that makes it available in .NET. On top of that, many programmers write their own wrapper code around all of this for their own logical needs. The wrappers around wrappers really make this murky territory.

Now, with this basic knowledge about the internals of mutexes, I hope you'll use an implementation that relies on the kernel and the hardware underneath, since these are the most reliable (provided the hardware supports them). If the mutex you're using doesn't work at this kernel/hardware level, it can still be reliable, but I would advise against using it unless there's no alternative.

As far as I know, Windows, Linux and .NET will all use mutexes at kernel/hardware level.

The Wikipedia page that I've linked to explains more about the internal logic and possible implementations. Preferably, a mutex is controlled by the hardware, which makes the whole get-and-set of the mutex a single indivisible step. (This ensures that no other thread can slip in between the get and the set, whether through a task switch or by running on another CPU core.)

Claudetta answered 28/9, 2009 at 8:52 Comment(3)

What do you mean it is a secret? Isn't the entire Linux kernel source code available on GitHub? – Oviparous 3/9, 2017 at 4:25

Oh, geez. It was 8 years ago when I wrote that! :) But yeah, it is a secret in the sense that no one really examines the source code for mutexes in the Linux kernel, and those who do will generally find the logic behind it difficult to decipher. See github.com/torvalds/linux/blob/master/kernel/locking/mutex.c for the mutex code in Linux... Fortunately, it is well-commented. Still complex, though. As I said, a bit of a secret... – Claudetta 6/10, 2017 at 1:59

Yes. And this great answer is itself a little bit of a secret, buried under so many other answers :D – Seventeen 1/10, 2019 at 2:52

A bit of assembly to demonstrate locking atomically:

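```nasm
; BL         = this thread's owner id (any byte value except 0xFF)
; shared_val = a byte in memory: 0xFF means free, anything else is the owner's id

        CMP  byte [shared_val], BL    ; Perhaps it is already locked by us
        JZ   .OutLoop2
.Loop1:
        CMP  byte [shared_val], 0xFF  ; Free?
        JZ   .OutLoop1                ; Yes
        PAUSE                         ; Spin-wait hint (encoded as REP NOP)
        JMP  .Loop1                   ; Else, retry

.OutLoop1:
        ; Lock looks free, try to grab it atomically
        MOV  AL, 0xFF                 ; Expected value: free
        LOCK CMPXCHG [shared_val], BL ; If [shared_val] == AL, store BL and set ZF
        JNZ  .Loop1                   ; Another thread won the race; retry

.OutLoop2:                            ; Lock acquired
        ; To release later: MOV byte [shared_val], 0xFF
        ; (a plain store has release semantics on x86)
```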

Volotta answered 2/1, 2019 at 9:6 Comment(0)

Interlocked.CompareExchange is enough to implement spinlocks. It's pretty difficult to do right, though. See Joe Duffy's blog for an example of the subtleties involved.
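`Interlocked.CompareExchange` is .NET-specific; the closest C11 counterpart is `atomic_compare_exchange_strong`, which likewise reports the old value. Below is a minimal sketch (not Joe Duffy's code) of a compare-and-swap spinlock that handles two commonly cited subtleties: it spins on a plain read before attempting the CAS, to reduce cache-line traffic, and it yields the CPU after a bounded number of spins so a descheduled lock holder can run. `SPIN_LIMIT`, the "0 = free, otherwise owner id" convention and the function names are hypothetical, and `sched_yield` assumes a POSIX system.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { SPIN_LIMIT = 100 };   /* arbitrary, untuned spin budget */

typedef struct { atomic_int owner; } cas_spinlock;   /* 0 = free, else owner id */

bool cas_try_lock(cas_spinlock *l, int id)
{
    int expected = 0;
    /* Succeeds only if the lock was free; on failure `expected` receives
       the current owner, much like CompareExchange's return value. */
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, id, memory_order_acquire, memory_order_relaxed);
}

void cas_lock(cas_spinlock *l, int id)
{
    assert(id != 0);
    for (;;) {
        for (int spin = 0; spin < SPIN_LIMIT; spin++) {
            /* Cheap read first; only attempt the CAS when it looks free. */
            if (atomic_load_explicit(&l->owner, memory_order_relaxed) == 0 &&
                cas_try_lock(l, id))
                return;
        }
        sched_yield();   /* the holder may be descheduled; let it run */
    }
}

void cas_unlock(cas_spinlock *l, int id)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == id);
    (void)id;
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

Storing an owner id rather than a plain flag costs nothing extra and lets `cas_unlock` catch a release by the wrong thread in debug builds.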

Bangtail answered 28/9, 2009 at 8:33 Comment(2)

We're talking about language-agnostic solutions here, but thanks for your effort. – Lendlease 29/9, 2009 at 12:1

Oh, you're right. I don't know why I was thinking .NET. Perhaps because of the other answers. – Bangtail 29/9, 2009 at 14:21

I used Reflector.NET to decompile the source for System.Threading.ReaderWriterLockSlim, which was added to a recent version of the .NET framework.

It mostly uses Interlocked.CompareExchange, Thread.SpinWait and Thread.Sleep to achieve synchronisation. There are a few EventWaitHandle (kernel object) instances that are used under some circumstances.
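To give a feel for the compare-and-swap approach without reproducing ReaderWriterLockSlim itself (its actual code is not shown here), a minimal reader-writer spinlock in C11 can pack the whole lock state into one integer and update it with CAS, with `sched_yield` standing in for the SpinWait/Sleep back-off. The state encoding and all names are hypothetical, `sched_yield` assumes a POSIX system, and this simple version lets a steady stream of readers starve writers.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* state: -1 = one writer, 0 = free, n > 0 = n readers */
typedef struct { atomic_int state; } rw_spinlock;

bool rw_try_read(rw_spinlock *l)
{
    int s = atomic_load_explicit(&l->state, memory_order_relaxed);
    while (s >= 0) {
        if (atomic_compare_exchange_weak_explicit(&l->state, &s, s + 1,
                memory_order_acquire, memory_order_relaxed))
            return true;
        /* CAS failed: s now holds the fresh state; re-check for a writer. */
    }
    return false;   /* a writer holds the lock */
}

bool rw_try_write(rw_spinlock *l)
{
    int expected = 0;   /* a writer needs the lock completely free */
    return atomic_compare_exchange_strong_explicit(&l->state, &expected, -1,
               memory_order_acquire, memory_order_relaxed);
}

void rw_read_lock(rw_spinlock *l)  { while (!rw_try_read(l))  sched_yield(); }
void rw_write_lock(rw_spinlock *l) { while (!rw_try_write(l)) sched_yield(); }

void rw_read_unlock(rw_spinlock *l)
{
    int prev = atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);
    assert(prev > 0);
    (void)prev;
}

void rw_write_unlock(rw_spinlock *l)
{
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == -1);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```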

There's also some complexity added to support reentrancy on a single thread.
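The core of that reentrancy bookkeeping is usually an owner plus a recursion count. A minimal sketch in C11 with POSIX threads, layered over an ordinary non-recursive mutex (this is not ReaderWriterLockSlim's code; POSIX also offers `PTHREAD_MUTEX_RECURSIVE` ready-made). As an assumption for illustration, each thread is identified by the address of its own thread-local byte, which avoids relying on `pthread_t` being an integer; the `reentrant_lock` type and `rl_*` functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Each live thread has its own copy, so its address serves as a thread id. */
static _Thread_local char thread_marker;
static uintptr_t self_id(void) { return (uintptr_t)&thread_marker; }

typedef struct {
    pthread_mutex_t  inner;   /* the real, non-reentrant lock */
    atomic_uintptr_t owner;   /* 0 when unowned */
    unsigned         depth;   /* only touched by the owning thread */
} reentrant_lock;

void rl_init(reentrant_lock *l)
{
    int rc = pthread_mutex_init(&l->inner, NULL);
    assert(rc == 0);
    (void)rc;
    atomic_init(&l->owner, 0);
    l->depth = 0;
}

void rl_lock(reentrant_lock *l)
{
    /* Only this thread ever stores its own id here, so a relaxed read is
       enough to tell "I already own it" from "someone else or nobody does". */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == self_id()) {
        l->depth++;
        return;
    }
    pthread_mutex_lock(&l->inner);
    atomic_store_explicit(&l->owner, self_id(), memory_order_relaxed);
    l->depth = 1;
}

void rl_unlock(reentrant_lock *l)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self_id());
    if (--l->depth == 0) {
        atomic_store_explicit(&l->owner, 0, memory_order_relaxed);
        pthread_mutex_unlock(&l->inner);
    }
}

unsigned rl_depth(const reentrant_lock *l) { return l->depth; }
```

Calling `rl_lock` twice from the same thread just bumps the depth to 2, whereas locking a default `pthread_mutex_t` twice from one thread is not safe to rely on.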

If you're interested in this area and working in .NET (or at least, can read it) then you might find it quite interesting to check this class out.

Fanny answered 29/9, 2009 at 7:38 Comment(0)
