Take this simple function that increments an integer under a lock implemented by std::mutex
:
#include <mutex>
std::mutex m;
void inc(int& i) {
std::unique_lock<std::mutex> lock(m);
i++;
}
I would expect this (after inlining) to compile in a straightforward way to a call of m.lock()
an increment of i
and then m.unlock()
.
Checking the generated assembly for recent versions of gcc
and clang
, however, we see an extra complication. Taking the gcc
version first:
inc(int&):
mov eax, OFFSET FLAT:__gthrw___pthread_key_create(unsigned int*, void (*)(void*))
test rax, rax
je .L2
push rbx
mov rbx, rdi
mov edi, OFFSET FLAT:m
call __gthrw_pthread_mutex_lock(pthread_mutex_t*)
test eax, eax
jne .L10
add DWORD PTR [rbx], 1
mov edi, OFFSET FLAT:m
pop rbx
jmp __gthrw_pthread_mutex_unlock(pthread_mutex_t*)
.L2:
add DWORD PTR [rdi], 1
ret
.L10:
mov edi, eax
call std::__throw_system_error(int)
It's the first couple of lines that are interesting. The assembled code examines the address of __gthrw___pthread_key_create
(which is the implementation for pthread_key_create
- a function to create a thread-local storage key), and if it is zero, it branches to .L2
which implements the increment in a single instruction without any locking at all.
If it is non-zero it proceeds as expected: locking the mutex, doing the increment, and unlocking.
clang
does even more: it checks the address of the function twice, once before the lock
and once before the unlock
:
inc(int&): # @inc(int&)
push rbx
mov rbx, rdi
mov eax, __pthread_key_create
test rax, rax
je .LBB0_4
mov edi, m
call pthread_mutex_lock
test eax, eax
jne .LBB0_6
inc dword ptr [rbx]
mov eax, __pthread_key_create
test rax, rax
je .LBB0_5
mov edi, m
pop rbx
jmp pthread_mutex_unlock # TAILCALL
.LBB0_4:
inc dword ptr [rbx]
.LBB0_5:
pop rbx
ret
.LBB0_6:
mov edi, eax
call std::__throw_system_error(int)
What's the purpose of this check?
Perhaps it is to support the case where the object file is ultimately complied into a binary without pthreads support and then to fall back to a version without locking in that case? I couldn't find any documentation on this behavior.
clang
has two checks: two checks is the default but gcc just managed to combine the checks into one, essentially compiling two different versions of the method, whileclang
wasn't able to combine the checks, at least at-O2
. – Flourish