atomic<bool> does everything atomic_flag does, just as efficiently on all normal C++ implementations. C++20 just added new stuff to atomic_flag to bring it up to the level of atomic<bool>. atomic_flag is guaranteed to be lock-free, but in practice, on all platforms anyone cares about, so is atomic<bool>.
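Concretely, the operations map onto each other roughly like this (a sketch assuming a C++20 library; the names are just placeholders):

#include <atomic>

void correspondence(std::atomic_flag &flag, std::atomic<bool> &b) {
    bool was_set = flag.test_and_set(std::memory_order_acquire); // ~ b.exchange(true, acquire)
    bool value   = flag.test(std::memory_order_acquire);         // C++20 addition, ~ b.load(acquire)
    flag.clear(std::memory_order_release);                       // ~ b.store(false, release)
    flag.wait(true);  flag.notify_one();                         // C++20 additions, also on atomic<bool>
    (void)was_set; (void)value;
}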
Don't expect GCC 8 to have all the C++2a features; at least try it on https://godbolt.org/ with the latest release or a nightly GCC build. (Also note that it's not the compiler proper that needs to support this, just the standard library headers; but libstdc++ is normally distributed with g++.)
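If you want to check for that library support portably, there's a feature-test macro for it; a minimal sketch (probe is just a placeholder name):

#include <atomic>

bool probe(std::atomic_flag &f) {
#ifdef __cpp_lib_atomic_flag_test
    return f.test();            // C++20: read-only load
#else
    // Pre-C++20 atomic_flag has no non-modifying read; test_and_set is the
    // closest thing, but it also sets the flag.
    return f.test_and_set();
#endif
}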
I tweaked your example so it could be compiled with optimization enabled without optimizing away the actual work.
#include <atomic>

int flagtest(std::atomic_flag &myFlag) {
    //std::atomic_flag myFlag = ATOMIC_FLAG_INIT;
    return myFlag.test();
}
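The asm listings below also show a booltest for comparison; the atomic<bool> counterpart presumably looked something like this (my reconstruction, not necessarily the exact original):

int booltest(std::atomic<bool> &myFlag) {
    return myFlag;   // implicit seq_cst load, then bool -> int conversion
}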
On the Godbolt compiler explorer with GCC and clang: GCC 10.2 doesn't support the new C++20 atomic_flag::test() member function, but the GCC nightly trunk build does. Clang 11.0 and trunk do; clang 10.0.1 doesn't.
# GCC trunk for x86-64 -O3 -std=gnu++2a
flagtest(std::atomic_flag&):
        movzx   eax, BYTE PTR [rdi]
        ret
booltest(std::atomic<bool>&):
        movzx   eax, BYTE PTR [rdi]
        test    al, al
        setne   al
        movzx   eax, al         # this is weird, GCC has gone insane.
        ret
With clang, we can also try libc++ (a new implementation of the C++ standard library). By default, clang on Linux (including Godbolt) uses libstdc++, like GCC does.
# clang 11.0 -O3 -std=gnu++2a -stdlib=libc++
flagtest(std::__1::atomic_flag&):
        mov     al, byte ptr [rdi]
        movzx   eax, al
        and     eax, 1
        ret
booltest(std::__1::atomic<bool>&):
        mov     al, byte ptr [rdi]
        movzx   eax, al
        and     eax, 1
        ret
So that's weird and horrible; even if the value in memory might not be booleanized, there's no reason to merge into the low byte of RAX with a byte mov and then movzx eax,al. Just do a movzx load in the first place! (Clang does have a tendency to be reckless with x86 false dependencies in general, but usually it at least saves a byte by using mov instead of movzx, if not a whole xor-zeroing instruction. But here it's costing an extra instruction.)
But and eax,1 is much less bad than GCC's insane test/setnz/movzx, if it thinks it needs to re-booleanize. (It doesn't actually need to do that; the ABI guarantees that a bool in memory is an actual 0 or 1 byte, and atomic<bool> uses the same object representation as bool.)
So with clang, both ways have stupid missed optimizations when converting to int. With GCC, for some reason atomic_flag doesn't suffer that problem, but I wouldn't recommend using it just for that reason. Hopefully atomic<bool> will get fixed, and normally you don't convert bool to int anyway. Normal uses of atomic<bool> or atomic_flag, like branching on it, should not have any of these missed optimizations. e.g.
int g0, g1;
int conditional_load(std::atomic<bool> &myFlag) {
    return myFlag ? g0 : g1;
}
# gcc 11 nightly build -O3
conditional_load(std::atomic<bool>&):
        movzx   eax, BYTE PTR [rdi]
        test    al, al
        mov     eax, DWORD PTR g0[rip]
        cmove   eax, DWORD PTR g1[rip]
        ret
So that's pretty normal. Clang instead chooses to select between the two addresses and then load once. That puts the load-use latency on the critical path and takes more instructions; it's the worse choice when both vars are adjacent and thus probably come from the same cache line. (GCC's choice always touches both vars, which could be worse if one of them could otherwise stay "cold" in cache.)
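For completeness, the classic normal use of atomic_flag is a spinlock, where the new C++20 test() member lets the waiting loop spin read-only instead of hammering the cache line with failed test_and_set RMWs. A minimal sketch, not a production-quality lock (no wait()/backoff):

#include <atomic>

class spinlock {
    std::atomic_flag locked;   // C++20: default-constructed in the clear state
public:
    void lock() {
        while (locked.test_and_set(std::memory_order_acquire)) {
            // spin read-only until the flag looks clear, then retry the RMW
            while (locked.test(std::memory_order_relaxed)) { }
        }
    }
    void unlock() {
        locked.clear(std::memory_order_release);
    }
};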