Linux's atomic<T> unfortunately(?) doesn't alignas / pad up to a power-of-2 size. std::atomic<Something> arr[10] has sizeof(arr) = 30. (https://godbolt.org/z/WzK66xebr)
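A minimal sketch of that size check, assuming the 3-byte struct Something { char a, b, c; } from the question. The constant names are made up for illustration. Only sizeof / alignof are inspected, so no atomic operation (and no libatomic) is involved. The 3 and 30 reported above are for gcc/clang on Linux; an implementation is allowed to pad, so other values are possible elsewhere.

```cpp
#include <atomic>
#include <cstddef>

// The 3-byte struct under discussion: alignment requirement of 1.
struct Something { char a, b, c; };

// Illustrative names (not from the original answer).
constexpr std::size_t plain_size       = sizeof(Something);
constexpr std::size_t atomic_elem_size = sizeof(std::atomic<Something>);
constexpr std::size_t atomic_elem_algn = alignof(std::atomic<Something>);
constexpr std::size_t atomic_arr_size  = sizeof(std::atomic<Something>[10]);

// Arrays never add padding between elements, whatever the element size is,
// so the array size is always exactly 10 elements' worth.
static_assert(atomic_arr_size == 10 * atomic_elem_size,
              "array size is element count times element size");
```

If atomic_elem_size comes out as 3, consecutive elements sit 3 bytes apart and some of them must straddle alignment boundaries.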
Use struct Something { alignas(4) char a; char b,c; }; (Not alignas(4) char a,b,c; because that would apply the alignment to each char separately, padding every one of them to 4 bytes.)
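The difference between the two spellings can be checked with sizeof. This is a small sketch; the struct names Unpadded, Padded and Overdone are made up for illustration.

```cpp
#include <atomic>
#include <cstddef>

struct Unpadded { char a, b, c; };                  // 3 bytes, alignment 1
struct Padded   { alignas(4) char a; char b, c; };  // alignas on the first member only
struct Overdone { alignas(4) char a, b, c; };       // alignas applies to a, b and c

// Padded: a at offset 0, b at 1, c at 2, then one byte of tail padding.
static_assert(sizeof(Padded) == 4 && alignof(Padded) == 4,
              "Padded is a naturally aligned 4-byte object");

// Overdone: each member starts on its own 4-byte boundary (offsets 0, 4, 8).
static_assert(sizeof(Overdone) == 12, "every char got padded to 4 bytes");

constexpr std::size_t atomic_padded_size = sizeof(std::atomic<Padded>);
```

With the Padded layout the whole object fits in one aligned 4-byte word, which is what a lock-free implementation needs.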
Objects with a non-power-of-2 size might span a cache-line boundary, so using a wider 4-byte load is not always possible. Plus, pure stores would always have to use a CAS (e.g. lock cmpxchg) to avoid inventing writes to a byte outside the object: obviously you can't use two separate mov stores (2-byte + 1-byte) because that wouldn't be atomic, unless you do that inside a TSX transaction with a retry loop.
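To illustrate why a pure store turns into a CAS loop, here is a rough sketch that models the 3-byte object as the low 24 bits of an aligned 32-bit word, with the top 8 bits standing in for a neighbouring byte that must not be overwritten. The helper store_low24 is hypothetical; it is not what any compiler or library actually emits.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically replace the low 24 bits of `word`
// without inventing a write to the top 8 bits (the "neighbour" byte).
inline void store_low24(std::atomic<std::uint32_t>& word, std::uint32_t value24) {
    std::uint32_t expected = word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        // Keep whatever the neighbour byte currently holds.
        desired = (expected & 0xFF000000u) | (value24 & 0x00FFFFFFu);
        // On failure compare_exchange_weak refreshes `expected`, so the next
        // iteration merges with the neighbour's latest value.
    } while (!word.compare_exchange_weak(expected, desired,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}
```

A plain 4-byte store of `desired` would be atomic too, but it could silently undo a concurrent write to the neighbour byte; the CAS only succeeds if that byte is unchanged.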
x86 load/store are only guaranteed atomic for memory accesses that don't cross an 8-byte boundary. (On some vendors / uarches, a cache line boundary. Or for possibly-uncacheable loads/stores, basically natural alignment is what you need.) See: Why is integer assignment on a naturally aligned variable atomic on x86?
Your struct Something { char a, b, c; }; has no alignment requirement so there's no C++ rule that prevents a Something object from spanning 2 cache lines. That would make a plain-mov load/store of it definitely non-atomic.
gcc and clang choose to implement atomic<T> with the same layout / object-representation as T (regardless of being lock-free or not). Therefore atomic<Something> is a 3-byte object. An array of atomic<Something> thus necessarily has some of those objects spanning cache line boundaries, and can't have padding outside the object because that's not how arrays work in C. sizeof() = 3 tells you the array layout: elements are exactly 3 bytes apart. This makes lock-free atomic<Something> impossible. (Unless you load/store with lock cmpxchg to be atomic even on cache-line splits, which would produce a huge performance penalty in the cases where that did happen. Better to make developers fix their struct.)
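The "necessarily spanning" claim is plain offset arithmetic. The sketch below assumes 64-byte cache lines (typical on current x86, but an assumption) and an array that starts on a line boundary; crosses and first_split are illustrative helper names.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines.
constexpr std::size_t kLine = 64;

// True if the byte range [offset, offset + size) crosses a multiple of `boundary`.
constexpr bool crosses(std::size_t offset, std::size_t size,
                       std::size_t boundary = kLine) {
    return offset / boundary != (offset + size - 1) / boundary;
}

// Index of the first element that straddles a cache line in a line-aligned
// array of `elem_size`-byte objects, or `count` if no element does.
inline std::size_t first_split(std::size_t elem_size, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (crosses(i * elem_size, elem_size)) return i;
    }
    return count;
}
```

For 3-byte elements, index 21 occupies bytes 63..65 and is the first to split across two lines; for 4-byte elements no index ever does.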
The atomic<T> class can have a higher alignment requirement than T. For example, atomic<int64_t> has alignof(atomic_int64_t) == 8, unlike alignof(int64_t) == 4 on many 32-bit platforms (including the i386 System V ABI).
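A quick way to compare the two alignments on whatever target you build for (the constant names are illustrative). On x86-64 both are 8; the difference only shows up on targets such as i386 where plain int64_t is 4-byte aligned.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Alignment of the plain type versus the atomic wrapper on this target.
constexpr std::size_t i64_align        = alignof(std::int64_t);
constexpr std::size_t atomic_i64_align = alignof(std::atomic<std::int64_t>);

// The wrapper may raise the alignment requirement but never lowers it.
static_assert(atomic_i64_align >= i64_align,
              "atomic<T> is at least as aligned as T");
```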
If gcc/clang hadn't made the choice to keep the layout the same, they could have had atomic<T> pad small objects up to the next power of 2 and add alignment so they could be lock-free. That would be a valid implementation choice. I can't think of any downsides.
Fun fact, gcc's C11 _Atomic support is slightly broken on 32-bit platforms with 64-bit lockless atomics: _Atomic int64_t can be misaligned inside structs, leading to tearing. They still haven't updated the ABI for _Atomic types to have natural alignment.
But g++'s C++11 std::atomic uses a template class in a header that fixed that bug a while ago (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65147), ensuring that atomic<T> has natural alignment (up to some power of 2 size) even if T has alignment < size. Thus there's no way they can span any boundary wider than they are.
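The struct-member case from that bug report can be checked directly. This is a sketch with a hypothetical Holder struct; value_offset is an illustrative helper that measures the offset on a real object instead of relying on offsetof.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical struct: a 1-byte member followed by a 64-bit atomic.
struct Holder {
    char tag;
    std::atomic<std::int64_t> value;
};

// Byte offset of `value` inside a Holder.
inline std::size_t value_offset() {
    static Holder h;  // static storage, so zero-initialized
    return static_cast<std::size_t>(reinterpret_cast<char*>(&h.value) -
                                    reinterpret_cast<char*>(&h));
}
```

With a naturally aligned std::atomic<int64_t> the member lands on an 8-byte offset even where plain int64_t would be placed at offset 4.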
§[atomics.types.generic]p3 allows this: "The representation of an atomic specialization need not have the same size as its corresponding argument type." I guess there are portability problems with that though? – Hemphill
sizeof(S) is never 3. – Lylalyle
alignas(4). – Adley