Why don't standard libraries implement std::atomic for structs under 8 bytes in a lock-free manner?

Assume the architecture supports lock-free std::atomic for 8-byte scalars. Why don't standard libraries provide similar lock-free specializations for structs that are smaller than 8 bytes?

A simple implementation of such a std::atomic specialization could just serialize/deserialize the struct (with std::memcpy) into the equivalent std::uintN_t, where N is the width of the struct in bits, rounded up to the nearest power of 2 that is greater than or equal to that width. This would be well defined because std::atomic requires these structs to be trivially copyable.
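The idea in the paragraph above can be sketched roughly as follows. This is only an illustration, not how any standard library does it: PackedAtomic, its member names, and the fixed choice of std::uint32_t as the carrier are all hypothetical, and the sketch only handles types of up to 4 bytes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// 3-byte payload, mirroring the struct from the question.
struct Something { char a, b, c; };

// Hypothetical wrapper (not a standard-library type): keeps a small,
// trivially copyable T inside a std::atomic<std::uint32_t>.
template <typename T>
class PackedAtomic {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    static_assert(sizeof(T) <= sizeof(std::uint32_t),
                  "this sketch only handles types of up to 4 bytes");

    std::atomic<std::uint32_t> bits_{0};

    // Copy the object representation of T into the low-address bytes
    // of a zeroed 32-bit integer.
    static std::uint32_t pack(const T& value) {
        std::uint32_t raw = 0;
        std::memcpy(&raw, &value, sizeof(T));
        return raw;
    }

    // Inverse of pack(): rebuild a T from the same bytes.
    static T unpack(std::uint32_t raw) {
        T value;
        std::memcpy(&value, &raw, sizeof(T));
        return value;
    }

public:
    PackedAtomic() = default;
    explicit PackedAtomic(const T& value) : bits_(pack(value)) {}

    void store(const T& value,
               std::memory_order order = std::memory_order_seq_cst) {
        bits_.store(pack(value), order);
    }

    T load(std::memory_order order = std::memory_order_seq_cst) const {
        return unpack(bits_.load(order));
    }

    T exchange(const T& value,
               std::memory_order order = std::memory_order_seq_cst) {
        return unpack(bits_.exchange(pack(value), order));
    }

    bool is_lock_free() const { return bits_.is_lock_free(); }
};

// Usage example: round-trip a value through the wrapper.
inline void example() {
    PackedAtomic<Something> slot(Something{'x', 'y', 'z'});
    Something seen = slot.load();
    assert(seen.a == 'x' && seen.b == 'y' && seen.c == 'z');
}
```

Note that the wrapper is 4 bytes wide, not 3, which is exactly the size change discussed in the comments. A compare-exchange is left out on purpose: a T with internal padding bytes would need extra care there.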

E.g., in https://godbolt.org/z/sxSeId, Something is only 3 bytes, but the implementation calls __atomic_load and __atomic_exchange, both of which use a lock table.

Hemphill answered 28/4, 2019 at 22:2 Comment(10)
gcc gets it right if you make the struct 4 bytes (but not 3), see godbolt.org/z/d1OCmG. clang doesn't. - Overliberal
@PaulSanders Interesting, I wonder why 3 bytes doesn't work. - Hemphill
There is no x86 instruction that loads/stores 3 bytes, let alone atomically. - Dorella
@Dorella Ah, sorry, but the atomic could always take up more space than the struct, up to the next power of 2, right? §[atomics.types.generic]p3 allows this: "The representation of an atomic specialization need not have the same size as its corresponding argument type." I guess there are portability problems with that though? - Hemphill
@Dorella On most compilers the struct is padded up to 4 bytes, as in sizeof(S) is never 3. - Lylalyle
But it could be 3 bytes, and then suddenly you require atomic access to 3 bytes, and you're in trouble. - Skilling
@Curious: If you force the alignment of the struct to be 4, then it works fine even in GCC. - Adley
@NicolBolas, it seems like clang doesn't: godbolt.org/z/N6P6Hs. Any reason for this, or is it just something that needs to be fixed? - Hemphill
@Curious: When I said "force the alignment", I meant with alignas(4). - Adley
@NicolBolas Ah interesting, any idea why a single-byte alignment doesn't work? I didn't realize setting a higher alignment would make it use atomic instructions all of a sudden. - Hemphill

The atomic<T> implementations used on Linux (with gcc and clang) unfortunately(?) don't add alignment or padding up to a power-of-2 size: std::atomic<Something> arr[10] has sizeof(arr) == 30. (https://godbolt.org/z/WzK66xebr)


To get a lock-free atomic, use struct Something { alignas(4) char a; char b,c; };
(Not alignas(4) char a,b,c; because that would apply the alignment to each char separately, padding every member to 4 bytes.)
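A small sketch of the difference between those declarations, for checking on your own toolchain. The struct names Unaligned, Aligned4 and EachAligned are made up for this example, and the lock-free result is what I'd expect on mainstream x86-64 / AArch64 compilers, not something the standard guarantees.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// The struct from the question: 3 bytes, alignment 1 on mainstream ABIs.
struct Unaligned { char a, b, c; };

// The suggested fix: aligning the first member raises the alignment of the
// whole struct, so it is padded to 4 bytes and can never straddle a
// 4-byte boundary.
struct Aligned4 { alignas(4) char a; char b, c; };

// The variant to avoid: alignas applies to every declarator in the
// declaration, so each char gets its own 4-byte slot.
struct EachAligned { alignas(4) char a, b, c; };

static_assert(sizeof(Aligned4) == 4 && alignof(Aligned4) == 4,
              "one aligned member pads the struct to 4 bytes");
static_assert(sizeof(EachAligned) == 12,
              "each member is padded to 4 bytes");

// Reports whether the implementation treats the 4-byte, 4-aligned struct
// as lock-free. (Deliberately not queried for Unaligned: with gcc that
// query may need linking against libatomic.)
inline bool aligned4_is_lock_free() {
    std::atomic<Aligned4> slot(Aligned4{'a', 'b', 'c'});
    return slot.is_lock_free();
}

// Usage example; the assertion is an expectation for common toolchains.
inline void example() {
    assert(aligned4_is_lock_free());
}
```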

Objects with a non-power-of-2 size might span a cache-line boundary so using a wider 4-byte load is not always possible.

Plus pure stores would always have to use a CAS (e.g. lock cmpxchg) to avoid inventing writes to a byte outside the object: obviously you can't use two separate mov stores (2-byte + 1-byte) because that wouldn't be atomic, unless you do that inside a TSX transaction with a retry loop.
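To make the "no invented writes" point concrete, here is a rough sketch of what a 3-byte store would have to look like if the object lived in the low 24 bits of an aligned 32-bit word, with an unrelated byte in the top 8 bits. The helper name store_low24 and that layout are assumptions for illustration only, and the sketch sidesteps the cache-line problem by assuming the containing word is aligned.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: replace only the low 24 bits of an aligned 32-bit
// word. The top 8 bits stand in for a neighbouring byte that belongs to
// some other object and must not be overwritten with a stale value.
inline void store_low24(std::atomic<std::uint32_t>& word,
                        std::uint32_t value24) {
    assert((value24 >> 24) == 0 && "value must fit in 24 bits");

    std::uint32_t expected = word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        // Keep whatever the neighbouring byte currently holds.
        desired = (expected & 0xFF000000u) | value24;
        // On failure, compare_exchange_weak refreshes `expected` with the
        // current contents, so a concurrent change to the neighbouring
        // byte is picked up on the next iteration.
    } while (!word.compare_exchange_weak(expected, desired,
                                         std::memory_order_release,
                                         std::memory_order_relaxed));
}
```

A plain 4-byte store of `desired` would be atomic too, but it could silently undo a concurrent write to the top byte, which is the invented write the CAS loop avoids.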


x86 loads/stores are only guaranteed atomic for memory accesses that don't cross an 8-byte boundary. (Some vendors / microarchitectures extend the guarantee to anything that doesn't cross a cache-line boundary. For possibly-uncacheable loads/stores, natural alignment is basically what you need.) See the related question "Why is integer assignment on a naturally aligned variable atomic on x86?"

Your struct Something { char a, b, c; }; has no alignment requirement so there's no C++ rule that prevents a Something object from spanning 2 cache lines. That would make a plain-mov load/store of it definitely non-atomic.

gcc and clang choose to implement atomic<T> with the same layout / object-representation as T (regardless of being lock-free or not). Therefore atomic<Something> is a 3-byte object. An array of atomic<Something> thus necessarily has some of those objects spanning cache line boundaries, and can't have padding outside the object because that's not how arrays work in C. sizeof() = 3 tells you the array layout. This makes lock-free atomic<Something> impossible. (Unless you load/store with lock cmpxchg to be atomic even on cache-line splits, which would produce a huge performance penalty in the cases where that did happen. Better to make developers fix their struct.)
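The array argument is just arithmetic, and can be checked with a few lines. This sketch assumes a 64-byte cache line and an array whose first element starts on a line boundary; spans_boundary and count_split_elements are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines, as on typical x86 CPUs.
const std::size_t kCacheLine = 64;

// True if an object of `size` bytes starting at byte `offset` has its first
// and last bytes in different `boundary`-sized blocks.
inline bool spans_boundary(std::size_t offset, std::size_t size,
                           std::size_t boundary = kCacheLine) {
    assert(size > 0 && boundary > 0);
    return offset / boundary != (offset + size - 1) / boundary;
}

// Counts how many elements of a tightly packed array (no padding between
// elements, first element at offset 0) straddle a boundary.
inline std::size_t count_split_elements(std::size_t elem_size,
                                        std::size_t count,
                                        std::size_t boundary = kCacheLine) {
    std::size_t splits = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (spans_boundary(i * elem_size, elem_size, boundary)) {
            ++splits;
        }
    }
    return splits;
}
```

With 3-byte elements, element 21 occupies bytes 63..65 and so sits across the first line boundary; with 4-byte elements no element ever does, because 4 divides 64.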

The atomic<T> class can have a higher alignment requirement than T, for example atomic<int64_t> has alignof(atomic_int64_t) == 8, unlike alignof(int64_t) == 4 on many 32-bit platforms (including the i386 System V ABI).
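One way to see this on a given platform is to compare the two alignments directly. A minimal sketch; AlignmentReport and alignment_report are hypothetical names, and the actual numbers depend on the target ABI and standard library.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: alignment of T next to that of std::atomic<T>.
struct AlignmentReport {
    std::size_t plain;   // alignof(T)
    std::size_t atomic;  // alignof(std::atomic<T>)
};

template <typename T>
AlignmentReport alignment_report() {
    return AlignmentReport{alignof(T), alignof(std::atomic<T>)};
}

// Usage example: on the i386 System V ABI this is expected to report
// plain == 4 and atomic == 8; on x86-64 both are 8.
inline void example() {
    AlignmentReport r = alignment_report<std::int64_t>();
    assert(r.atomic >= r.plain);
}
```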

If gcc/clang hadn't made the choice to keep the layout the same, they could have had atomic<T> pad small objects up to the next power of 2 and add alignment so they could be lock-free. That would be a valid implementation choice. I can't think of any downsides.


Fun fact: gcc's C11 _Atomic support is slightly broken on 32-bit platforms with 64-bit lockless atomics. An _Atomic int64_t can be misaligned inside structs, leading to tearing. They still haven't updated the ABI for _Atomic types to have natural alignment.

But g++'s C++11 std::atomic uses a template class in a header, and that fixed this bug a while ago (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65147) by ensuring that atomic<T> has natural alignment (up to some power-of-2 size) even if T has alignment < size. Thus there's no way such objects can span any boundary wider than they are.

Caress answered 3/8, 2019 at 7:30 Comment(1)
Oh the cacheline splits actually make a ton of sense, thanks!Hemphill
