Why is std::atomic<T>::is_lock_free() not static as well as constexpr?
C

4

11

Can anyone tell me why std::atomic<T>::is_lock_free() isn't static and constexpr? Having it non-static and/or non-constexpr doesn't make sense to me.

Why wasn't it designed like C++17's is_always_lock_free in the first place?

Chaotic answered 12/11, 2019 at 10:0 Comment(4)
Are you aware of is_always_lock_free?Dempstor
I'm going to throw "alignment" out there.Blackthorn
@MaxLanghof Do you mean that not all instances are going to be aligned the same way?Gean
Mike, no, I wasn't aware, but thanks for the hint; it is really helpful. But I'm asking myself why there's a choice between is_lock_free() and is_always_lock_free. It can't be because of unaligned atomics, as others suggested here, since the language defines unaligned accesses to have undefined behaviour anyway.Chaotic
B
13

As explained on cppreference:

All atomic types except for std::atomic_flag may be implemented using mutexes or other locking operations, rather than using the lock-free atomic CPU instructions. Atomic types are also allowed to be sometimes lock-free, e.g. if only aligned memory accesses are naturally atomic on a given architecture, misaligned objects of the same type have to use locks.

The C++ standard recommends (but does not require) that lock-free atomic operations are also address-free, that is, suitable for communication between processes using shared memory.

As mentioned by multiple others, std::atomic<T>::is_always_lock_free (a static constexpr member added in C++17) might be what you are really looking for.
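A minimal sketch of the two queries side by side, assuming a C++17 compiler. The names int_always_lock_free, counter_is_lock_free and check_consistency are illustrative only and do not come from the answer.

```cpp
#include <atomic>
#include <cassert>

// Compile-time property of the type (C++17): true only if every object
// of this type is lock-free for the whole run of the program.
constexpr bool int_always_lock_free = std::atomic<int>::is_always_lock_free;

// Per-object query: it needs an instance and is evaluated at run time.
inline bool counter_is_lock_free(const std::atomic<int>& counter) {
    return counter.is_lock_free();
}

// If the type is always lock-free, each individual object must be too.
inline void check_consistency() {
    std::atomic<int> counter{0};
    if (int_always_lock_free) {
        assert(counter_is_lock_free(counter));
    }
}
```

The constant can be used in static_assert or if constexpr, while the member function can only be called on an existing object.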


Edit: To clarify, C++ object types have an alignment value that restricts the addresses of their instances to only certain multiples of powers of two ([basic.align]). These alignment values are implementation-defined for fundamental types, and need not equal the size of the type. They can also be more strict than what the hardware could actually support.

For example, x86 (mostly) supports unaligned accesses. However, you will find most compilers having alignof(double) == sizeof(double) == 8 for x86, as unaligned accesses have a host of disadvantages (speed, caching, atomicity...). But a compiler extension such as #pragma pack(1) struct X { char a; double b; }; allows you to have "unaligned" doubles. (A standard alignas(1) double x; does not work for this: alignas cannot weaken a type's alignment, so that declaration is ill-formed.) So when cppreference talks about "aligned memory accesses", it presumably does so in terms of the natural alignment of the type for the hardware, not using a C++ type in a way that contradicts its alignment requirements (which would be UB).
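To make the packing example concrete, here is a small sketch. It assumes a compiler that supports the #pragma pack extension (MSVC, GCC and Clang do); the struct names PackedX and PlainX are made up for illustration.

```cpp
#include <cstddef>

// #pragma pack is a compiler extension, not ISO C++.
#pragma pack(push, 1)
struct PackedX {
    char a;
    double b;  // placed directly after a, at offset 1
};
#pragma pack(pop)

struct PlainX {
    char a;
    double b;  // the compiler inserts padding so b is suitably aligned
};

constexpr std::size_t packed_offset = offsetof(PackedX, b);
constexpr std::size_t plain_offset = offsetof(PlainX, b);

static_assert(packed_offset == 1, "packing removes the padding before b");
static_assert(plain_offset > packed_offset, "the unpacked layout pads before b");
```

The double inside PackedX sits at an odd offset, which is the kind of "unaligned in what the hardware likes" placement the answer is describing.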

Here is more information: What's the actual effect of successful unaligned accesses on x86?

Please also check out the insightful comments by @Peter Cordes below!

Blackthorn answered 12/11, 2019 at 10:6 Comment(5)
32-bit x86 is a good example of where you find ABIs with alignof(double)==4. But std::atomic<double> still has alignof() = 8 instead of checking alignment at runtime. Using a packed struct that under-aligns atomic breaks the ABI and is not supported. (GCC for 32-bit x86 prefers to give 8-byte objects natural alignment, but struct-packing rules override that and are only based on alignof(T), e.g. on i386 System V. G++ used to have a bug where atomic<int64_t> inside a struct might not be atomic because it just assumed. GCC (for C not C++) still has this bug!)Fingered
But a correct implementation of C++20 std::atomic_ref<double> will either reject under-aligned double entirely, or will check alignment at runtime on platforms where it's legal for plain double and int64_t to be less than naturally aligned. (Because atomic_ref<T> operates on an object that was declared as a plain T, and only has a minimum alignment of alignof(T) without the opportunity to give it extra alignment.)Fingered
See gcc.gnu.org/bugzilla/show_bug.cgi?id=62259 for the now-fixed libstdc++ bug, and gcc.gnu.org/bugzilla/show_bug.cgi?id=65146 for the still-broken C bug, including a pure ISO C11 testcase that shows tearing of an _Atomic int64_t when compiled with current gcc -m32. Anyway, my point is that real compilers don't support under-aligned atomics, and don't do runtime checks (yet?), so #pragma pack or __attribute__((packed)) will just lead to non-atomicity; objects will still report that they are lock_free.Fingered
But yes, the purpose of is_lock_free() is to allow implementations to work differently from the way current ones actually do; with runtime checks based on actual alignment to use HW-supported atomic instructions or to use a lock.Fingered
Correction to my earlier comments: in order to use std::atomic_ref on a double, you need to make sure it's properly aligned yourself, like alignas( std::atomic_ref<double>::required_alignment ) double foo. Otherwise it's UB, so atomic_ref doesn't have to check for alignment or handle under-aligned objects. en.cppreference.com/w/cpp/atomic/atomic_ref/required_alignment Fingered
P
4

You may use std::atomic<T>::is_always_lock_free

is_lock_free depends on the actual system and can't be determined at compile time.

Relevant explanation:

Atomic types are also allowed to be sometimes lock-free, e.g. if only aligned memory accesses are naturally atomic on a given architecture, misaligned objects of the same type have to use locks.

Perugia answered 12/11, 2019 at 10:5 Comment(5)
std::numeric_limits<int>::max depends on the architecture, yet it is static and constexpr. I guess there is nothing wrong in the answer, but I don't buy the first part of the reasoningTectonics
Doesn't the language define unaligned accesses to have undefined behaviour anyway, so that evaluating lock-freeness at runtime would be nonsense?Chaotic
It doesn't make sense to decide between aligned and unaligned accesses as the language defines the latter as undefined behaviour.Chaotic
@BonitaMontero There is "unaligned in the C++ object alignment" sense and "unaligned in what the hardware likes" sense. Those are not necessarily the same, but in practice they frequently are. The example you show is one such instance where the compiler apparently has the built-in assumption that the two are the same - which only means that is_lock_free is pointless on that compiler.Blackthorn
You can be pretty sure that an atomic would have proper alignment if there is an alignment requirement.Chaotic
C
1

I have Visual Studio 2019 installed on my Windows PC, and this devenv also ships an ARMv8 compiler. ARMv8 allows unaligned accesses, but compare-and-swaps, locked adds etc. are required to be aligned. Pure loads and pure stores using ldp or stp (load-pair or store-pair of 32-bit registers) are also only guaranteed to be atomic when they're naturally aligned.

So I wrote a little program to check what is_lock_free() returns for a pointer to an arbitrary atomic. Here's the code:

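#include <atomic>
#include <cstdint>  // for uint64_t

using namespace std;

// The compiler cannot see the pointed-to object here, so any
// per-object alignment check would have to happen at run time.
bool isLockFreeAtomic( atomic<uint64_t> *a64 )
{
    return a64->is_lock_free();
}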

And this is the disassembly of isLockFreeAtomic:

|?isLockFreeAtomic@@YA_NPAU?$atomic@_K@std@@@Z| PROC
    movs        r0,#1
    bx          lr
ENDP

This just returns true, i.e. 1.

This implementation chooses to use alignof( atomic<int64_t> ) == 8 so every atomic<int64_t> is correctly aligned. This avoids the need for runtime alignment checks on every load and store.

(Editor's note: this is common; most real-life C++ implementations work this way. This is why std::atomic<T>::is_always_lock_free is so useful: it's usually true for types where is_lock_free() is ever true.)
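The alignment choice described above can be checked at compile time. This is only a sketch: the constant names are made up, and the assertion reflects what mainstream implementations do rather than anything ISO C++ mandates.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Alignment of the plain integer versus its atomic wrapper.
constexpr std::size_t plain_align = alignof(std::int64_t);
constexpr std::size_t atomic_align = alignof(std::atomic<std::int64_t>);

// Assumption: the implementation never aligns the atomic less strictly
// than the plain type. On some 32-bit ABIs the atomic is aligned more
// strictly (8 bytes versus 4).
static_assert(atomic_align >= plain_align,
              "atomic wrapper is at least as aligned as the plain type");
```

Because the alignment is fixed by the type, no per-object check is needed, which is why is_lock_free() can compile down to a constant as in the disassembly.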

Chaotic answered 12/11, 2019 at 13:30 Comment(5)
Yes, most implementations choose to give atomic<uint64_t> an alignof() == 8 so they don't have to check alignment at runtime. This old API gives them the option of not doing so, but on current HW it makes much more sense just to require alignment (otherwise UB, e.g. non-atomicity). Even in 32-bit code where int64_t might only have 4-byte alignment, atomic<int64_t> requires 8-byte. See my comments on another answerFingered
Put into different words: If a compiler chooses to make alignof value for a fundamental type the same as the "good" alignment of the hardware, then is_lock_free will always be true (and so will is_always_lock_free). Your compiler here does exactly this. But the API exists so other compilers can do different things.Blackthorn
You can be pretty sure that if the language says that unaligned access has undefined behaviour all atomics have to be properly aligned. No implementation will do any runtime-checks because of that.Chaotic
@BonitaMontero Yes, but there is nothing in the language that forbids alignof(std::atomic<double>) == 1 (so there would be no "unaligned access" in the C++ sense, hence no UB), even if the hardware can only guarantee lock-free atomic operations for doubles on 4 or 8 byte boundaries. The compiler would then have to use locks in the unaligned cases (and return the appropriate boolean value from is_lock_free, depending on the memory location of the object instance).Blackthorn
@MaxLanghof: Yes, the non-static std::atomic<>::is_lock_free() API is designed to allow that implementation choice. It would be a bad choice for real-world implementations so that's not how they actually work. Calling it on a std::atomic<> object with less alignment than its alignof is already UB, so the fact that it still returns true is not a violation of anything, just means the API wasn't helpful for detecting that problem.Fingered
P
1

std::atomic<T>::is_lock_free() may in some implementations return true or false depending on runtime conditions.

As pointed out by Peter Cordes in comments, the runtime condition is not alignment, as atomic will (over-)align internal storage for efficient lock-free operations, and forcing misalignment is UB that may manifest as loss of atomicity.

It is possible to write an implementation that does not enforce alignment and instead does runtime dispatch based on alignment, but that is not what a sane implementation would do. It would only make sense for supporting pre-C++17 code, if __STDCPP_DEFAULT_NEW_ALIGNMENT__ is less than the required atomic alignment, as over-alignment for dynamic allocation does not work until C++17.
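For illustration only, the per-object check such an implementation would need could look roughly like this. The helper address_allows_lock_free and the demo function are hypothetical; no real standard library is claimed to work this way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: can an object at `address` be handled by
// instructions that need `required_alignment`-byte alignment?
inline bool address_allows_lock_free(const void* address,
                                     std::size_t required_alignment) {
    return reinterpret_cast<std::uintptr_t>(address) % required_alignment == 0;
}

// Demo: the same buffer gives different answers at different addresses,
// so the result cannot be a static or constexpr property of the type.
inline void demo_alignment_dependent_answer() {
    alignas(8) unsigned char buffer[16] = {};
    assert(address_allows_lock_free(buffer, 8));       // 8-byte aligned
    assert(!address_allows_lock_free(buffer + 1, 8));  // off by one byte
}
```

An is_lock_free() built on such a check has to look at the address of the object it is called on, which is exactly what a non-static member function permits.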

Another case where a runtime condition may determine lock-freedom is runtime CPU dispatch.

On x86-64, an implementation may detect the presence of cmpxchg16b via cpuid at initialization and use it for 128-bit atomics. The same applies to cmpxchg8b and 64-bit atomics on 32-bit x86. If the corresponding cmpxchg instruction is not available, a lock-free atomic cannot be implemented, and the implementation uses locks.
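A rough sketch of that dispatch pattern follows. DispatchedCounter is a hypothetical class, and the flag cpu_has_wide_cas is passed in by the caller instead of being read from cpuid; a 64-bit std::atomic stands in for the wide instruction, so this shows only the shape of the idea.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical counter that picks its strategy once, at construction.
class DispatchedCounter {
public:
    // Assumption: a real implementation would query cpuid here.
    explicit DispatchedCounter(bool cpu_has_wide_cas) noexcept
        : lock_free_(cpu_has_wide_cas) {}

    // Depends on the CPU found at run time, so it cannot be constexpr.
    bool is_lock_free() const noexcept { return lock_free_; }

    // Adds delta and returns the previous value.
    std::uint64_t fetch_add(std::uint64_t delta) {
        if (lock_free_) {
            return value_.fetch_add(delta);  // stand-in for the fast path
        }
        // Fallback path: every access is serialized by the mutex.
        std::lock_guard<std::mutex> guard(mutex_);
        const std::uint64_t old = value_.load(std::memory_order_relaxed);
        value_.store(old + delta, std::memory_order_relaxed);
        return old;
    }

private:
    const bool lock_free_;
    std::mutex mutex_;
    std::atomic<std::uint64_t> value_{0};
};
```

Two objects of the same type can report different results, for example DispatchedCounter(true).is_lock_free() versus DispatchedCounter(false).is_lock_free(), depending on what was detected at startup.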

MSVC doesn't do runtime CPU dispatch currently. It doesn't do it for 64-bit due to ABI compatibility reasons, and doesn't do it for 32-bit as it already doesn't support CPUs without cmpxchg8b. Boost.Atomic doesn't do this by default (it assumes cmpxchg8b and cmpxchg16b are present), but can be configured to do the detection. I haven't bothered to look at what other implementations do yet.

Patrick answered 7/11, 2021 at 17:47 Comment(4)
The non-static std::atomic<>::is_lock_free() API does allow the possibility of an implementation with alignof(std::atomic<T>) less than sizeof. Current implementations choose to have alignof == sizeof so they don't need runtime alignment checks. (That means it's UB to call is_lock_free or any other member function on a misaligned atomic<T> object, so it doesn't matter what the return value is.) Anyway, that's an implementation choice, not a constraint of ISO C++11. (A good and obvious implementation choice, though!) Good point about runtime dispatch as another reason, though.Fingered
@PeterCordes, yes, corrected. On another thought I found a possible reason not to rely on alignment: before C++17 the alignment for new was fixed to __STDCPP_DEFAULT_NEW_ALIGNMENT__ and could not be increased by alignas. I don't think that some implementation uses smaller allocation alignment than required for largest lock-free atomic, but it looks like a reason to provide standard way to deal with this.Patrick
Interesting point about new. You could consider runtime alignment checks for the largest object size (especially if it required atomic RMW just to read) instead of just deciding it would be never lock_free, if new aligned less than that. Not the case on any mainstream x86 implementation, e.g. I think MSVC aligns by 16 on x86-64 (and GNU/Linux certainly does), and everything aligns by at least 8 in 32-bit mode. IDK what alignof(max_align_t) is on gcc for AArch64 / MIPS64 / PPC64. I think AArch64 would have 16-byte atomics baseline without even needing -march options, but prob. 16B newFingered
@PeterCordes, we know where to query this for many of the configurations godbolt.org/z/73z11c49ePatrick
