Using std::atomic with futex system call

In C++20, we got the capability to sleep on atomic variables, waiting for their value to change. We do so by using the std::atomic::wait method.

Unfortunately, while wait has been standardized, wait_for and wait_until have not. This means we cannot sleep on an atomic variable with a timeout.

Behind the scenes, sleeping on an atomic variable is implemented with WaitOnAddress on Windows and the futex system call on Linux.

To work around the above problem (no way to sleep on an atomic variable with a timeout), I could pass the memory address of a std::atomic to WaitOnAddress on Windows, and it will (kinda) work with no UB: the function takes a void* parameter, and it's valid to convert a std::atomic<type>* to void*.

On Linux, it is unclear whether it's OK to mix std::atomic with futex. futex takes either a uint32_t* or an int32_t* (depending on which manual you read), and casting a std::atomic<u/int>* to u/int* is UB. On the other hand, the manual says:

The uaddr argument points to the futex word. On all platforms, futexes are four-byte integers that must be aligned on a four-byte boundary. The operation to perform on the futex is specified in the futex_op argument; val is a value whose meaning and purpose depends on futex_op.

This hints that alignas(4) std::atomic<int> should work, and that it doesn't matter which integer type is used, as long as the type is 4 bytes in size with 4-byte alignment.
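A minimal Linux-only sketch of the trick being asked about. It assumes, as the question does, that a std::atomic<std::uint32_t> is exactly the 4-byte futex word; it checks size, alignment, and lock-freedom at compile time and then issues FUTEX_WAIT with a relative timeout through the raw syscall. The helper names futex_wait_for and futex_wake_one are hypothetical, not part of any library, and whether the reinterpret_cast itself is well-defined is precisely the open question here.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdint>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Compile-time checks for the layout assumption the question relies on.
static_assert(sizeof(std::atomic<std::uint32_t>) == 4);
static_assert(alignof(std::atomic<std::uint32_t>) == 4);
static_assert(std::atomic<std::uint32_t>::is_always_lock_free);

// Hypothetical helper: sleep while `word` still holds `expected`, for at
// most `timeout` (assumed non-negative). Returns false only on timeout.
bool futex_wait_for(std::atomic<std::uint32_t>& word, std::uint32_t expected,
                    std::chrono::nanoseconds timeout) {
    timespec ts{};
    ts.tv_sec = static_cast<decltype(ts.tv_sec)>(timeout.count() / 1'000'000'000);
    ts.tv_nsec = static_cast<decltype(ts.tv_nsec)>(timeout.count() % 1'000'000'000);

    // The disputed step: treat the atomic's address as the futex word.
    auto* addr = reinterpret_cast<std::uint32_t*>(&word);

    // FUTEX_WAIT's timeout is relative. The kernel only sleeps if *addr
    // still equals `expected` at the moment it checks.
    long rc = syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, &ts,
                      nullptr, 0);
    if (rc == 0) return true;   // woken by FUTEX_WAKE (or spuriously)
    return errno != ETIMEDOUT;  // EAGAIN (value differed) or EINTR
}

// Hypothetical helper: wake at most one thread sleeping on `word`.
void futex_wake_one(std::atomic<std::uint32_t>& word) {
    auto* addr = reinterpret_cast<std::uint32_t*>(&word);
    syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1, nullptr, nullptr, 0);
}
```

A real waiter would loop on futex_wait_for and re-load the atomic after each return, since a wakeup does not by itself guarantee the value changed.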

Also, I have seen many places where this trick of combining atomics and futexes is implemented, including Boost and TBB.

So what is the best way to sleep on an atomic variable with a timeout in a non UB way? Do we have to implement our own atomic class with OS primitives to achieve it correctly?

(Solutions like mixing atomics and condition variables exist, but they are sub-optimal.)

Genny answered 10/4, 2021 at 11:49 Comment(3)
WaitOnAddress is a limited implementation of a condition variable; atomicity is irrelevant. So instead of using atomics, why don't you try the classic condition variable from the standard library? – Emcee
@Emcee Throughput, mainly. – Genny
WaitOnAddress has nothing to do with atomics, and I am pretty sure it won't give you any benefit compared to std::condition_variable. WaitOnAddress IS a condition variable by its semantics; it just hides the explicit mutex behind the scenes. Besides that, it does exactly the same thing. – Emcee
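A sketch of the condition-variable alternative suggested in these comments, assuming the shared state is a simple "has it been set yet" flag. The timed wait comes for free from std::condition_variable::wait_for with a predicate, at the cost of taking a mutex on every set and wait (the throughput concern raised above). TimedFlag is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// A flag guarded by a mutex, with a timed wait.
class TimedFlag {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ready_ = true;
        }
        cv_.notify_all();  // notify after releasing the lock
    }

    // Returns true if the flag was set before the timeout expired.
    template <class Rep, class Period>
    bool wait_for(std::chrono::duration<Rep, Period> timeout) {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate overload re-checks ready_ after spurious wakeups
        // and returns the predicate's final value.
        return cv_.wait_for(lock, timeout, [this] { return ready_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool ready_ = false;
};
```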

You shouldn't necessarily have to implement a full custom atomic API; it should actually be safe to simply pull out a pointer to the underlying data from the atomic<T> and pass it to the system.

Since std::atomic does not offer some equivalent of native_handle like other synchronization primitives offer, you're going to be stuck doing some implementation-specific hacks to try to get it to interface with the native API.

For the most part, it's reasonably safe to assume that the first member of these types in implementations will be the same as the T type -- at least for integral values [1]. This is the assurance that makes it possible to extract the value.

... and casting std::atomic<u/int> to u/int* is UB

This isn't actually the case.

std::atomic is guaranteed by the standard to be a standard-layout type. One helpful but often esoteric property of standard-layout types is that a pointer to such an object can safely be reinterpret_cast to a pointer to its first sub-object (e.g. the first member of the std::atomic), because the two pointers are pointer-interconvertible.

As long as we can guarantee that the std::atomic<u/int> contains only the u/int as a member (or at least, as its first member), then it's completely safe to extract out the type in this manner:

std::atomic<std::uint32_t> atomic{0};
auto* r = reinterpret_cast<std::uint32_t*>(&atomic);
// Pass r to the futex API...
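The first-member rule can be seen in isolation with a hypothetical standard-layout struct Word standing in for the atomic. This is only an illustration of the language rule, not how any standard library actually defines std::atomic.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for an atomic wrapper: a standard-layout class
// whose first (and only) non-static data member is the integer.
struct Word {
    std::uint32_t value;
};
static_assert(std::is_standard_layout_v<Word>);

// A pointer to a standard-layout object and a pointer to its first
// non-static data member are pointer-interconvertible, so this
// reinterpret_cast yields a usable pointer to `value`.
std::uint32_t* first_member(Word& w) {
    return reinterpret_cast<std::uint32_t*>(&w);
}
```

Whether std::atomic<std::uint32_t> really has the integer as its first member is the implementation-specific part the footnote below addresses.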

This approach should also hold on Windows: cast the atomic to the underlying type before passing it to the void* API.

Note: Passing a T* pointer to a void* that gets reinterpreted as a U* (such as an atomic<T>* to void* when it expects a T*) is undefined behavior -- even with standard-layout guarantees (as far as I'm aware). It will still likely work because the compiler can't see into the system APIs -- but that doesn't make the code well-formed.

Note 2: I can't speak on the WaitOnAddress API as I haven't actually used this -- but any atomics API that depends on the address of a properly aligned integral value (void* or otherwise) should work properly by extracting out a pointer to the underlying value.


[1] Since this is tagged C++20, you can verify this with std::is_layout_compatible with a static_assert:

static_assert(std::is_layout_compatible_v<int,std::atomic<int>>);

(Thanks to @apmccartney for this suggestion in the comments).

I can confirm that this will be layout compatible for Microsoft's STL, libc++, and libstdc++; however if you don't have access to is_layout_compatible and you're using a different system, you might want to check your compiler's headers to ensure this assumption holds.

Phippen answered 21/4, 2021 at 14:1 Comment(10)
Interesting point about UB. It would be UB to deref an int* to an atomic<int> object without synchronization to make sure no other threads were reading/writing it at the time, but passing it to futex is basically like using an atomic_ref<int> operation on that int object - you're invoking machine code that's designed to be safe in the presence of other threads simultaneously reading+writing. That holds as long as the atomic<T> is lock-free, so maybe a static_assert on is_always_lock_free would be a good idea if you're also doing belt-and-suspenders stuff like alignas(4) std::atomic<int>. – Nevile
I was thinking about suggesting the static_assert on is_always_lock_free -- but decided against it. Extending atomic in this way will pretty much always require some level of coupling with the std::atomic implementation -- at which point there will likely have to be some knowledge of the implementation to really share this between the native system APIs. Also, to the "UB to deref an int* ..." point -- technically it wouldn't be UB since the pointer is legal and valid. It just wouldn't be sequenced if it was accessed in a threaded context (which would make it UB). – Phippen
That's exactly what I meant by "without synchronization...": you could create data-race UB, not strict-aliasing UB. – Nevile
Re: footnote [1]: this can be verified programmatically. See std::is_layout_compatible. en.cppreference.com/w/cpp/types/is_layout_compatible – Suzette
@Suzette Excellent suggestion! I haven't fully explored all the new traits added in C++20, so thank you for bringing that to my attention. – Phippen
static_assert(sizeof(int) == sizeof(std::atomic<int>) && std::atomic<int>::is_always_lock_free) is a reasonable best-effort check with C++17 but without C++20. If they're the same size and atomic_int is lock-free, that rules out most possible incompatibility. – Nevile
According to GCC 12 (which apparently is the only compiler implementing this by now), it's not compatible: godbolt.org/z/1hvn1qa3T. Then again, sizeof() and alignof() are equal and I'm not sure what exactly GCC is testing. – Epiphenomenalism
Welp, this works: godbolt.org/z/ze4Wqrrn7. I guess that's because wg21.link/P0466R5 states "Two types cv1 T1 and cv2 T2 are layout-compatible types if T1 and T2 are the same type, layout-compatible enumerations, or layout-compatible standard-layout class types." and int is not a "standard-layout class type". Nothing seems to be stated about primitive types. – Epiphenomenalism
@Epiphenomenalism Interesting observation on the layout compatibility! If that's the case, then it's possible the is_layout_compatible_v part of the suggestion is moot; but that shouldn't change the fact that reinterpret_casting to the first member of a standard-layout object is a legal operation. – Phippen
@Human-Compiler that's exactly what I would argue. The layout-compatibility is defined in the standard here: eel.is/c++draft/basic.types.general#11 The reinterpret_cast bit is valid because of this: eel.is/c++draft/class.mem#general-27 "Layout" in C++ appears to include the types of the members, not just size and alignment: eel.is/c++draft/class.mem#general-23. But even with that, I'd say that a standard-layout class with a single member could be layout-compatible with that single member. It's just not defined. – Epiphenomenalism
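A sketch of the pre-C++20 best-effort check suggested in the comments (same size, lock-free), with an alignment comparison added as an extra assumption of this sketch. As the discussion notes, passing these checks does not prove layout compatibility; it only rules out the most common ways the cast could go wrong. atomic_looks_like_plain is a hypothetical helper name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Best-effort check usable in C++17: same size, same alignment, and
// always lock-free. Not a proof of layout compatibility.
template <class T>
constexpr bool atomic_looks_like_plain() {
    return sizeof(std::atomic<T>) == sizeof(T) &&
           alignof(std::atomic<T>) == alignof(T) &&
           std::atomic<T>::is_always_lock_free;
}

// Fails to compile on a platform where the assumption does not hold.
static_assert(atomic_looks_like_plain<std::uint32_t>());
```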

You could use a plain "non-atomic" alignas(4) uint32_t variable with the futex calls, and perform the other atomic operations on it via std::atomic_ref. See non-atomic operations on atomic variables and vice versa.

Declivitous answered 8/9, 2022 at 7:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.