Relaxed ordering as a signal

Let's say we have two threads: one that gives a "go" and one that waits for that go before producing something.

Is this code correct or can I have an "infinite loop" because of cache or something like that?

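#include <atomic>
#include <thread>

std::atomic_bool canGo{false};

void produce_data() {}  // stand-in for the real work

void producer() {
    // Spin until the launcher gives the go signal.
    while (!canGo.load(std::memory_order_relaxed)) {}
    produce_data();
}

void launcher() {
    canGo.store(true, std::memory_order_relaxed);
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    // Join both threads: destroying a joinable std::thread calls std::terminate.
    a.join();
    b.join();
}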

If this code is not correct, is there a way to flush / invalidate the cache in standard C++?

Shriek answered 5/7, 2019 at 16:23 Comment(5)
I needed to refresh my knowledge of this. Very helpful: modernescpp.com/index.php/fences-as-memory-barriers modernescpp.com/index.php/acquire-release-fences - Carrington
Also I just recalled that you can get away with a lot on x86 just by using compiler barriers: bartoszmilewski.com/2008/11/05/… - Carrington
Thanks :-). I think I need to protect the canGo variable with acquire/release semantics. - Shriek
Looks like a binary semaphore: producer is acquire, and launcher is release. - Zicarelli
In any case, there is no need for a release on the store, as nothing else happens in the launcher where ordering matters. In the producer, as per the bartoszmilewski.com link above, if you are using x86, load/store or load/load will not be reordered in the CPU. So all you need is a compiler fence (atomic_signal_fence) to stop the while loop being reordered below produce_data. You can even sometimes get away with using standard variables (GASP!): preshing.com/20130618/atomic-vs-non-atomic-operations However, using atomic variables is always safe regarding multiple instructions per op. - Carrington

A go signal like this will usually be in response to some memory changes that you'll want the target to see.

In other words, you'll usually want to give release/acquire semantics to such signaling.

That can be done either by using memory_order_release on the store and memory_order_acquire on the load, or by putting a release fence before the relaxed store and an acquire fence after the relaxed load, so that memory operations done by the signaller before the store are visible to the signallee (see, for example, https://preshing.com/20120913/acquire-and-release-semantics/ or the C/C++ standard).
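
A minimal sketch of the first option (release on the store, acquire on the load). The payload variable and the run_handoff() helper are hypothetical names added for illustration; they are not part of the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                     // hypothetical plain (non-atomic) data
std::atomic<bool> canGo{false};

void launcher() {
    payload = 42;                                  // written before the signal
    canGo.store(true, std::memory_order_release);  // publishes the write above
}

int producer() {
    // The acquire load that reads `true` synchronizes with the release store.
    while (!canGo.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    assert(payload == 42);  // the launcher's write is visible here
    return payload;
}

// Hypothetical helper: runs both threads once, returns what producer saw.
int run_handoff() {
    payload = 0;
    canGo.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread a{[&seen] { seen = producer(); }};
    std::thread b{launcher};
    a.join();
    b.join();
    return seen;
}
```

Calling run_handoff() from main should return 42. With both operations relaxed, the read of payload would be a data race, since nothing would order it after the launcher's write.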


The way I remember the ordering of the fences is that, as far as I understand, shared memory operations among cores are effectively hardware implemented buffered IO that follows a protocol, and a release fence should sort of be like an output buffer flush and an acquire fence like an input buffer flush/sync.

Now if you flush your core's memory-op output buffer before issuing a relaxed store, then by the time the target core sees that store, the preceding memory-op messages must already be available to it. All the target needs to do to see those memory changes is to sync them in with an acquire fence after it sees the signalling store.
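
The fence-based variant could look roughly like this. It is only a sketch: the "flush" and "sync" comments follow the mental model above rather than what the hardware literally does, and payload and run_fenced_handoff() are hypothetical names added for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                     // hypothetical plain (non-atomic) data
std::atomic<bool> canGo{false};

void launcher() {
    payload = 42;
    // Release fence before the relaxed store: "flush the output buffer".
    std::atomic_thread_fence(std::memory_order_release);
    canGo.store(true, std::memory_order_relaxed);
}

int producer() {
    while (!canGo.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Acquire fence after the relaxed load: "sync the input buffer".
    std::atomic_thread_fence(std::memory_order_acquire);
    assert(payload == 42);  // ordered after the launcher's write
    return payload;
}

// Hypothetical helper: runs both threads once, returns what producer saw.
int run_fenced_handoff() {
    payload = 0;
    canGo.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread a{[&seen] { seen = producer(); }};
    std::thread b{launcher};
    a.join();
    b.join();
    return seen;
}
```

The placement matters: the release fence must come before the signalling store and the acquire fence after the load that observes it, otherwise the two fences do not synchronize.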

Heroin answered 5/7, 2019 at 16:46 Comment(12)
Maybe I forgot something in the question. I know that I need acquire and release operations when I need something like a producer/consumer. However, here I don't need to see any value other than canGo :-) - Shriek
@AntoineMorrier Then you don't need the fences. - Heroin
I think I need to protect the canGo variable with acquire/release semantics to avoid any problem. - Shriek
@AntoineMorrier The atomic loads and stores will be atomic (naturally), and the standard requires that the implementation make them visible "within a reasonable amount of time" (port70.net/~nsz/c/c11/n1570.html#7.17.3p16 for C. I'm sure C++ has something similar). Consequently it's impossible to encounter the canGo variable with a torn/trap value, and because the changes must propagate "within a reasonable amount of time" (practically <1µs), infinite looping is out of the question. That said, a stricter acquire/release certainly shouldn't do any harm. - Heroin
@AntoineMorrier I wouldn't expect acquire/release fences here to have much of a performance impact either, especially around thread creation, which takes quite a few µs (around 20 on my Linux laptop). - Heroin
I see. So if, let's say, I have something like an if(canGo) instead of a while, and the variable is set by a user (via a GUI), I cannot have the "if" failing because it reads trap data? - Shriek
@AntoineMorrier Trap/torn data is just a theoretical possibility on some architectures iff the variable isn't atomic. Yours is, so you don't need to worry about that. - Heroin
@PSkocik, you will need a read-acquire because that prevents any subsequent writes (in the produce_data() function, for example) from being moved before the read-acquire. The write to canGo can be relaxed though, as there are no ordering constraints, just inter-thread visibility. - Hurlee
@Hurlee produce_data() should be control-dependency-ordered after the true value from the variable is read. I don't think you need the fences because of that, but I do think they're a good idea nonetheless. - Heroin
@PSkocik, would consume semantics on the load be more appropriate in that case? - Hurlee
@Hurlee That would be useful if the dependency were a data dependency. But this would be a control dependency, and those, AFAIK, don't need to be explicitly ordered because, AFAIK, C/C++ ban speculative writes, which is exactly what would need to happen for produce_data() to move up. - Heroin
@PSkocik, ah okay, yeah, makes sense that writes cannot be hoisted above the load because that would be speculative. - Hurlee
