When do I really need to use atomic<bool> instead of bool? [duplicate]
Isn't atomic<bool> redundant because bool is atomic by nature? I don't think it's possible to have a partially modified bool value. When do I really need to use atomic<bool> instead of bool?

Helbonna answered 1/5, 2013 at 15:12 Comment(10)
You need atomic<bool> to avoid race-conditions. A race-condition occurs if two threads access the same memory location, and at least one of them is a write operation. If your program contains race-conditions, the behavior is undefined.Peonage
@nosid: Yes, but what the OP is saying is that he doesn't believe that you can have a partial write operation on a bool like you can with, say, an int value, where each byte or word of the value is copied individually. There should therefore be no race condition if the write is already atomic.Maddy
Related: #5067992Elaelaborate
Without atomic there is no guarantee that you'll ever see the update in the other thread at all, or that you'll see updates to variables in the same order that you make them in a different thread.Lychnis
Related: Do I have to use atomic<bool> for “exit” bool variable?Claude
@jcoder: To be super pedantic, I believe the Standard doesn't actually mandate cache coherency (or rather "visibility propagation") -- it's left as a "best effort quality of implementation". That is, you can have an atomic variable synchronize two threads, but there is no guarantee (in the Standard) that a change ever propagates. It's only that if the change propagates, then it transfers the happens-before relationship. (For example, thread A could store "unlocked", but thread B could forever continue to read "locked" Only if it reads "unlocked" will it proceed safely.)Greenleaf
@KerrekSB isn't that exactly what the store and load functions std::atomic are for?Lychnis
@jcoder: I'm not sure. Store and load (with the relevant orderings!) are synchronisation points. That means that if you load a certain value, then you know that the store of that value has happened. But there's no guarantee that you eventually load the stored value. You might also forever continue loading the old value. ("Exchange" would be different, though, and necessarily have to propagate.)Greenleaf
One of the main reasons to use atomics is to suppress local-caching optimization of variable state. There's nothing that guarantees a global variable or class member set in one thread will be seen in another thread that is doing "while (condition) ..." In this use, they replace the poorly defined volatile keyword with precise semantics.Muldon
Then why does the cppreference page for condition_variable say "Even if the shared variable is atomic, it must be modified while owning the mutex to correctly publish the modification to the waiting thread"? @LychnisPoock

No type in C++ is "atomic by nature" unless it is an std::atomic*-something. That's because the standard says so.

In practice, the actual hardware instructions emitted to manipulate an std::atomic<bool> may (or may not) be the same as those for an ordinary bool, but being atomic is a larger concept with wider ramifications (e.g. restrictions on compiler re-ordering). Furthermore, some operations (like negation) are overloaded on the atomic type to produce a distinctly different hardware instruction than the native, non-atomic read-modify-write sequence used for a non-atomic variable.

Greenleaf answered 1/5, 2013 at 15:23 Comment(3)
Small correction: std::atomic_flag is the only exception, although its name starts with atomic as well.Dercy
@yngccc: I think that's why Kerrek SB wrote std::atomic* and not std::atomic<*>.Corrade
Does this std::atomic* include std::atomic<*>?Mill

Remember about memory barriers. Although it may be impossible to change a bool partially, it is possible that a multiprocessor system holds this variable in multiple copies, and one thread can see an old value even after another thread has changed it to the new one. Atomic introduces a memory barrier, so this becomes impossible.

Jacklynjackman answered 1/5, 2013 at 15:36 Comment(11)
Can the keyword volatile fix the multiprocessor issue?Belaud
No. Volatile has nothing to do with memory fences.Holophytic
Just for clarity's sake. @Vincent's comment may have originated from an understanding of the keyword volatile in Java. The volatile keyword in Java does control memory fences but has a very different behavior than the volatile keyword in C which does not. This question explains the difference further.Hydracid
Why is atomicity tied to memory ordering? Does std::atomic<T> imply barriers? If so isn't that going a bit further than merely atomic?Selig
https://mcmap.net/q/19948/-can-a-bool-read-write-operation-be-not-atomic-on-x86-duplicate Yeah, turns out std::atomic<T> does more than what it says on the tin. Of course it does.Selig
I think that's the real correct answer. The answer about "standards bla-bla-bla... sizeof(bool) can be > 1" describes something that never happens in real life. All major compilers have sizeof(bool) == 1, and all read/write operations work the same way for bool and atomic<bool>. But a multi-core CPU and a missed memory barrier are something that will bite nearly 100% of modern applications and hardware.Particiaparticipant
@nmr: atomicity is tied to ordering so you can use it to create synchronization between threads. If you don't need that, use std::memory_order_relaxed to get atomicity without ordering. This answer is totally wrong: barriers don't create atomicity because for example they don't stop a store from another thread from appearing between tmp=var; tmp++; var=tmp;. Special CPU instructions are needed to make that sequence into an atomic RMW. See also Can num++ be atomic for 'int num'?Remscheid
@Dims: please delete this answer and stop spreading this misconception about barriers and how cache coherency works. If you want to say that atomic defaults to sequential-consistency ordering, say that. That's not required for atomicity, and conflicting values in cache for the same variable aren't possible: MESI cache coherence prevents that. atomic implies some of the same things as volatile so the compiler doesn't hoist the variable's value into a register, though. That isn't coherent.Remscheid
@PeterCordes I didn't say barriers imply atomicity, please re-read answer.Jacklynjackman
Oh right. But what you did say is still wrong. Barriers aren't needed to make a store globally visible. You only need them if you need this thread to wait until after that happens on its own. In theory you could have an inefficient C++ implementation on a system with non-coherent shared memory, but normally you use MPI or other message-passing for communication between coherency domains in the very rare huge clusters with some shared but not coherent memory. What atomic<T> really does on normal systems is stop the compiler from keeping the value in a thread-private registerRemscheid
Memory barriers aren't related to multithreadingAgee

C++'s atomic types deal with three potential problems. First, a read or write can be torn by a task switch if the operation requires more than one bus operation (and that can happen to a bool, depending on how it's implemented). Second, a read or write may affect only the cache associated with the processor that's doing the operation, and other processors may have a different value in their cache. Third, the compiler can rearrange the order of operations if they don't affect the result (the constraints are a bit more complicated, but that's sufficient for now).

You can deal with each of these three problems on your own by making assumptions about how the types you are using are implemented, by explicitly flushing caches, and by using compiler-specific options to prevent reordering (and, no, volatile doesn't do this unless your compiler documentation says it does).

But why go through all that? atomic takes care of it for you, and probably does a better job than you can do on your own.

Centiare answered 1/5, 2013 at 16:18 Comment(21)
Task switches don't cause tearing unless it took multiple instructions to store the variable. Whole instructions are atomic wrt. interrupts on a single core (they either fully complete before the interrupt, or any partial work is discarded. This is part of what store buffers are for.) Tearing is far more likely between threads on separate cores that are actually running simultaneously, because then yes you can get tearing between the parts of a store done by one instruction, e.g. an unaligned store or one too wide for the bus.Remscheid
No, a core can't write a cache line until it has exclusive ownership of that line. The MESI cache coherency protocol ensures this. (See Can num++ be atomic for 'int num'?). The real problem for C++ is that the compiler is allowed to assume that non-atomic variables aren't changed by other threads, so it can hoist loads out of loops and keep them in registers or optimize away. e.g. turning while(!var) {} into if(!var) infloop();. This part of atomic is similar to what volatile does: always re-read from memory (which is cached but coherent).Remscheid
@PeterCordes — I don’t have the wisdom to make assertions about the behavior of every possible hardware architecture that C++ code could be run on. Maybe you do, but that doesn’t mean you should resurrect a six-year old thread.Centiare
To roll your own atomics, you don't need to flush caches; you would use volatile + barriers. And you'd need inline asm for RMW atomics like var += 1; to be a single atomic increment instead of an atomic load, increment inside the CPU, then a separate atomic store.Remscheid
I was simplifying in my comment to talk about normal machines: sure it's possible to have a C++ implementation on a machine that requires explicit flushing for coherency, but the C++ memory model and the concept of release-stores is only efficient with coherent memory. Otherwise every release-store or seq-cst store would have to flush everything, barring clever as-if optimizations. All mainstream SMP systems are cache-coherent. There are non-coherent big clusters with shared memory, but they use that for message passing not for running threads of a single program.Remscheid
@PeterCordes — “simplifying to talk about normal machines” means that your comments do not address the meaning a “atomic” in the C++ standard, which describes requirements for implementations on any machine. “There are more things in heaven and Earth ... than we dreamt of in your philosophy.”Centiare
Your answer introduced discussion of implementation details. There's lots of possible gotchas you could make up if you want to invent hypothetical hardware. But yeah, the language in this answer doesn't go as far as implying that's an issue on normal hardware, unlike your answer on Can a bool read/write operation be not atomic on x86? (one of the duplicates of this question, but which is tagged x86 and thus can't have non-coherent caches across threads).Remscheid
An efficient C++ implementation on a machine that required explicit coherency sounds unlikely, so it's a weird one to make up when keeping values in registers produces the same problem you're talking about via a mechanism that does exist on all real CPUs. What bugs me about this answer is that it's not helping to clear up the common misconception about cache coherency in the real systems we do use. Many people think that explicit flushing of some kind is necessary on x86 or ARM, and that reading stale data from cache is possible.Remscheid
If the C++ standard cared at all about efficiency on non-coherent shared memory running multiple threads, there'd be mechanisms like release-stores that only made a certain array or other object globally visible, not every other operation before that point (including all non-atomic ops). On coherent systems, release stores just have to wait for preceding in-flight loads/stores to complete and commit, not write back the whole contents of any private caches. Access to our dirty private caches by other cores happens on demand.Remscheid
@PeterCordes — this answer wasn’t intended to address cache coherency in systems most people use. It was intended to suggest that that’s irrelevant if you use C++ atomics, since the implementation will handle whatever issues are present on the target hardware. But you obviously are impervious to the implications of writing a standard, and I’m not going to waste any more time trying to educate you.Centiare
I think you're missing the point of my comments. I know the C++ standard is written in a hardware-agnostic way, and that's definitely a good thing. But this answer starts out by saying "C++'s atomic types deal with three potential problems", so you're claiming that you're going to cover every possible hardware detail that might be a problem for C++ atomic, and that there are only 3 of them. ISO C++ doesn't even mention caches; that's on you so I think it's fair to criticise your choice of what to talk about as far as caches. You're not technically wrong, just IMO misleading.Remscheid
OTOH you have convinced me that fearmongering about non-coherent caching actually makes some sense here: it's something that atomic<bool> will take care of for you "if it's an issue on the target system". Even though it isn't on any C++ implementation I'm aware of, if the reader didn't know that then they definitely aren't ready to roll their own atomics on top of volatile or compiler-specific memory barriers.Remscheid
std::atomic<bool> gives you at least 2 other things you didn't mention: well-defined behaviour if another thread changes a value you're reading in a loop. (So it has that in common with volatile: force a re-read from memory). And make read-modify-write operations like b ^= 1; atomic. Except atomic<bool> doesn't have a negate function, but there is b.compare_exchange_weak or .exchange which are atomic. e.g. on x86 you get lock cmpxchg instead of just load/branch or whatever. How to atomically negate an std::atomic_bool?Remscheid
@PeterCordes - How does it give you well defined behavior for reading the volatile in a loop? As far as I know that's a common misconception: that forces an up-to-date read hence solving the "non-volatile loop bool" problem - but as far as I know the standard makes no guarantees here. It is hard to see how such guarantees would be written in any case, since the model is largely about relative behavior in the "happens before" style and makes no reference to a global clock (AFAIK).Orel
@BeeOnRope: volatile does not give you well-defined behaviour for this in ISO C++. Only on specific implementations (like GNU C for a known set of ISAs) can you usefully roll your own atomics on top of volatile, ignoring the fact that it's technically UB, like the Linux kernel. I should have said and instead of or implementation-defined stuff. I think in practice you'd be hard-pressed to find an implementation where volatile would break for this; like I said I don't think there are any C++ implementations on non-coherent shared memory hardware, and that's highly non-standard.Remscheid
Sorry @PeterCordes, I was talking about atomic not volatile. My claim is that atomic doesn't give you the behavior that one thread reading a variable will see the new value after another thread writes it, in theory. In practice it does, as a QoI issue and because the optimization that would break this is somewhat unlikely.Orel
@BeeOnRope: That was the design-intent of the standard I think, while still allowing optimization of atomics in some cases. Yes, optimization of atomics is a thorny problem. But I don't think you can ever justify hoisting a relaxed-atomic load out of a spin-wait loop according to any sane reading of the as-if rule. Any real compiler target will have some kind of maximum plausible reordering timespan, and it will be less than infinity. So assuming that all infinity of the reads (including the first one) happened before a write isn't sane.Remscheid
The language isn't written in those terms though ("hoisting", "reordering", etc). You don't need to look for the reasons why such an optimization would be allowed via "as if", because the base case doesn't guarantee this. You need to look for any language which suggests that a write by one thread is guaranteed to be seen by another thread, ever. As far as I know, there isn't. So it's not a question of optimization breaking something otherwise guaranteed in the standard: AFAIK it's not guaranteed at all. @PeterCordesOrel
@BeeOnRope: But actually the reasoning about possible orderings should be about orderings allowed in the C++ abstract machine, and then picking one such ordering at compile time. So oops, the target reality doesn't actually come into it that early.Remscheid
@BeeOnRope: Good point. For if(!b) infloop(); to be equivalent according to the as-if rule to while(!b){}, you'd have to decide that all infinity of the reads of b are contiguous in the global order of reads and writes for b. i.e. that they all happen-before any possible write from another thread. I guess that's possible in theory for a DeathStation 9000 implementation, but very obviously isn't the intent of the standard. It might not even be standard-compliant depending on the order of the program starting its threads.Remscheid
@BeeOnRope: There is language in a footnote/guideline in the standard that says implementations should ensure that even relaxed-atomic stores are promptly visible to all threads. 32.4.12 Implementations should make atomic stores visible to atomic loads within a reasonable amount of time. open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf But you're right it doesn't actually guarantee it, I was forgetting that. So I guess this answer is doubly wrong, because atomic<> in the ISO standard doesn't quite guarantee cache flushing on a non-coherent system.Remscheid

Consider a compare and exchange operation:

bool a = ...;
bool b = ...;

if (a)
    swap(a,b);

After we read a and get true, another thread could come along and set a to false; we then swap(a,b), so on exit b is false, even though the swap was made.

Using std::atomic::compare_exchange we can do the entire if/swap logic atomically, such that the other thread could not set a to false in between the if and the swap (without locking). In that circumstance, if the swap was made, then b must be true on exit.

This is just one example of an atomic operation that applies to a two value type such as bool.

Inglorious answered 1/5, 2013 at 15:38 Comment(1)
How come this is the lowest rated answer? This (or test_and_set in std::atomic_flag) is the main reason to use an atomic bool type.Ritual

Atomic operations are about more than just torn values. So while I agree with you and the other posters that I am not aware of an environment where a torn bool is a possibility, there is more at stake.

Herb Sutter gave a great talk about this which you can view online. Be warned, it is a long and involved talk. Herb Sutter, Atomic Weapons. The issue boils down to avoiding data races because it allows you to have the illusion of sequential consistency.

Infection answered 1/5, 2013 at 15:29 Comment(0)

Atomicity of certain types depends exclusively on the underlying hardware. Each processor architecture has different guarantees about atomicity of certain operations. For example:

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

  • Reading or writing a byte
  • Reading or writing a word aligned on a 16-bit boundary
  • Reading or writing a doubleword aligned on a 32-bit boundary

Other architectures have different specifications on which operations are atomic.

C++ is a high-level programming language that strives to abstract you from the underlying hardware. For this reason the standard simply cannot permit you to rely on such low-level assumptions, because otherwise your application wouldn't be portable. Accordingly, a C++11-compliant standard library provides atomic counterparts for all the primitive types out of the box.

Hosier answered 1/5, 2013 at 15:35 Comment(1)
Another critical part is that C++ compilers are normally allowed to keep variables in registers or optimize away accesses, because they can assume that no other threads are changing the value. (Because of data-race UB). atomic sort of includes this property of volatile, so while(!var){} can't optimize into if(!var) infinite_loop();. See MCU programming - C++ O2 optimization breaks while loopRemscheid

© 2022 - 2024 — McMap. All rights reserved.