Is volatile bool for thread control considered wrong?

Asked 9/8, 2011 at 11:17 Answered 18/4, 2023 at 3:40

Solved c++ multithreading multicore volatile

As a result of my answer to this question, I started reading about the keyword volatile and what the consensus is regarding it. I see there is a lot of information about it, some old which seems wrong now and a lot new which says it has almost no place in multi-threaded programming. Hence, I'd like to clarify a specific usage (couldn't find an exact answer here on SO).

I also want to point out I do understand the requirements for writing multi-threaded code in general and why volatile is not solving things. Still, I see code using volatile for thread control in code bases I work in. Further, this is the only case I use the volatile keyword as all other shared resources are properly synchronized.

Say we have a class like:

class SomeWorker
{
public:
    SomeWorker() : isRunning_(false) {}
    void start() { isRunning_ = true; /* spawns thread and calls run */ }
    void stop() { isRunning_ = false; }

private:
    void run()
    {
        while (isRunning_)
        {
            // do something
        }
    }
    volatile bool isRunning_;
};

For simplicity some things are left out, but the essential thing is that an object is created which does something in a newly spawned thread checking a (volatile) boolean to know if it should stop. This boolean value is set from another thread whenever it wants the worker to stop.

My understanding has been that the reason to use volatile in this specific case is simply to prevent any optimization that would cache the flag in a register for the loop and so result in an infinite loop. There is no need to properly synchronize things, because the worker thread will eventually get the new value?

I'd like to understand if this is considered completely wrong and if the right approach is to use a synchronized variable? Is there a difference between compiler/architecture/cores? Maybe it's just a sloppy approach worth avoiding?

I'd be happy if someone would clarify this. Thanks!

EDIT

I'd be interested to see (in code) how you choose to solve this.

Magnetochemistry answered 9/8, 2011 at 11:17 Comment(26)

not according to what I've read lately and not according to the discussion around the question I linked to. To me it seems things have changed at some point and this way to approach things is no longer considered a good way. I'd like to get a confirmation and explanation to this. :) – Magnetochemistry 9/8, 2011 at 11:28

Even ignoring volatile, the code above has a race condition. Calling stop() then start() in quick succession may result in more than one thread running at the same time. Whether that's a bug or not is a design question. – Fermata 9/8, 2011 at 11:42

yes, but as you stated this can be a design question. if not, see my comment for this case at @eran's answer. – Foetus 9/8, 2011 at 11:47

It is fine in this very specific case. Don't go jumping to conclusions from it, volatile is not a substitute for an event, nor is it suitable for implementing locks. – Pteridology 9/8, 2011 at 11:51

@Hans: are you saying the use of volatile bool is alright in this specific case? – Magnetochemistry 9/8, 2011 at 11:54

Yes, that's what "it is fine in this very specific case" means. – Pteridology 9/8, 2011 at 11:55

@Hans: too many comments in between that I was unsure what you answer to. Also I'm surprised that you say this while so many other say the opposite. Just look at the question I link to. – Magnetochemistry 9/8, 2011 at 11:57

@murrekatt: I suppose the negative feedback is because it's just bad practice. You might get away with it in this case, but what if you want to add an int or a pointer? Suddenly you're in trouble. If you stick to atomics as a matter of course, you'll be in the right concurrent mindset from the start. – Nullipore 9/8, 2011 at 12:3

There's never a lack of FUD when it comes to volatile. Best mentioned in comments, not answers. The question you linked requires additional synchronization to ensure that the thread has exited. – Pteridology 9/8, 2011 at 12:4

See my response where I explain what volatile does and with examples for exactly this situation: – Kick 9/8, 2011 at 12:13

@Hans Passant It may work in this specific case, or it may not. It's not guaranteed with most compilers and most hardware. I've not seen any that explicitly guarantee it, but if you know what the compiler does with volatile, and you know what the hardware does with what the compiler does, you might be able to derive a guarantee. On the other hand, there are a lot of systems where it most specifically won't work. (Sparc with either Sun CC or g++, for example. And I'm not too sure about Intel/AMD with VC++ or g++.) – Spaak 9/8, 2011 at 12:17

@JamesKanze: All real-world C++ implementations have inter-thread visibility for volatile in practice, because accesses are required to compile to a load or store in the asm (which is sufficient because CPUs have coherent caches between the cores that std::thread starts threads across). You don't get any ordering, but this code doesn't depend on that (and wouldn't give any useful sync with SC atomics). The well-defined way to do this would be std::atomic<bool> with memory_order_relaxed. volatile is obsolete for this but does work fine. – Loving 21/4, 2023 at 11:37

@JamesKanze: I assume you don't remember your point about how Sun CC or g++ would compile this for SPARC, but I'm certain g++ at least would compile it to the same memory operations as std::atomic<bool> with memory_order_relaxed. (Just a load or store, no barriers). It could only break if a compiler disregarded volatile and hoisted a load out of the loop, not actually re-checking it every iteration. The reader will definitely see a value stored by another core after a few tens of nanoseconds, thanks to cache coherency. – Loving 21/4, 2023 at 11:42

@PeterCordes Without barriers, the hardware may simply reuse the value it has just read in the previous loop (at least on a Sparc, but I think on an Intel as well). This has nothing to do with the cache, but rather the read pipeline. (It's true that this is highly unlikely if the loop is doing anything significant. The read pipeline isn't very big.) – Spaak 21/10, 2023 at 17:28

@JamesKanze: That's true only if cache is still valid. If another core does a store, it will invalidate other copies of the cache line before committing the store to L1d cache, maintaining cache coherency and preventing the problem you're describing. That's why atomic<T> load or store with memory_order_relaxed can compile without any extra barrier instructions. (The same as volatile.) Out-of-order exec might run loads early relative to other code, but the OoO exec window is small so this won't stop a core from noticing an exit_now variable becoming true for a meaningful time. – Loving 21/10, 2023 at 17:53

@JamesKanze: Since there is no meaningful notion of simultaneity across threads (unless you mean relative order of different operations, in which case you could use seq_cst or release/acquire if you want that), there's no reason to make a thread wait for anything before running a load instruction for a relaxed load. It does sample from cache at a nearby time to when it appears in program order, which is fine. – Loving 21/10, 2023 at 17:57

@PeterCordes Whether it actually samples from the cache or not depends on the hardware. If the hardware finds the value in the read pipeline, it may not even go to the cache. (And of course, as far as the standard is concerned, it's undefined behavior.) – Spaak 23/10, 2023 at 11:31

@JamesKanze: I've never heard of a "read pipeline" as an extra thing loads could snoop and take a value from instead of cache. Loads do have to snoop the store buffer, but will only find a value there if this thread has recently stored to that location. And cache-miss loads will snoop the buffers for incoming cache lines so they can attach themselves to wait for a line that's already been requested, avoiding a duplicate req. A google hit for SPARC "read pipeline" found oracle.com/technetwork/systems/opensparc/… but it's not that. – Loving 23/10, 2023 at 11:53

@JamesKanze: I don't think it's plausible that loads in a spin-loop could keep seeing a stale value indefinitely on SPARC. godbolt.org/z/oaezdxaGj shows SPARC GCC compiling std::atomic<long> .load(relaxed) into just a load instruction with no barriers or anything, same as you'd get from volatile, so GCC thinks a pure load is sufficient to give the "reasonable time" visibility guarantee ISO C++ says implementations "should" provide. – Loving 23/10, 2023 at 12:2

@PeterCordes For what definition of "reasonable time"? Neither the standard nor any of the compiler specifications that I know of has this concept. All the standard says for relaxed order concerns ordering, not time. And it says that relaxed has no guarantees with respect to order, only that the accesses must be "indivisible". Accessing a properly aligned integral type on a Sparc/Intel/whatever will be "indivisible", provided the integral type isn't larger than a machine word (64 bits on modern processors). There's nothing in the standard to say when this indivisible access will occur. – Spaak 24/10, 2023 at 15:15

@PeterCordes Wrt the read queue: on a modern processor, a read is executed asynchronously; when the load instruction is executed, the processor schedules a read operation, and marks the target as "dirty" (or remaps the register, it depends on the processor). The read takes place in the background, over a number of clocks, using a pipeline to output the address, then recover the results. The actual read operation is always for a complete word, and when the next load occurs, the processor will look to see if there is already a request to this address... – Spaak 24/10, 2023 at 15:21

... and use it. The main goal here is optimizing successive accesses when reading a succession of bytes, so that for example a memcpy with the pointers and counter in registers will actually only read and write words (although it will still do the loop on the number of bytes). A side effect is that if you're constantly reading a value from a single word, and not reading anything else in the meantime, the processor, having read the word, may simply acquire the value from the word already read... – Spaak 24/10, 2023 at 15:26

In actual practice, of course, if the system is doing anything else, at some point or another, the OS will suspend the execution of your process, and when your process restarts, it will reread the memory. On a multi-core processor which is only doing normal background processing, however, this may take seconds, minutes or even hours. (Remember that a top of the line processor may have several hundred cores, and your process will only be suspended if there are no more cores left.) – Spaak 24/10, 2023 at 15:29

@JamesKanze: Can you cite any sources to back up your claim that there's effectively some non-coherent level-0 cache as part of something called a "read pipeline" in SPARC CPUs? That's not quite how most CPUs work; register renaming and pipelining loads sounds normal, and handling byte or half-word loads by loading the containing word also sounds normal, but having later loads snoop earlier loads instead of just accessing cache again sounds weird. – Loving 24/10, 2023 at 22:38

@JamesKanze: See Can the compiler optimize out accesses with memory order relaxed that are not ordered by any memory fence? - the ISO C++ standard says that stores should be visible to loads "in a reasonable amount of time" [atomics.order] and in "finite period of time" [intro.progress]. Real implementations satisfy that by not optimizing atomics, so inter-thread latency is just up to the hardware. – Loving 24/10, 2023 at 22:42

@JamesKanze Most people would not consider it "reasonable" for a relaxed load to not see a store for up to 10 ms, until the next timer interrupt on Linux with HZ=100. Or even 1 ms. So hardware doesn't work the way you say; loads can only hit earlier in-flight loads if they're both waiting for the same cache-line that's not already present in L1d cache. At least not most hardware; if any does, that would be an exceptional claim that needs strong evidence to back it up. – Loving 24/10, 2023 at 22:51

volatile can be used for such purposes. However this is an extension to standard C++ by Microsoft:

Microsoft Specific

Objects declared as volatile are (...)

A write to a volatile object (volatile write) has Release semantics; (...)

A read of a volatile object (volatile read) has Acquire semantics; (...)

This allows volatile objects to be used for memory locks and releases in multithreaded applications. (emph. added)

That is, as far as I understand, when you use the Visual C++ compiler, a volatile bool is for most practical purposes an atomic<bool>.

It should be noted that newer VS versions add a /volatile switch that controls this behavior, so this only holds if /volatile:ms is active.

Intend answered 9/8, 2011 at 11:32 Comment(8)

Seems like this extension makes the C++ volatile keyword behave more similar to Java's volatile. In Java, volatile does guarantee order of access, which might explain some programmer's confusion about its function in C++. – Chill 9/8, 2011 at 11:43

That MSDN article is very unfortunate, it is dead wrong. You can't implement a lock with volatile, not even with Microsoft's version. The description is pretty irrelevant too, odds you'll run your code on an Itanium are slim these days. – Pteridology 9/8, 2011 at 11:46

@HansPassant: I have started a separate question to clear this up: stackoverflow.com/questions/7007403/… – Intend 10/8, 2011 at 7:43

@Hans wrote "You can't implement a lock with volatile, not even with Microsoft's version." - this is true. But there's no lock in the use case of this question. – Intend 10/8, 2011 at 8:10

It says: "a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary" Doesn't sound like Java volatile. – Couture 23/10, 2011 at 20:42

/volatile:ms is a bit on the deprecated side, for example doesn't hold for ARM targets. Certainly not compliant. – Elsy 8/7, 2014 at 3:30

You don't need MS's acq_rel semantics for volatile for a "stop now" / "keep running" flag; atomic<bool> with std::memory_order_relaxed would be sufficient, and volatile gives you something similar to that in practice on real implementations (like GCC/clang as well as MSVC /volatile:iso). See When to use volatile with multi threading? for an explanation of why: real C++ implementations run threads on CPUs that have coherent caches. volatile was the de-facto standard before C++11, and still works in practice. (Don't use it, though!) – Loving 18/4, 2023 at 3:55

when you use the Visual C++ compiler, a volatile bool is for most practical purposes an atomic<bool> - the differences are that atomic<bool> defaults to seq_cst, vs. MS volatile bool giving acq_rel (and /volatile:iso being relaxed). So atomic<bool> is slower if you don't explicitly use flag.store(false, std::memory_order_release). Also atomic<bool> makes operations like flag ^= 1 into an atomic RMW, if that operation is supported for bool. – Loving 18/4, 2023 at 3:58

You don't need a synchronized variable, but rather an atomic variable. Luckily, you can just use std::atomic<bool>.
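A minimal sketch of how the SomeWorker from the question could look with std::atomic<bool>, assuming C++11 and std::thread. The thread_ member, the join in stop(), and the iterations_ counter are not in the original code; they are added here only to make the sketch complete and observable. Relaxed ordering is used because the flag does not publish any other data; the default seq_cst ordering would also be correct.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Sketch of the question's SomeWorker with std::atomic<bool> instead of
// volatile bool. Thread handling and the iteration counter are assumptions.
class SomeWorker
{
public:
    SomeWorker() : isRunning_(false), iterations_(0) {}
    ~SomeWorker() { stop(); }

    void start()
    {
        if (thread_.joinable())
            return; // already running; do not spawn a second thread
        isRunning_.store(true, std::memory_order_relaxed);
        thread_ = std::thread(&SomeWorker::run, this);
    }

    void stop()
    {
        isRunning_.store(false, std::memory_order_relaxed);
        if (thread_.joinable())
            thread_.join(); // wait until run() has actually returned
    }

    unsigned long iterations() const
    {
        return iterations_.load(std::memory_order_relaxed);
    }

private:
    void run()
    {
        while (isRunning_.load(std::memory_order_relaxed))
        {
            // do something
            iterations_.fetch_add(1, std::memory_order_relaxed);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> isRunning_;
    std::atomic<unsigned long> iterations_;
    std::thread thread_;
};
```

start() and stop() are assumed to be called from a single controlling thread, since thread_ itself is not protected. Joining in stop() also means that a stop() followed by start() cannot leave two worker threads running at once.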

The key issue is that if more than one thread accesses the same memory simultaneously, then unless the access is atomic, your entire program ceases to be in a well-defined state. Perhaps you're lucky with a bool, which is possibly getting updated atomically in any case, but the only way to be offensively certain that you're doing it right is to use atomic variables.

"Seeing codebases you work in" is probably not a very good measure when it comes to learning concurrent programming. Concurrent programming is fiendishly difficult and very few people understand it fully, and I'm willing to bet that the vast majority of homebrew code (i.e. not using dedicated concurrent libraries throughout) is incorrect in some way. The problem is that those errors may be extremely hard to observe or reproduce, so you might never know.

Edit: You aren't saying in your question how the bool is getting updated, so I am assuming the worst. If you wrap your entire update operation in a global lock, for instance, then of course there's no concurrent memory access.

Nullipore answered 9/8, 2011 at 11:27 Comment(12)

AFAIK you don't need atomic for this specific code/example. let me know if I'm wrong. if so, on which platform does this fail? – Foetus 9/8, 2011 at 11:31

I do not know if I misunderstood you but in my opinion std::atomic synchronizes(by mutual exclusion) the value it holds. – Lexeme 9/8, 2011 at 11:31

Thanks for your answer. I just mentioned that I see this in code bases I work in, but I also see this online in articles and forums to be the suggested way to do things. This is what I find strange and would like to get an explanation to. – Magnetochemistry 9/8, 2011 at 11:32

Concurrent programming isn't intrinsically very difficult. It's very difficult when using traditional exclusion and signalling primitives to coordinate access to shared memory. Using message queues to pass around immutable objects is much safer and easier to reason about than the traditional models. It's still not easy, but the difference is like night and day. – Fermata 9/8, 2011 at 11:36

@Marcelo: your point makes sense if two threads try to write to the same memory address. here is just one updater and one reader and the reader probably doesn't care if it doesn't see the update instantly. – Foetus 9/8, 2011 at 11:49

@Nobody: atomics are a bit different -- their access is synchronized at the instruction level. There's no visible mutex at the language level. If you use mutex locks, then there is no concurrent access, because the mutex serializes the access. But, if you think about how to implement a mutex, you will see that you actually need atomics! – Nullipore 9/8, 2011 at 11:49

@yi_H: Yes, but I'm guessing that it does care not to miss a sequence of updates. Assuming you only want at most one thread to run at any given time, the problem isn't that simple. You can solve this using conventional primitives, but a queue is versatile enough to solve this and practically any other kind of concurrency problem. – Fermata 9/8, 2011 at 11:57

@Marcello: Fair enough, but you still have two big problems: a) write such a queue. Your library might do that for you. And b) this is fairly restrictive and you need to incorporate the message queue deep into your design, and make sure you're not cutting corners. I'm sure it can be done, but I imagine that you still have to be very alert, and nobody will tell you when you're doing something wrong. – Nullipore 9/8, 2011 at 12:0

@Kerrek: I know about atomics, but in my opinion the usual architectures only provide atomic operations on bitlevel. The std::atomic is a container for anything and if you look in the interface you will see bool is_lock_free() that in my opinion shows that internally this container wraps a mutex around the internal structure to make the operations atomic. – Lexeme 9/8, 2011 at 12:0

@Kerrek: I say in the question that another thread will call stop when it wants (no other locking). – Magnetochemistry 9/8, 2011 at 12:1

@Nobody: Most common architectures guarantee atomicity for aligned word sized loads and stores; some, such as x86 provide additional atomic primitives for incrementing etc. std::atomic is designed to allow an implementation to use the hw provided atomics for some types (e.g. std::atomic<int>) and a wrapper with a mutex for more complicated types. – Venom 9/8, 2011 at 12:20

@janneb: Thanks for pointing that out. So I might say that std::atomic<bool> can be really atomic while std::atomic<std::list> for example will be mutex wrapped. – Lexeme 9/8, 2011 at 12:24

Using volatile is enough only on single cores, where all threads use the same cache. On multi-cores, if stop() is called on one core and run() is executing on another, it might take some time for the CPU caches to synchronize, which means two cores might see two different views of isRunning_. This means run() will run for a while after it has been stopped.

If you use synchronization mechanisms, they will ensure all caches get the same values, at the price of stalling the program for a while. Whether performance or correctness is more important to you depends on your actual needs.
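One such synchronization mechanism is a mutex plus a condition variable, which also lets the worker sleep between work items instead of spinning. The sketch below is only an illustration, assuming C++11: StopEvent, request_stop, wait_for_stop, and run_until_stopped are hypothetical names, and the 10 ms tick is an arbitrary choice.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a stop flag guarded by a mutex, plus a condition
// variable so that a waiting worker wakes up as soon as stop is requested.
class StopEvent
{
public:
    void request_stop()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopRequested_ = true;
        }
        cv_.notify_all();
    }

    // Sleeps for up to `timeout`; returns true if a stop has been requested.
    bool wait_for_stop(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return stopRequested_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopRequested_ = false;
};

// Does one unit of work per tick until a stop is requested;
// returns the number of ticks completed.
int run_until_stopped(StopEvent& ev)
{
    int ticks = 0;
    while (!ev.wait_for_stop(std::chrono::milliseconds(10)))
        ++ticks; // do something
    return ticks;
}
```

The worker would call run_until_stopped from its own thread, and any other thread calls request_stop(). Because the flag is only touched under the mutex, no volatile or atomic is needed for it.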

Dowling answered 9/8, 2011 at 11:31 Comment(12)

well, in that loop you typically do a lot of things otherwise it will just eat your CPU... so cache synchronization shouldn't be an issue. Also for an example like this you don't call stop() and run() simultaneously. – Foetus 9/8, 2011 at 11:35

This... It's surprising to many programmers that changes they make to a memory location in one thread may not be visible to another thread reading the same memory location, and that changes to several memory locations may be seen out of order. There are special memory barrier instructions to synchronise things. But it's very much easier to use c++ atomic types, or synchronization functions from a library, or things like InterlockedIncrement (on windows) rather than try to get it right yourself – Coastguardsman 9/8, 2011 at 11:36

Thanks eran, yes, this is how I concluded it after reading a bit answers to "volatile" questions here on SO. You mention performance, isn't it slower to use volatile or maybe std::atomic or alike is as slow? – Magnetochemistry 9/8, 2011 at 11:36

@yi_H, multithreading is so hard because of all those edge cases... If the stop() is called by a button event, you're right. Human actions aren't fast enough anyway. But for the general case, cache synchronization is at least an issue to have in mind. – Dowling 9/8, 2011 at 11:40

@eran: I don't see the point. if run() does CPU intensive processing or blocking I/O in that loop then doing stop(); .. start(); doesn't make sense. not even with atomic. Your thread might never see that transition. you need another variable to signal that the thread received the stop() so you can call the next start(). (yes I know, there are MT primitives to do this, but again, do you know a platform where this doesn't work?) – Foetus 9/8, 2011 at 11:43

@murrekatt, volatile might slightly hit performance, otherwise the compiler would never have bothered to use a register. But that's called premature optimization. In most cases, you'll never feel the hit. Real synchronizations will cause a more noticeable hit, but how much depends on the mechanism used, the architecture and the rest of your code. You'll just have to test it. – Dowling 9/8, 2011 at 11:46

@yi_H, I wasn't referring to the stop and run simultaneous call scenario... In my answer, I refer to the case where run is executing the loop on one core for some time, and then stop is called on another core. In this case, it might take some time for the run thread to see the change. Solving more complicated cases like multiple starts and stops requires more complicated mechanisms than the given code, no question about that. – Dowling 9/8, 2011 at 11:57

Cache synchronization isn't the issue (or at least not the only one). The read and write queues on the processor can be a more fundamental issue; for example, if the processor finds the value it wants in the read queue, it generally won't attempt to go to memory again. For code like the above to work, you need to ensure that the memory is synchronized, using some sort of barrier or fence machine instruction. Most compilers don't generate this for volatile, so volatile doesn't suffice here. – Spaak 9/8, 2011 at 12:21

(note: waiting a little bit for cache synchronization is fine) – Foetus 9/8, 2011 at 12:47

@yi_H Sun CC or g++ on a Sparc. I'm not sure about Intel systems; I've seen contradictory statements of what the hardware guarantees. – Spaak 9/8, 2011 at 14:5

@eran I have posted a related question to this proposed answer at https://mcmap.net/q/25015/-memory-barriers-force-cache-coherency/2369597 and I'd be really grateful if you could spare a few moments to post on it. Thank you. – Mauritamauritania 21/6, 2015 at 12:46

@JamesKanze: If a variable is used solely for cancellation, which will be rare, and if cancellation didn't need to occur with any particular timeliness, could one avoid synchronization overhead in the non-cancellation case by having code that simply ensures that every thread will get hit with a context switch at least occasionally? – Surfacetosurface 16/7, 2015 at 18:7

There are three major problems you are facing when multithreading:

1) Synchronization and thread safety. Variables that are shared between several threads must be protected from being written to by several threads at once, and prevented from being read during non-atomic writes. Synchronization of objects can only be done through a special semaphore/mutex object which is guaranteed to be atomic by itself. The volatile keyword does not help.

2) Instruction piping. A CPU can change the order in which some instructions are executed to make code run faster. In a multi-CPU environment where one thread is executed per CPU, the CPUs pipe instructions without knowing that another CPU in the system is doing the same. Protection against instruction piping is called memory barriers. It is all explained well at Wikipedia. Memory barriers may be implemented either through dedicated memory barrier objects or through the semaphore/mutex object in the system. A compiler could possibly choose to invoke a memory barrier in the code when the volatile keyword is used, but that would be a rather special exception and not the norm. I would never assume that the volatile keyword did this without having it verified in the compiler manual.

3) Compiler unawareness of callback functions. Just as for hardware interrupts, some compilers may not know that a callback function has been executed and updated a value in the middle of code execution. You can have code like this:

// main
x=true;
while(something) 
{   
  if(x==true)   
  {
    do_something();
  }
  else
  {
    do_something_else();
    /* The code may never go here: the compiler doesn't realize that x 
       was changed by the callback. Or worse, the compiler's optimizer 
       could decide to entirely remove this section from the program, as
       it thinks that x could never be false when the program comes here. */
  } 
}

// thread callback function:
void thread (void)
{
  x=false;
}

Note that this problem only appears on some compilers, depending on their optimizer settings. This particular problem is solved by the volatile keyword.

So the answer to the question is: in a multi-threaded program, the volatile keyword does not help with thread synchronization/safety, it likely does not act as a memory barrier, but it could protect against dangerous assumptions by the compiler's optimizer.
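The memory barriers from point 2 can be sketched with the standard C++11 fences, which are the portable way to ask for a barrier rather than hoping volatile implies one. This is only an illustration: payload, ready, producer, consumer, and run_fence_demo are hypothetical names.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                // plain, non-atomic data
std::atomic<bool> ready(false); // flag that announces the payload

void producer()
{
    payload = 42;
    // Release fence: the payload write may not be reordered after the
    // flag store that follows.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

int consumer()
{
    while (!ready.load(std::memory_order_relaxed))
        std::this_thread::yield();
    // Acquire fence: the payload read may not be reordered before the
    // flag load that preceded it.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Runs one producer and one consumer; returns what the consumer saw.
int run_fence_demo()
{
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

The two fences pair up through the relaxed flag, so once the consumer observes ready as true it is guaranteed to read the payload written before the release fence. A plain volatile flag gives no such guarantee in standard C++.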

Ravine answered 9/8, 2011 at 12:35 Comment(7)

Thanks Lundin for your answer. I still don't hear a consensus if volatile bool is appropriate code in my example. – Magnetochemistry 9/8, 2011 at 12:38

@Magnetochemistry I believe you have gotten it right when you wrote the question, the volatile in that case is just to protect against optimizer goof-ups, just as in 3) in my answer above. The intent in your specific example does not seem to be thread synchronization, which volatile wouldn't help with. – Ravine 9/8, 2011 at 13:11

Still there are claims that this might not work. I'm waiting for some kind of consensus what actually works and is appropriate in a case like this. At the moment I don't have much more clarity than before I asked the question. :( – Magnetochemistry 9/8, 2011 at 13:18

@Magnetochemistry I don't think anyone is saying that volatile can be used as a mutex/semaphore, which was what you asked. As has been pointed out by several, it can be used as a memory barrier in some cases, though such code is compiler dependent and non-portable. Someone said it can't be used as an event, but that's just a remark about CPU usage efficiency: it won't cause bugs, just bad performance. Regarding the question you linked to, and your answer, you got some unfair response from that... you never did say it should be volatile for sync purposes. It should be volatile for my reason 3) above. – Ravine 9/8, 2011 at 19:4

@Couture Historically, all of them, when optimizations are on. Nowadays PC compilers in particular do an ok job of realizing that callbacks are called by someone else other than themselves - embedded systems compilers less so. It's all about the compiler's ability to treat callbacks/interrupts as a special case. – Ravine 31/1, 2020 at 7:24

@Ravine Can you name a few of these broken compilers? – Couture 31/1, 2020 at 16:50

How does any of this apply for volatile bool x? Part of the point of volatile is that the value you read might not be the same as a value this thread stored earlier. If a compiler does constant-propagation through a volatile, it's severely broken for pre-C++11 hand-rolled atomics. Who's afraid of a big bad optimizing compiler? on LWN describes the Linux kernel's use of volatile to avoid problems like that as it rolls its own atomics. (On GCC and clang, not supporting other compilers, to be fair.) – Loving 18/4, 2023 at 4:2

This will work for your case, but to protect a critical section this approach is wrong. If it were right, one could use a volatile bool in almost all cases where a mutex is used. The reason is that a volatile variable does not guarantee any memory barriers or any cache coherence mechanism. A mutex, by contrast, does. In other words, once a mutex is locked, a cache invalidation is broadcast to all cores in order to maintain consistency among them. With volatile this is not the case. Nevertheless, Andrei Alexandrescu proposed a very interesting approach to use volatile to enforce synchronization on a shared object. And as you'll see, he does it with a mutex; volatile is only used to prevent accessing the object's interface without synchronization.
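To illustrate the difference: a critical section such as a read-modify-write on shared data needs a real lock. A minimal sketch, assuming C++11; Counter and count_with_threads are made-up names for illustration.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared counter protected by a mutex.
class Counter
{
public:
    void increment()
    {
        std::lock_guard<std::mutex> lock(mutex_); // critical section starts
        ++value_;                                 // read-modify-write under the lock
    }

    long value()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    std::mutex mutex_;
    long value_ = 0;
};

// Increments one shared counter from several threads and returns the total.
long count_with_threads(int threads, int perThread)
{
    Counter counter;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i)
                counter.increment();
        });
    for (auto& th : pool)
        th.join();
    return counter.value();
}
```

With the lock, count_with_threads(4, 10000) always yields 40000. Replacing the mutex with a volatile long would allow lost updates, because ++value_ is a separate load, add, and store.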

Milks answered 6/2, 2013 at 21:57 Comment(0)

I think there is nothing wrong with this code and it works fine. However, as you said, this way of writing is no longer recommended, because it is neither efficient nor easy to maintain. If you can already use std::atomic, just give up this way of writing. The reasons:

An update to one core's cache is not immediately synchronized to the other cores.
Once this code seems to run normally, you may get carried away and add other member variables. The CPU can execute out of order, so a newly added variable may still be a null pointer at the moment your volatile bool reads true.
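The second reason can be sketched as follows, assuming the flag becomes a std::atomic<bool>: a release store paired with an acquire load guarantees that anything written before the flag was set (here a hypothetical config pointer) is visible once the worker sees the flag as true. Config, config, controller, worker, and run_publish_demo are illustrative names only.

```cpp
#include <atomic>
#include <string>
#include <thread>

struct Config { std::string name; }; // hypothetical data added next to the flag

Config* config = nullptr;            // plain pointer, set before the flag
std::atomic<bool> isRunning(false);

void controller(Config* c)
{
    config = c;                                       // ordinary write
    isRunning.store(true, std::memory_order_release); // publishes the write above
}

std::string worker()
{
    while (!isRunning.load(std::memory_order_acquire))
        std::this_thread::yield();
    return config->name; // config is guaranteed to be non-null here
}

// Starts a worker, publishes a Config, and returns what the worker read.
std::string run_publish_demo()
{
    config = nullptr;
    isRunning.store(false, std::memory_order_relaxed);
    Config cfg{"worker-1"};
    std::string seen;
    std::thread reader([&seen] { seen = worker(); });
    controller(&cfg);
    reader.join();
    return seen;
}
```

With a plain volatile bool (outside of MSVC's /volatile:ms) nothing orders the pointer write relative to the flag write, so the worker could in principle observe the flag as true while config is still null.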

Lisabeth answered 18/4, 2023 at 3:40 Comment(2)

std::atomic<bool> flag with flag.store(true, std::memory_order_relaxed) will compile to the same asm as volatile bool flag with flag = true;, across all ISAs. With a "stop now / keep running" flag, there's usually no point in having the writer thread stall later memory operations until the store is globally visible, which is what you get from the default seq_cst memory order. (CPU cache will never be "immediately" refreshed, but seq_cst doesn't make it any faster, it just makes this CPU wait.) – Loving 18/4, 2023 at 4:6

I wouldn't recommend volatile bool for this, but it's actually fine on all mainstream compilers for all ISAs as a kind of equivalent to relaxed atomics, because all real C++ implementations run std::thread across cores with cache-coherent shared memory. – Loving 18/4, 2023 at 4:8
