Why is volatile not considered useful in multithreaded C or C++ programming?

As demonstrated in this answer I recently posted, I seem to be confused about the utility (or lack thereof) of volatile in multi-threaded programming contexts.

My understanding is this: any time a variable may be changed outside the flow of control of a piece of code accessing it, that variable should be declared to be volatile. Signal handlers, I/O registers, and variables modified by another thread all constitute such situations.

So, if you have a global int foo, and foo is read by one thread and set atomically by another thread (probably using an appropriate machine instruction), the reading thread sees this situation the same way it sees a variable tweaked by a signal handler or modified by an external hardware condition, and thus foo should be declared volatile (or, for multithreaded situations, accessed with a memory-fenced load, which is probably the better solution).
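For concreteness, a minimal sketch of the pattern described above (names are illustrative):

volatile int foo = 0;    // flag: written by one thread, polled by another

void writer_thread(void) {
    // ... prepare some data ...
    foo = 1;             // set "atomically" via a plain aligned store
}

void reader_thread(void) {
    while (foo == 0)     // poll until the writer signals
        ;                // spin
    // ... consume the data the writer prepared ...
}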

How and where am I wrong?

Peterman answered 20/3, 2010 at 22:10 Comment(7)
All volatile does is say that the compiler should not cache the access to a volatile variable. It says nothing about serialising such access. This has been discussed here I don't know how many times, and I don't think this question is going to add anything to those discussions.Lowrie
@neil I searched for other questions, and found one, but any existing explanation I saw somehow didn't trigger what I needed to really understand why I was wrong. This question has elicited such an answer.Peterman
For a great in-depth study on what CPUs do with data (via their caches) check out: rdrop.com/users/paulmck/scalability/paper/whymb.2010.06.07c.pdfAlcatraz
In Java volatile creates a memory barrier when it's read, so it can be used as a threadsafe flag that a method has ended since it enforces a happens-before relationship with the code before the flag was set. This is not the case in C.Sammer
@Sammer Java volatile has nothing to do with C/C++ volatile: Java volatile is defined inside the language model, can be transformed, optimized (write followed by read of a Java volatile can be optimized to write) and volatile completely eliminated if the volatile variable is provably not shared between threads. In C/C++ volatile is outside the language model, the operations on volatile objects can have effects visible to other devices and by definition no transformation on volatile is possible, not even a read whose result is ignored can be eliminated. They don't have the same purpose.Gleam
@Gleam That's what I meant with "not the case in C", where it can be used to write to hardware registers etc., and isn't used for multithreading like it's commonly used in Java.Sammer
C++11 introduced std::atomic<T>, which obsoletes volatile for multithreading: we no longer have to hand-roll atomics. (But it does still work on implementations where it worked before; see "When to use volatile with multi threading?" for why it's like atomic with memory_order_relaxed on GCC/clang. Also see "Who's afraid of a big bad optimizing compiler?" (lwn.net/Articles/793253) on LWN re: compiler optimizations that break code which doesn't use either volatile or atomic, even if it uses a memory barrier to avoid the obvious problems.)Irishirishism

The problem with volatile in a multithreaded context is that it doesn't provide all the guarantees we need. It does have a few properties we need, but not all of them, so we can't rely on volatile alone.

However, the primitives we'd have to use for the remaining properties also provide the ones that volatile does, so it is effectively unnecessary.

For thread-safe accesses to shared data, we need a guarantee that:

  • the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)
  • that no reordering takes place. Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first?

volatile does guarantee the first point. It also guarantees that no reordering occurs between different volatile reads/writes: all volatile memory accesses will occur in the order in which they're specified. That is all we need for what volatile is intended for: manipulating I/O registers or memory-mapped hardware. But it doesn't help us in multithreaded code, where the volatile object is often only used to synchronize access to non-volatile data; those accesses can still be reordered relative to the volatile ones.

The solution to preventing reordering is to use a memory barrier, which indicates both to the compiler and the CPU that no memory access may be reordered across this point. Placing such barriers around our volatile variable access ensures that even non-volatile accesses won't be reordered across the volatile one, allowing us to write thread-safe code.

However, memory barriers also ensure that all pending reads/writes are executed when the barrier is reached, so they effectively give us everything we need by themselves, making volatile unnecessary. We can just remove the volatile qualifier entirely.
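For illustration, here is a sketch of that barrier placement in C++11 terms (which this answer predates); note that in standard C++ the fences must pair with at least relaxed atomic accesses to the flag, so the flag below is a relaxed std::atomic rather than a plain bool:

#include <atomic>

int data = 0;                    // ordinary shared data, not volatile
std::atomic<bool> ready{false};  // the flag: relaxed accesses + explicit fences

void producer() {
    data = 42;                                            // prepare the data
    std::atomic_thread_fence(std::memory_order_release);  // nothing above may
                                                          // be moved below this
    ready.store(true, std::memory_order_relaxed);
}

void consumer() {
    while (!ready.load(std::memory_order_relaxed))
        ;                                                 // spin until set
    std::atomic_thread_fence(std::memory_order_acquire);  // nothing below may
                                                          // be moved above this
    // data == 42 is now guaranteed to be visible
}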

Since C++11, atomic variables (std::atomic<T>) give us all of the relevant guarantees.
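A sketch of the same flag pattern using std::atomic directly, with no explicit fences; the release store and acquire load carry the ordering themselves:

#include <atomic>

int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = 42;
    ready.store(true, std::memory_order_release);   // publish: earlier writes
                                                    // may not move below this
}

void consumer() {
    while (!ready.load(std::memory_order_acquire))  // later reads may not
        ;                                           // move above this load
    // data == 42 is guaranteed to be visible here
}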

Intermolecular answered 20/3, 2010 at 23:17 Comment(41)
Thanks. The piece I was missing in my understanding, as near as I can tell, was that volatile does not guarantee no reordering with neighboring non-volatile memory accesses, which is required for the atomic flag-setting use case.Peterman
Does it really guarantee order and forbid the use of cached values or only does so at the compiler level? If the former was true, you could possibly write portable thread synchronization, but all the code I have seen uses CPU-specific instructions and assumes the CPU will reorder everything.Assyria
@jbcreix: Which "it" are you asking about? Volatile or memory barriers? In any case, the answer is pretty much the same. They both have to work both at compiler and CPU level, since they describe the observable behavior of the program --- so they have to ensure that the CPU doesn't reorder everything, changing the behavior they guarantee. But you currently can't write portable thread synchronization, because memory barriers are not part of standard C++ (so they're not portable), and volatile isn't strong enough to be useful.Intermolecular
An MSDN example does this, and claims that instructions can't be reordered past a volatile access: msdn.microsoft.com/en-us/library/12a04hfd(v=vs.80).aspxTalmudist
@OJW: But Microsoft's compiler redefines volatile to be a full memory barrier (preventing reordering). That's not part of the standard, so you can't rely on this behavior in portable code.Intermolecular
So, how do memory barriers help when there's concurrent access from multiple CPUs?Shauna
@Shauna In short, CPU and compiler magic (and yes, you need both for it to work). A memory barrier is a CPU instruction which is designed to solve that problem. It interacts with the memory bus, ensuring that no other core can read/write memory until the barrier has finished doing its thing, and the CPU knows not to reorder memory accesses across it. It can't really be emulated in software.Intermolecular
@jalf: But you'd still need the volatile keyword to ensure the value as seen by the code is the value in memory. Without it, couldn't the optimiser cache the value in a register? The problem then becomes one of concurrent access.Shauna
@Skizz: no, that's where the "compiler magic" part of the equation comes in. A memory barrier has to be understood by both the CPU and the compiler. If the compiler understands the semantics of a memory barrier, it knows to avoid tricks like that (as well as reordering reads/writes across the barrier). And luckily, the compiler does understand the semantics of a memory barrier, so in the end, it all works out. :)Intermolecular
@jalf: So how do you specify when a memory barrier is required? The compiler has no way of knowing if the code is going to be run on a single core/thread or multiple cores/threads. The optimiser would registerise values wherever possible, which would fail if two cores were accessing the data simultaneously, since each thread/core has its own register-stored version of the value. Without 'volatile' the compiler will certainly get it wrong.Shauna
@Skizz: I'm not sure I understand. You specify a memory barrier by using whichever intrinsic the compiler provides for the purpose (or by using the C++11 equivalent). It's something you have to insert manually; you can't rely on the compiler generating memory barriers where needed for you.Intermolecular
@jalf: I see, it wasn't clear in your answer that memory barriers are a language extension / C++11 feature. So if the compiler doesn't have that extension you're stuck with volatile. I thought there was something in C/C++ (pre-11) that I was unaware of.Shauna
@Skizz: I see. Sorry for the confusion then. However, if the compiler doesn't have such an extension, then you're out of luck anyway, because volatile is not sufficient in itself.Intermolecular
@Skizz: Threads themselves are always a platform-dependent extension prior to C++11 and C11. To my knowledge, every C and C++ environment that provides a threading extension also provides a "memory barrier" extension. Regardless, volatile is always useless for multi-threaded programming. (Except under Visual Studio, where volatile is the memory barrier extension.)Thickskinned
As you said, volatile is useful for accessing I/O registers. The cost is much higher if we use a memory barrier to implement such functionality (it blocks potential compiler/hardware optimization).Fries
@Thomson: can you elaborate on that? Why would the cost be higher than with volatile? volatile prevents optimizations too.Intermolecular
If we use a memory barrier to simulate/implement the function of volatile, would it cause many other unrelated reads/writes to actually happen? It also blocks the compiler from reordering access to all unrelated variables.Fries
@Fries I don't see why it would cause "many other unrelated read/writes". The point about reordering can cut both ways, depending on usage. A memory barrier prevents all reads/writes from being reordered across it, but only at a specific point in time (when the barrier code is executed), whereas a volatile variable only prevents reordering with respect to other volatile reads/writes, but on the other hand, prevents this for all accesses to the variable, not just at a specific point in the code.Intermolecular
As per my understanding, volatile will force the value to always be read from memory instead of from the cache. Suppose we have a shared variable (among threads), one thread modifies it, and the variable is non-volatile. Assume we are using mutex locks whenever we access the shared variable. Since the shared variable is non-volatile, how is it guaranteed that the variable will always be read from memory, not from the cache?Korwin
@SumitTrehan that's implicit in the mutex. It implies a memory barrier, so when it is executed, the compiler is instructed to ensure that all writes to variables that may be visible to other threads are flushed to memory.Intermolecular
The volatile keyword is still required here to prevent the compiler from caching the flag instead of reading the value from memory.Fructidor
@guardian: No it isn't, data dependency analysis treats the memory barrier as an external function that could have changed any variable that has ever been aliased. (Register storage local variables whose address is never taken actually is perfectly safe). Even in single-threaded code, global_x = 5; extern_call(); cout << global_x; the compiler cannot replace it with cout << 5; because extern_call() may have changed the value.Hibbitts
@Thomson: In what cases would treating volatile as a global barrier to compiler reordering pose a major performance problem? If within a piece of code all places that need barriers have them explicitly marked, allowing a compiler to reorder operations across volatile might have some benefit, but if one has code that uses volatile and isn't customized to use barriers understood by the compiler one is using, an option to treat volatile as a compiler-ordering barrier would allow that compiler to usefully run such code.Klink
You say volatile isn't required - which implies that a memory barrier would force reads from memory, and not a cached register value. Is that correct, that memory barriers force reads to not be cached? Or did you mean this for multi-threaded use of volatile, not register/IO use?Mousy
On modern CPUs (past ten years or so), volatile doesn't even force a read or write to memory. It just causes the CPU to behave as if a read or write was forced to memory for single-threaded code. And that is all that the standard required, since it had no specific multi-threaded semantic requirements.Vulgarity
@iheanyi: The only advantages of volatile over intrinsics to assume external forces may read and/or write any object at any point in time are (1) existing code that includes neither volatile nor barriers could be fixed more easily, in most cases, by adding volatile than by adding barriers; (2) some platforms may require special instructions for volatile reads/writes that are different from doing a sync, then the access, and then another sync. The only sane reason I can think of for C89 not to have included barriers (since any implementation should be able to achieve the required...Klink
...semantics, whether or not they would actually require the use of barriers to do so) is that the authors expected that implementations where programmers might need barriers for various purposes would make "volatile" provide them. Even I/O registers often don't need semantics as precise as what volatile would provide. In many cases, it will be important that one group of operations precedes another, but the order of operations within each group won't matter.Klink
@DavidSchwartz "And that is all that the standard required, since it had no specific multi-threaded semantic requirements" What has "specific multi-threaded semantic requirements"? Do normal variables have that? Where is that specified? Are any MT program well defined?Gleam
@Intermolecular "all writes to variables that may be visible to other threads should be flushed to memory" Flushed how? To where?Gleam
@Thickskinned "Regardless, volatile is always useless for multi-threaded programming" Why can't you use volatile for a simple and fast flag?Gleam
@Gleam As this answer itself says (did you read it?), volatile provides no guarantee about ordering of memory operations. Strictly speaking, it doesn't even provide a guarantee of atomicity. For either one, you either need C++11 atomics or some implementation-dependent mechanism... And with either of those, you do not need volatile.Thickskinned
@Thickskinned "Strictly speaking, it doesn't even provide a guarantee of atomicity" Of course the qualification in itself can't make a usually non atomic operation now atomic: a volatile qualified big struct in C will not be read or written by one asm instr, as no such instr even exists. But combining volatile and a type known to be have atomic loads and stores on all arch (like int) might be useful for MT. Volatile guarantees that the variable is represented according to the ABI and we know which operations are atomic, by arch specific knowledge. I should have spelled that out.Gleam
"volatile provides no guarantee about ordering of memory operations" So why can't we use volatile for simple flags that have no special ordering requirements, but that needs to be accessed as efficiently as possible?Gleam
@Gleam For standards that don't mention multi-threading, nothing has any specific multi-threading semantics. To find the semantics for multi-threading, you have to look at the multi-threading standard you are coding to. If it says volatile has specific multi-threading semantics, then it does. If not, it doesn't and you would be foolish to rely on it. Decent multi-threading standards explain what thread safety things like normal variables have. It would be terrible if they forced coders to guess, wouldn't it?Vulgarity
@DavidSchwartz "Tyou have to look at the multi-threading standard you are coding to" We are not programming in a vacuum. The std says that all volatile operations are part of the observable trace. We know the target machine and we know f.ex. that a write of word size scalar is atomic on the target machine. So we know that a volatile write is atomic because it's a low level operation. We can reason at the asm level because volatile gives us a connection from the abstract to the real machine. Using volatile for MT is silly when we have proper primitives that get the job done though.Gleam
@Gleam I agree that you can do platform-specific reasoning and validate it. It's stupid to do it, since you don't have to and take a high risk that you'll get it wrong, but you can do it. On some implementations/platforms, you may have no choice. Fortunately, modern platforms and modern threading standards provide clear, guaranteed semantics and such guessing and risk taking is a thing of the past. (Plus with volatile, you may incur unnecessary overhead because you have to get and pay for the standard's specified single-threading semantics too.)Vulgarity
@DavidSchwartz I disagree strongly. 1) The single-thread semantics of volatile imply so little, and a volatile write has the cost of a non-volatile write to a normal (non-register) variable that can't be optimized (such as many operations on non-automatic variables that can be aliased and that aren't often optimized), and 2) atomics are almost never optimized (even the most obviously redundant operation) in practice anyway. 3) The specification of atomics and the memory model is absolutely not "clear" and probably is strictly meaningless and logical garbage.Gleam
4) Many compilers that tried to implement the consume order properly got special cases wrong, with incorrect code generation that might lead to hard-to-diagnose bugs. 5) The consume specification is crazy anyway, as you can consume by calling a static member function on an object. 6) Even the simplest question on SO re: MT in C/C++ leads to contradictory answers. 7) It isn't clear why an atomic int is less strongly specified than a non-atomic int. 8) Nobody knows exactly how to do program proofs on MT programs.Gleam
This answer is arguing that "volatile doesn't do everything you need, therefore it's useless". The flaw in that argument is that volatile can be used together with the x86 memory ordering model, to create similar ordering properties for C programs, which is useful in applications like message queues that require eventuality properties.Lanlana
One more thing guaranteed by atomic over volatile is that the data is read or written atomically, in a single step, so you never see half-changed data.Fosterling
@Lanlana At best, that's wasting time reinventing now-standard parts of the language.Bermejo

You might also consider this from the Linux Kernel Documentation.

C programmers have often taken volatile to mean that the variable could be changed outside of the current thread of execution; as a result, they are sometimes tempted to use it in kernel code when shared data structures are being used. In other words, they have been known to treat volatile types as a sort of easy atomic variable, which they are not. The use of volatile in kernel code is almost never correct; this document describes why.

The key point to understand with regard to volatile is that its purpose is to suppress optimization, which is almost never what one really wants to do. In the kernel, one must protect shared data structures against unwanted concurrent access, which is very much a different task. The process of protecting against unwanted concurrency will also avoid almost all optimization-related problems in a more efficient way.

Like volatile, the kernel primitives which make concurrent access to data safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent unwanted optimization. If they are being used properly, there will be no need to use volatile as well. If volatile is still necessary, there is almost certainly a bug in the code somewhere. In properly-written kernel code, volatile can only serve to slow things down.

Consider a typical block of kernel code:

spin_lock(&the_lock);
do_something_on(&shared_data);
do_something_else_with(&shared_data);
spin_unlock(&the_lock);

If all the code follows the locking rules, the value of shared_data cannot change unexpectedly while the_lock is held. Any other code which might want to play with that data will be waiting on the lock. The spinlock primitives act as memory barriers - they are explicitly written to do so - meaning that data accesses will not be optimized across them. So the compiler might think it knows what will be in shared_data, but the spin_lock() call, since it acts as a memory barrier, will force it to forget anything it knows. There will be no optimization problems with accesses to that data.

If shared_data were declared volatile, the locking would still be necessary. But the compiler would also be prevented from optimizing access to shared_data within the critical section, when we know that nobody else can be working with it. While the lock is held, shared_data is not volatile. When dealing with shared data, proper locking makes volatile unnecessary - and potentially harmful.

The volatile storage class was originally meant for memory-mapped I/O registers. Within the kernel, register accesses, too, should be protected by locks, but one also does not want the compiler "optimizing" register accesses within a critical section. But, within the kernel, I/O memory accesses are always done through accessor functions; accessing I/O memory directly through pointers is frowned upon and does not work on all architectures. Those accessors are written to prevent unwanted optimization, so, once again, volatile is unnecessary.

Another situation where one might be tempted to use volatile is when the processor is busy-waiting on the value of a variable. The right way to perform a busy wait is:

while (my_variable != what_i_want)
    cpu_relax();

The cpu_relax() call can lower CPU power consumption or yield to a hyperthreaded twin processor; it also happens to serve as a memory barrier, so, once again, volatile is unnecessary. Of course, busy-waiting is generally an anti-social act to begin with.

There are still a few rare situations where volatile makes sense in the kernel:

  • The above-mentioned accessor functions might use volatile on architectures where direct I/O memory access does work. Essentially, each accessor call becomes a little critical section on its own and ensures that the access happens as expected by the programmer.

  • Inline assembly code which changes memory, but which has no other visible side effects, risks being deleted by GCC. Adding the volatile keyword to asm statements will prevent this removal.

  • The jiffies variable is special in that it can have a different value every time it is referenced, but it can be read without any special locking. So jiffies can be volatile, but the addition of other variables of this type is strongly frowned upon. Jiffies is considered to be a "stupid legacy" issue (Linus's words) in this regard; fixing it would be more trouble than it is worth.

  • Pointers to data structures in coherent memory which might be modified by I/O devices can, sometimes, legitimately be volatile. A ring buffer used by a network adapter, where that adapter changes pointers to indicate which descriptors have been processed, is an example of this type of situation.

For most code, none of the above justifications for volatile apply. As a result, the use of volatile is likely to be seen as a bug and will bring additional scrutiny to the code. Developers who are tempted to use volatile should take a step back and think about what they are truly trying to accomplish.

Sikorski answered 21/3, 2010 at 2:59 Comment(15)
"Adding the volatile keyword to asm statements will prevent this removal." Really?Gleam
@curiousguy: Yes. See also gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Extended-Asm.html .Trainbearer
The spin_lock() looks like a regular function call. What is special about it that makes the compiler treat it specially, so that the generated code "forgets" any value of shared_data that was read before the spin_lock() and stored in a register, and reads the value anew in do_something_on() after the spin_lock()?Ethylethylate
@Syncopated I'm pretty sure any introduction to locks/mutexes/etc would explain this to you. But suffice it to say the key words are there in the text: memory barrier. Basically, the functions need to contain some platform-specific opcode or other trigger, which the compiler knows to mean 'you absolutely cannot move any operation between this and the next memory barrier outside the region delimited by the two'. Then once within the barriers, the mutex ensures no other thread can access the data, and operations on shared_data occur in their stated order as the Standard defines they have to.Featherstitch
@Featherstitch My point is that I can't tell from the function name spin_lock() that it does something special. I don't know what's in it. Particularly, I don't know what's in the implementation that prevents the compiler from optimizing away subsequent reads.Ethylethylate
Syncopated has a good point. This essentially means that programmer should know the internal implementation of those "special functions" or at least be very well informed about their behavior. This raises additional questions, such as - are these special functions standardized and guaranteed to work the same way on all architectures and all compilers? Is there a list of such functions available or at least is there a convention to use code comments to signal developers that the function in question protects the code against being "optimized away"?Thurlow
@Syncopated: It's not that the compiler has to implement special treatment, it's that the compiler has to generate code that works properly with void spin_lock() { shared_data = 7; } and the only way to do that is to trash the copy held in a register and reread the value from memory. In fact, the compiler also has to spill that register back to memory first, so that void spin_lock() { if (shared_data == 42) abort(); } sees the value. So you see, treating it as a regular function call actually does exactly the right thing in the compiler.Hibbitts
@BenVoigt Suppose shared_data is a private static. The compiler now knows that spin_lock cannot touch it. So no memory barrier. And what happens if C99 restrict is added?Chilson
@Tuntable: A private static can be touched by any code, via a pointer. And its address is being taken. Perhaps the dataflow analysis is capable of proving that the pointer never escapes, but that is in general a very difficult problem, superlinear in program size. If you have a way of guaranteeing that no aliases exist, then moving the access across a spin lock should actually be ok. But if no aliases exist, volatile is pointless as well. In all cases, the "call to a function whose body cannot be seen" behavior will be correct.Hibbitts
According to this drdobbs.com/cpp/volatile-the-multithreaded-programmers-b/…, it's fine for the compiler to optimize access to any variable in a critical section protected by a mutex or lock, since such a critical section is serialized and can be regarded as a single-threaded context.Fructidor
@FaceBro: That article recognizes the concept that an object should not need to be accessed using a volatile qualifier at times when it's guarded by a mutex. For a freestanding implementation, the simplest kind of mutex construct is a token-passing flag that says who owns a group of objects, and each context uses the flag to pass the token any time it wants the other context to use the object. On an implementation which refrains from moving other operations past a volatile write or ahead of a volatile read, a single volatile flag can take care of that.Klink
@Featherstitch "Basically, the functions need to contain some platform-specific opcode or other trigger" Which specific opcode is needed on x86 to release a spinlock?Gleam
@Chilson "The compiler now knows that spin_lock cannot touch it." What is "touching"?Gleam
@Tuntable: A compiler intended for low-level programming on a microcontroller should provide options regarding motion of accesses to static objects across volatile objects. In most situations such motion would be a useful optimization, but in some scenarios involving memory banking (e.g. given something like int old_bank = CURRENT_BANK; CURRENT_BANK = new_bank; ...access something.. CURRENT_BANK = old_bank; if there are no accesses to any static-duration objects between CURRENT_BANK = new_bank and CURRENT_BANK = old_bank; it may be imperative that...Klink
...the compiler not reorder any accesses to static-duration objects between the bank-setting operations. Perhaps a more general way to express that would be to say that there should be a mode where an object that isn't accessed between two particular volatile operations within logical execution order, won't be accessed between the volatile operations in machine-code order.Klink

I don't think you're wrong -- volatile is necessary to guarantee that thread A will see the value change, if the value is changed by something other than thread A. As I understand it, volatile is basically a way to tell the compiler "don't cache this variable in a register; instead, be sure to always read/write it from RAM on every access".

The confusion is because volatile isn't sufficient for implementing a number of things. In particular, modern systems use multiple levels of caching, modern multi-core CPUs do some fancy optimizations at run-time, and modern compilers do some fancy optimizations at compile time, and these all can result in various side effects showing up in a different order from the order you would expect if you just looked at the source code.

So volatile is fine, as long as you keep in mind that the 'observed' changes in the volatile variable may not occur at the exact time you think they will. Specifically, don't try to use volatile variables as a way to synchronize or order operations across threads, because it won't work reliably.

Personally, my main (only?) use for the volatile flag is as a "pleaseGoAwayNow" boolean. If I have a worker thread that loops continuously, I'll have it check the volatile boolean on each iteration of the loop, and exit if the boolean is ever true. The main thread can then safely clean up the worker thread by setting the boolean to true, and then calling pthread_join() to wait until the worker thread is gone.
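For what it's worth, a sketch of that pattern; the flag is written here as std::atomic<bool> (the portable modern spelling of such a flag), with std::thread's join() standing in for the pthread_join() call described above:

#include <atomic>
#include <thread>

std::atomic<bool> pleaseGoAwayNow{false};

void worker() {
    while (!pleaseGoAwayNow.load(std::memory_order_relaxed)) {
        // ... one iteration of the work loop ...
    }
}

int main() {
    std::thread t(worker);
    // ... main thread does its own work ...
    pleaseGoAwayNow.store(true, std::memory_order_relaxed);  // request exit
    t.join();  // like pthread_join(): block until the worker is gone
}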

Eradis answered 20/3, 2010 at 22:19 Comment(11)
Your Boolean flag is probably unsafe. How do you guarantee that the worker completes its task, and that the flag remains in scope until it is read (if it is read)? That is a job for signals. Volatile is good for implementing simple spinlocks if no mutex is involved, since alias safety means the compiler assumes mutex_lock (and every other library function) may alter the state of the flag variable.Maquette
Obviously it only works if the nature of the worker thread's routine is such that it is guaranteed to check the boolean periodically. The volatile-bool-flag is guaranteed to remain in scope because the thread-shutdown sequence always occurs before the object that holds the volatile-boolean is destroyed, and the thread-shutdown sequence calls pthread_join() after setting the bool. pthread_join() will block until the worker thread has gone away. Signals have their own problems, particularly when used in conjunction with multithreading.Eradis
Sorry, I mean pthread signals (condition variables), not POSIX signals. You still didn't explain how the worker is guaranteed to complete its work before the Boolean is true. Presumably it should receive work units in a critical section, else work could be requested after the flag is set. You've described a half-spinlock, which might work for you, but I wouldn't call it a design pattern and it probably has no advantages over a safer, more conventional, mechanism.Maquette
The worker thread isn't guaranteed to complete its work before the boolean is true -- in fact, it almost certainly will be in the middle of a work unit when the bool is set to true. But it doesn't matter when the worker thread completes its work unit, because the main thread is not going to be doing anything except blocking inside pthread_join() until the worker thread exits, in any case. So the shutdown sequence is well-ordered -- the volatile bool (and any other shared data) won't be freed until after pthread_join() returns, and pthread_join() won't return until the worker thread is gone.Eradis
@Jeremy, you are correct in practice, but theoretically it could still break. On a two-core system, one core is constantly executing your worker thread. The other core sets the bool to true. However, there is no guarantee that the worker thread's core will ever see that change, i.e. it may never stop even though it repeatedly checks the bool. This behavior is allowed by the C++0x, Java, and C# memory models. In practice this would never occur, as the busy thread most likely inserts a memory barrier somewhere, after which it will see the change to the bool.Midsummer
@Caspin thanks for that info, that is good to know. I had thought that the volatile keyword would deal with multicore-memory-caching issues.Eradis
@Midsummer "On a two core system one core is constantly executing your worker thread." And the OS never does anything?Gleam
@curiousguy: You cannot make any assumptions about the OS doing something. You might have a thread running under a real-time scheduling policy at high priority, and hence very seldom context-switched off one of your cores (think of a 16-core system which is, relatively speaking, not very busy, and your worker thread is doing calculations and never/seldom making system calls). "The OS will sometimes do something" kind of thinking gets you thinking in very complicated ways to make sure you are reasonably correct in your particular case.Raine
@Raine You mean that a computing thread with 0 syscalls might get 100 % of CPU time, with no interrupts? Possible in theory, but not very plausible. If this is the case, all you need is to send an async signal to the thread.Gleam
Take a POSIX system, use the real-time scheduling policy SCHED_FIFO, a higher static priority than other processes/threads in the system, and enough cores; it should be perfectly possible. In Linux you can specify that a real-time process can use 100% of the CPU time. Such threads will never context-switch if there is no higher-priority thread/process and they are never blocked by I/O. But the point is that C/C++ volatile is not meant for enforcing proper data sharing/synchronization semantics. I find that searching for special cases to prove that incorrect code could maybe sometimes work is a useless exercise.Raine
@Deft_code: why exactly do you think the worker thread might never see the change? Is it because the worker thread doesn't read the variable or because the supervisor thread does not write the variable? Which thread needs a memory barrier?Juliannejuliano

volatile is useful (albeit insufficient) for implementing the basic construct of a spinlock mutex, but once you have that (or something superior), you don't need another volatile.

The typical way of multithreaded programming is not to protect every shared variable at the machine level, but rather to introduce guard variables which guide program flow. Instead of volatile bool my_shared_flag; you should have

pthread_mutex_t flag_guard_mutex; // contains something volatile
bool my_shared_flag;

Not only does this encapsulate the "hard part," it's fundamentally necessary: C does not include atomic operations necessary to implement a mutex; it only has volatile to make extra guarantees about ordinary operations.

Now you have something like this:

pthread_mutex_lock( &flag_guard_mutex );
my_local_state = my_shared_flag; // critical section
pthread_mutex_unlock( &flag_guard_mutex );

pthread_mutex_lock( &flag_guard_mutex ); // may alter my_shared_flag
my_shared_flag = ! my_shared_flag; // critical section
pthread_mutex_unlock( &flag_guard_mutex );

my_shared_flag does not need to be volatile, despite being uncacheable, because

  1. Another thread has access to it.
  2. Meaning a reference to it must have been taken sometime (with the & operator).
    • (Or a reference was taken to a containing structure)
  3. pthread_mutex_lock is a library function.
  4. Meaning the compiler can't tell if pthread_mutex_lock somehow acquires that reference.
  5. Meaning the compiler must assume that pthread_mutex_lock modifies the shared flag!
  6. So the variable must be reloaded from memory. volatile, while meaningful in this context, is extraneous.
Maquette answered 20/3, 2010 at 23:18 Comment(0)

Your understanding really is wrong.

The property that volatile variables have is: "reads from and writes to this variable are part of the perceivable behaviour of the program". That means this program works (given appropriate hardware):

int volatile* reg=IO_MAPPED_REGISTER_ADDRESS;
*reg=1; // turn the fuel on
*reg=2; // ignition
*reg=3; // release
int x=*reg; // fire missiles

The problem is, this is not the property we want from thread-safe anything.

For example, a thread-safe counter would be just (linux-kernel-like code, don't know the c++0x equivalent):

atomic_t counter;

...
atomic_inc(&counter);

This is atomic, without a memory barrier. You should add them if necessary. Adding volatile would probably not help, because it wouldn't relate the access to the nearby code (e.g. to the appending of an element to the list the counter is counting). Certainly, you don't need to see the counter incremented outside your program, and optimisations are still desirable, e.g.

atomic_inc(&counter);
atomic_inc(&counter);

can still be optimised to

atomically {
  counter+=2;
}

if the optimizer is smart enough (it doesn't change the semantics of the code).
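For reference, a sketch of the C++0x/C++11 equivalent the answer mentions not knowing: fetch_add with relaxed ordering matches "atomic, without a memory barrier", and in principle the optimizer may likewise combine adjacent increments (though compilers rarely do in practice):

#include <atomic>

std::atomic<int> counter{0};

void count_event() {
    counter.fetch_add(1, std::memory_order_relaxed);  // like atomic_inc(&counter)
}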

Cassity answered 20/3, 2010 at 22:43 Comment(0)

For your data to be consistent in a concurrent environment you need two conditions to apply:

1) Atomicity, i.e. if I read or write some data to memory, then that data gets read/written in one pass and cannot be interrupted or contended due to, e.g., a context switch

2) Consistency, i.e. the order of read/write ops must be seen to be the same by multiple concurrent environments, be they threads, machines, etc.

volatile fits neither of the above; more particularly, the C and C++ standards' requirements on how volatile should behave include neither of the above.

It's even worse in practice, as some compilers (such as the Intel Itanium compiler) do attempt to implement some element of concurrent-access-safe behaviour (i.e. by ensuring memory fences); however, there is no consistency across compiler implementations, and moreover the standard does not require this of the implementation in the first place.

Marking a variable as volatile will just mean that you are forcing the value to be flushed to and from memory each time, which in many cases just slows down your code, as you've basically blown your cache performance.

C# and Java, AFAIK, do redress this by making volatile adhere to 1) and 2); however, the same cannot be said for C/C++ compilers, so basically do with it as you see fit.
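To make condition 1) concrete, a hypothetical, platform-dependent sketch: on a 32-bit target, a 64-bit volatile variable is accessed as two separate word loads/stores, so a concurrent reader can observe a "torn" value that was never written (and the access is a data race besides); std::atomic closes that gap, falling back to a lock where the hardware must:

#include <atomic>
#include <cstdint>

volatile uint64_t v_value = 0;        // volatile: no atomicity guarantee
std::atomic<uint64_t> a_value{0};     // atomic: loads/stores are indivisible

void writer() {
    v_value = 0xFFFFFFFF00000000ull;  // may compile to two 32-bit stores
    a_value.store(0xFFFFFFFF00000000ull,
                  std::memory_order_relaxed);  // always observed whole
}

void reader() {
    uint64_t torn = v_value;          // may mix halves of old and new values
    uint64_t ok = a_value.load(std::memory_order_relaxed);
    (void)torn; (void)ok;
}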

For some more in-depth (though not unbiased) discussion on the subject, read this.

Exceeding answered 21/3, 2010 at 1:28 Comment(2)
+1 - guaranteed atomicity was another piece of what I was missing. I was assuming that loading an int is atomic, so that volatile preventing the re-ordering provided the full solution on the read side. I think it's a decent assumption on most architectures, but it is not a guarantee.Peterman
When are individual reads and writes to memory interruptible and non-atomic? Is there any benefit?Flatt

The comp.programming.threads FAQ has a classic explanation by Dave Butenhof:

Q56: Why don't I need to declare shared variables VOLATILE?

I'm concerned, however, about cases where both the compiler and the threads library fulfill their respective specifications. A conforming C compiler can globally allocate some shared (nonvolatile) variable to a register that gets saved and restored as the CPU gets passed from thread to thread. Each thread will have its own private value for this shared variable, which is not what we want from a shared variable.

In some sense this is true, if the compiler knows enough about the respective scopes of the variable and the pthread_cond_wait (or pthread_mutex_lock) functions. In practice, most compilers will not try to keep register copies of global data across a call to an external function, because it's too hard to know whether the routine might somehow have access to the address of the data.

So yes, it's true that a compiler that conforms strictly (but very aggressively) to ANSI C might not work with multiple threads without volatile. But someone had better fix it. Because any SYSTEM (that is, pragmatically, a combination of kernel, libraries, and C compiler) that does not provide the POSIX memory coherency guarantees does not CONFORM to the POSIX standard. Period. The system CANNOT require you to use volatile on shared variables for correct behavior, because POSIX requires only that the POSIX synchronization functions are necessary.

So if your program breaks because you didn't use volatile, that's a BUG. It may not be a bug in C, or a bug in the threads library, or a bug in the kernel. But it's a SYSTEM bug, and one or more of those components will have to work to fix it.

You don't want to use volatile, because, on any system where it makes any difference, it will be vastly more expensive than a proper nonvolatile variable. (ANSI C requires "sequence points" for volatile variables at each expression, whereas POSIX requires them only at synchronization operations -- a compute-intensive threaded application will see substantially more memory activity using volatile, and, after all, it's the memory activity that really slows you down.)

/---[ Dave Butenhof ]-----------------------[ [email protected] ]---\
| Digital Equipment Corporation 110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120 Nashua NH 03062-2698 |
-----------------[ Better Living Through Concurrency ]----------------/

Mr Butenhof covers much of the same ground in this usenet post:

The use of "volatile" is not sufficient to ensure proper memory visibility or synchronization between threads. The use of a mutex is sufficient, and, except by resorting to various non-portable machine code alternatives, (or more subtle implications of the POSIX memory rules that are much more difficult to apply generally, as explained in my previous post), a mutex is NECESSARY.

Therefore, as Bryan explained, the use of volatile accomplishes nothing but to prevent the compiler from making useful and desirable optimizations, providing no help whatsoever in making code "thread safe". You're welcome, of course, to declare anything you want as "volatile" -- it's a legal ANSI C storage attribute, after all. Just don't expect it to solve any thread synchronization problems for you.

All that's equally applicable to C++.

Potoroo answered 5/10, 2010 at 8:5 Comment(1)
The link is broken; it no longer seems to point to what you wanted to cite. Without the text, it's kind of a meaningless answer.Rieger

This is all that "volatile" is doing: "Hey compiler, this variable could change AT ANY MOMENT (on any clock tick) even if there are NO LOCAL INSTRUCTIONS acting on it. Do NOT cache this value in a register."

That is IT. It tells the compiler that your value is, well, volatile- this value may be altered at any moment by external logic (another thread, another process, the Kernel, etc.). It exists more or less solely to suppress compiler optimizations that will silently cache a value in a register that it is inherently unsafe to EVER cache.
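A minimal sketch of exactly that register-caching hazard (illustrative names):

int stop_plain = 0;              // ordinary int: the compiler may cache it
volatile int stop_volatile = 0;  // volatile: every read must actually happen

void spin_plain(void) {
    while (!stop_plain)          // the load may be hoisted, effectively:
        ;                        //   if (!stop_plain) for (;;) {}
}

void spin_volatile(void) {
    while (!stop_volatile)       // a real load occurs on every iteration
        ;                        // (though this still isn't synchronization)
}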

You may encounter articles, like the one at Dr. Dobb's, that pitch volatile as some panacea for multi-threaded programming. The author's approach isn't totally devoid of merit, but it has the fundamental flaw of making an object's users responsible for its thread-safety, which tends to have the same issues as other violations of encapsulation.

Chitin answered 2/8, 2014 at 1:54 Comment(0)

According to my old C standard, "What constitutes an access to an object that has volatile-qualified type is implementation-defined". So C compiler writers could have chosen to have "volatile" mean "thread-safe access in a multi-process environment". But they didn't.

Instead, the operations required to make a critical section thread-safe in a multi-core, multi-process, shared-memory environment were added as new implementation-defined features. And, freed from the requirement that "volatile" would provide atomic access and access ordering in a multi-process environment, the compiler writers prioritised code reduction over historical implementation-dependent "volatile" semantics.

This means that things like "volatile" semaphores around critical code sections do not work on new hardware with new compilers, but might once have worked with old compilers on old hardware; old examples are sometimes not wrong, just old.

Schonfeld answered 14/11, 2014 at 11:34 Comment(2)
The old examples required that the program be processed by quality compilers suitable for low-level programming. Unfortunately, "modern" compilers have taken the fact that the Standard doesn't require them to process "volatile" in a useful fashion as an indication that code which would require them to do so is broken, rather than recognizing that the Standard makes no effort to forbid implementations that are conforming but of such low quality as to be useless, and does not in any way condone the low-quality-but-conforming compilers that have become popular.Klink
On most platforms, it would be fairly easy to recognize what volatile would need to do to allow one to write an OS in a manner which is hardware-dependent but compiler-independent. Requiring that programmers use implementation-dependent features rather than making volatile work as required undermines the purpose of having a standard.Klink
