What Rules does compiler have to follow when dealing with volatile memory locations?
I know that when reading from a memory location which is written to by several threads or processes, the volatile keyword should be used for that location, as in the cases below. But I want to know more about what restrictions it actually places on the compiler: what rules does the compiler have to follow when dealing with such a case, and is there any exceptional case where, despite simultaneous access to a memory location, the programmer can safely omit the volatile keyword?

volatile SomeType * ptr = someAddress;
void someFunc(volatile const SomeType & input){
 //function body
}
Massie answered 9/11, 2010 at 18:1 Comment(8)
Note that in portable C++ volatile cannot be used as a poor man's thread synchronization (although compilers might extend its meaning thus). Writing to a volatile object in one thread does not necessarily mean another thread will see the updated value. (It might only have been written to one CPU's cache, but not through the cache into whatever memory the CPUs share.) For that you need memory barriers.Sandberg
@sbi: While the bolded part of your comment is true, I don't think that there is any way for a conforming compiler to leave the value in the CPU cache and not flush it to memory. After all, that is the actual meaning of volatile: writes need to make it to main memory. The reason that it cannot be used for synchronization is that the guarantees don't ensure atomicity or prevent reordering with non-volatile variables.Esau
@dribeas: The C++98 "abstract machine" has no concept of CPU cache (or registers, for that matter) so no, there is no requirement to flush volatile writes to main memory.Cleavage
@David: "writes need to make it to main memory" I'm way out of my depth here, but from what I know volatile is often used for addresses that do not even correspond to memory, so I think this must be wrong. But, yes, atomicity and write ordering are problems I forgot about.Sandberg
@sbi, @Zack: right, I typed faster than I could think: the abstract machine does not have the concept of cpu cache. I mixed two concepts, the c++ memory model determines that it has to be written out to memory. The hardware architectures are what ensures that the view of the memory is consistent across processors --even if that imposes some burden in performance. The language determines that it is "written to memory". The hardware architecture ensures that the memory as seen by the different processors is consistent.Esau
@dribeas: actually, more than a few architectures have made little or no guarantee of cross-processor memory consistency. See kernel.org/doc/Documentation/memory-barriers.txt for a comprehensive overview of what system programmers may have to cope with.Cleavage
I attempted a canonical answer on a near duplicate: When to use volatile with multi threading? (never, but it does work in practice at least on some compilers as somewhat like memory_order_relaxed. GCC/clang at least also make an effort to do the load or store as a single access that will be atomic if the HW guarantees it.)Biotechnology
It mostly just tells the compiler that even though it can't see any instructions that change the value, it cannot assume the value has not changed. For example, something that may look like an infinite loop (or impossible to execute instruction) may not be as such.Nonrecognition
9

A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).

Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).

Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.

Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.

Marchall answered 9/11, 2010 at 18:31 Comment(0)
21

What you know is false. Volatile is not used to synchronize memory access between threads, apply any kind of memory fences, or anything of the sort. Operations on volatile memory are not atomic, and they are not guaranteed to be in any particular order. volatile is one of the most misunderstood facilities in the entire language. "Volatile is almost useless for multi-threaded programming."

What volatile is used for is interfacing with memory-mapped hardware, signal handlers, and setjmp/longjmp (local variables modified between a setjmp and a longjmp must be volatile to retain their values).

It can also be used in a similar way that const is used, and this is how Alexandrescu uses it in his article "volatile: The Multithreaded Programmer's Best Friend". But make no mistake. volatile doesn't make your code magically thread safe. Used in this specific way, it is simply a tool that can help the compiler tell you where you might have messed up. It is still up to you to fix your mistakes, and volatile plays no role in fixing those mistakes.

EDIT: I'll try to elaborate a little bit on what I just said.

Suppose you have a class that has a pointer to something that cannot change. You might naturally make the pointer const:

class MyGizmo
{ 
public:
  const Foo* foo_;
};

What does const really do for you here? It doesn't do anything to the memory. It's not like the write-protect tab on an old floppy disc. The memory itself is still writable. You just can't write to it through the foo_ pointer. So const is really just a way to give the compiler another way to let you know when you might be messing up. If you were to write this code:

gizmo.foo_->bar_ = 42;

...the compiler won't allow it, because it's marked const. Obviously you can get around this by using const_cast to cast away the const-ness, but if you need to be convinced this is a bad idea then there is no help for you. :)

Alexandrescu's use of volatile is exactly the same. It doesn't do anything to make the memory "thread safe" in any way whatsoever. What it does is give the compiler another way to let you know when you may have screwed up. You mark things that you have made truly "thread safe" (through the use of actual synchronization objects, like mutexes or semaphores) as being volatile. Then the compiler won't let you use them in a non-volatile context. It emits a compiler error that you then have to think about and fix. You could again get around it by casting away the volatile-ness using const_cast, but this is just as evil as casting away const-ness.

My advice to you is to completely abandon volatile as a tool in writing multithreaded applications (edit:) until you really know what you're doing and why. It has some benefit, but not in the way that most people think, and if you use it incorrectly, you could write dangerously unsafe applications.

Zarger answered 9/11, 2010 at 18:21 Comment(15)
@John Dibling_Where did I say that I think volatile is used to synchronize memory access between threads?Massie
@Pooria: In your very first sentence.Sandberg
@Pooria: Here: "when reading from a location of memory which is written to by several threads or processes the volatile keyword should be used for that location" This assertion is absolutely false. volatile does nothing for you here other than give you a false sense of security.Zarger
I would add that some compilers do implement acquire and release semantics for volatile, but it's non-standard and implementation-specific.Franz
@John Dibling_I don't follow you. What I use, and I think everyone else would use, for synchronizing memory accesses between threads is some kind of synchronization object; maybe you have misread my statements in the question.Massie
@Pooria: Then why do you think you need to use volatile here?Zarger
@John Dibling_I believe the volatile keyword has its own use apart from synchronization objects as if you take a look at some stuff other people said you'll know what I mean like "ruling out caching a value from memory into a register, and using the register for repeated access" as dmckee has mentioned.Massie
@Pooria: It does, and I have enumerated its uses. But none of those uses apply to the context you've given us. Your first sentence gives us our context, and it can be paraphrased as "When doing multithreaded programming, variables should be marked volatile" I'm asking you why you think you need volatile in your specific case.Zarger
@John Dibling_Just read what others have to say and you'll find out.Massie
@Pooria: I can only hope that your defensive attitude and refusal to answer my questions is an indication that you are offended because you've learned that you were wrong. For what it's worth, I always read everything that everyone has to say. That's how I learn.Zarger
@John Dibling_It totally confuses me why you insist on completely throwing the volatile keyword in the trash can for multithreaded programming; that opposes what some others have said, and some comments on the page you linked (software.intel.com/en-us/blogs/2007/11/30/…) say that some things in that article are not right.Massie
@John Dibling_check it out(stackoverflow.com/questions/72552/…).Massie
@Pooria: In your linked question, the accepted answer deals specifically with memory-mapped hardware. If you had read my post, you would have noted that a) I specifically say that's what volatile is for, and b) even that has nothing to do with multithreaded programming. It has to do with hardware access.Zarger
Hi John, thanks for the comprehensible answer. You mentioned, "Then the compiler won't let you use them in a non-volatile context. It throws a compiler error you then have to think about and fix.", what is a good example of this?Benignant
This was so long ago, I no longer have the example at hand. But there were great examples in the book I mention. I recommend it.Zarger
11

In C++11 and later, there's no reason to use volatile as a poor-man's std::atomic with std::memory_order_relaxed. Just use std::atomic with relaxed. On compilers where volatile works the way you wanted, std::atomic with relaxed will compile to about the same asm which is equally fast. See When to use volatile with multi threading? (never)

This answer is about the separate question of what the rules for volatile actually are.


It's not as well defined as you probably want it to be. Most of the relevant standardese from C++98 is in section 1.9, "Program Execution":

The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.

Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.

Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed.

When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile sig_atomic_t are unspecified, and the value of any object not of volatile sig_atomic_t that is modified by the handler becomes undefined.

An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).

The least requirements on a conforming implementation are:

  • At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.

  • At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.

  • The input and output dynamics of interactive devices shall take place in such a fashion that prompting messages actually appear prior to a program waiting for input. What constitutes an interactive device is implementation-defined.

So what that boils down to is:

  • The compiler cannot optimize away reads or writes to volatile objects. For simple cases like the one casablanca mentioned, that works the way you might think. However, in cases like

      volatile int a;
      int b;
      b = a = 42;
    

    people can and do argue about whether the compiler has to generate code as if the last line had read

      a = 42; b = a;
    

    or if it can, as it normally would (in the absence of volatile), generate

      a = 42; b = 42;
    

    (C++0x may have addressed this point, I haven't read the whole thing.)

  • The compiler may not reorder operations on two different volatile objects that occur in separate statements (every semicolon is a sequence point) but it is totally allowed to rearrange accesses to non-volatile objects relative to volatile ones. This is one of the many reasons why you should not try to write your own spinlocks, and is the primary reason why John Dibling is warning you not to treat volatile as a panacea for multithreaded programming.

  • Speaking of threads, you will have noticed the complete absence of any mention of threads in the standards text. That is because C++98 has no concept of threads. (C++0x does, and may well specify their interaction with volatile, but I wouldn't be assuming anyone implements those rules yet if I were you.) Therefore, there is no guarantee that accesses to volatile objects from one thread are visible to another thread. This is the other major reason volatile is not especially useful for multithreaded programming.

  • There is no guarantee that volatile objects are accessed in one piece, or that modifications to volatile objects avoid touching other things right next to them in memory. This is not explicit in what I quoted but is implied by the stuff about volatile sig_atomic_t -- the sig_atomic_t part would be unnecessary otherwise. This makes volatile substantially less useful for access to I/O devices than it was probably intended to be, and compilers marketed for embedded programming often offer stronger guarantees, but it's not something you can count on.

  • Lots of people try to make specific accesses to objects have volatile semantics, e.g. doing

      T x;
      *(volatile T *)&x = foo();
    

    This is legit (because it says "object designated by a volatile lvalue" and not "object with a volatile type") but has to be done with great care, because remember what I said about the compiler being totally allowed to reorder non-volatile accesses relative to volatile ones? That goes even if it's the same object (as far as I know anyway).

  • If you are worried about compile-time reordering of accesses to more than one volatile value, you need to understand the sequence point rules, which are long and complicated and I'm not going to quote them here because this answer is already too long, but here's a good explanation which is only a little simplified. If you find yourself needing to worry about the differences in the sequence point rules between C and C++ you have already screwed up somewhere (for instance, as a rule of thumb, never overload &&).

    If you also need run-time ordering of the visibility of volatile stores as seen by volatile loads in other threads on ISAs other than x86, you'd need inline asm or intrinsics for barrier instructions... Or better, use std::atomic with a memory order other than relaxed, e.g. std::memory_order_acquire and std::memory_order_release. (Those orderings are still "free" on x86, but will use special load/store instructions or barriers on non-x86 with weakly ordered memory models.)

    std::atomic also has the huge advantage of being able to establish happens-before synchronization between threads, e.g. making it possible to release-store a data_ready flag so readers can acquire-load and then (if the flag is true) access a plain array. (MSVC historically gave volatile acquire and release semantics so it could do this. /volatile:ms enables this behaviour, /volatile:iso disables that extra ordering.)

Cleavage answered 9/11, 2010 at 18:25 Comment(4)
Note that some known experts (Herb Sutter in particular) mention that a smart compiler can even treat a volatile variable as non-volatile if it can demonstrate that it cannot be read externally: think of a variable declared volatile but not bound to a particular address, for which no pointers/references are passed to other code --example: an auto volatile variable, a conforming compiler can treat it as non-volatile.Esau
I can see the logic, but that's leaning a bit too hard on the as-if rule if you ask me.Cleavage
@DavidRodríguez-dribeas: Interesting notion, though the only case where I can imagine it as being relevant would be if a loop read or wrote a volatile variable some number of times; if such reads or writes have to be performed in sequence, no code after the loop could have any observable side-effects until the processor had performed the appropriate number of reads or writes. Such a thing might be better accomplished by having the standard specify a __SIDE_EFFECT() macro which would compel the compiler to do whatever was necessary to ensure that the code around it was executed in sequence.Allmon
@zwol: I made some edits to make this a good 2023 answer which addresses the part of question asking about lock-free multithreading. When to use volatile with multi threading? answers that with a resounding "never", but question of what the exact rules are on paper vs. in some real-world compilers is separate and I think this answer is useful for that.Biotechnology
7

Declaring a variable as volatile means the compiler can't make any assumptions about its value that it might otherwise have made, and hence is prevented from applying various optimizations. Essentially it forces the compiler to re-read the value from memory on each access, even if the normal flow of the code doesn't change the value. For example:

int *i = ...;
cout << *i; // line A
// ... (some code that doesn't use i)
cout << *i; // line B

In this case, the compiler would normally assume that since the value at i wasn't modified in between, it's okay to retain the value from line A (say in a register) and print the same value in B. However, if you mark i as volatile, you're telling the compiler that some external source could have possibly modified the value at i between line A and B, so the compiler must re-fetch the current value from memory.

Vernice answered 9/11, 2010 at 18:13 Comment(0)
1

The compiler is not allowed to optimize away reads of a volatile object in a loop, which it would otherwise normally do (e.g. hoisting a strlen() call on an unchanging string out of the loop).

It's commonly used in embedded programming when reading a hardware register at a fixed address whose value may change unexpectedly. (In contrast with "normal" memory, which doesn't change unless written to by the program itself...)

That is its main purpose.

It could also be used to try to make sure one thread sees a change in a value written by another, but it in no way guarantees atomicity when reading or writing said object.

Orsino answered 9/11, 2010 at 18:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.