A complete understanding of this requires a bit of a history lesson. (And who doesn't like history?…says the guy who majored in history.) The /volatile:ms semantics were first added to the compiler with Visual Studio 2005. Starting with that version, variables marked volatile automatically imposed acquire semantics on reads, and release semantics on writes, through that variable.
What does this mean? It has to do with the memory model, and specifically, how aggressively the compiler is permitted to reorder memory-access operations. An operation that has acquire semantics prevents subsequent memory operations from being hoisted above it; an operation that has release semantics prevents preceding memory operations from being delayed until after it. As the names suggest, acquire semantics are typically used when you are acquiring a resource, whereas release semantics are typically used when you are releasing a resource. MSDN has a more complete description of acquire and release semantics; it says:
An operation has acquire semantics if other processors will always see
its effect before any subsequent operation's effect. An operation has
release semantics if other processors will see every preceding
operation's effect before the effect of the operation itself. Consider
the following code example:
a++;
b++;
c++;
From another processor's point of view, the preceding operations can appear to occur in any order. For example, the other processor might see the increment of b before the increment of a.
For example, the InterlockedIncrementAcquire routine uses acquire semantics to increment a variable. If you rewrote the preceding code example as follows:
InterlockedIncrementAcquire(&a);
b++;
c++;
other processors would always see the increment of a before the increments of b and c.
Likewise, the InterlockedIncrementRelease routine uses release semantics to increment a variable. If you rewrote the code example once again, as follows:
a++;
b++;
InterlockedIncrementRelease(&c);
other processors would always see the increments of a and b before the increment of c.
Now, as MSDN says, atomic operations have both acquire and release semantics. In fact, on x86 there is no way to give an instruction only acquire or only release semantics, so achieving even one of these requires that the instruction be made atomic (which the compiler will generally do by emitting a LOCK CMPXCHG instruction).
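For comparison, standard C++11 expresses the same idea with a memory-order argument on std::atomic operations rather than with separate Interlocked*Acquire/Release functions. The following is a rough sketch, not a drop-in translation: the C++ memory model states its guarantees in terms of a release operation in one thread pairing with an acquire operation in another, rather than in terms of what "other processors see", and the names a, b, and c are simply the placeholders from the MSDN example.

```cpp
#include <atomic>
#include <cassert>

// Placeholder counters standing in for a, b, and c from the MSDN example.
// The "plain" increments use relaxed atomics so the sketch stays free of
// data races even if other threads read these counters concurrently.
std::atomic<int> a{0}, b{0}, c{0};

// Roughly: InterlockedIncrementAcquire(&a); b++; c++;
// The acquire ordering on a's read-modify-write keeps the later increments
// of b and c from being reordered before it.
void IncrementWithAcquireFirst()
{
    a.fetch_add(1, std::memory_order_acquire);
    b.fetch_add(1, std::memory_order_relaxed);
    c.fetch_add(1, std::memory_order_relaxed);
}

// Roughly: a++; b++; InterlockedIncrementRelease(&c);
// The release ordering on c's read-modify-write keeps the earlier increments
// of a and b from being reordered after it, so a thread that reads c with
// acquire ordering and sees this increment also sees those two.
void IncrementWithReleaseLast()
{
    a.fetch_add(1, std::memory_order_relaxed);
    b.fetch_add(1, std::memory_order_relaxed);
    c.fetch_add(1, std::memory_order_release);
}
```

On x86, each of these fetch_add calls typically becomes a LOCK-prefixed instruction whatever memory order is requested, which fits the point above that x86 offers no cheaper acquire-only or release-only atomic.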
Before Visual Studio 2005 enhanced the semantics of volatile, developers who wanted to write correct code needed to use the Interlocked* family of functions, as described in the MSDN article. Unfortunately, many developers failed to do this and got code that worked mostly by accident (or didn't work at all). But there was a good chance that it did work by accident, given the x86's relatively strict memory model. On x86, most loads already have acquire semantics and most stores already have release semantics, so you often get the ordering you want for free, without making anything atomic. (Non-temporal stores are the obvious exception, but they wouldn't matter in this case anyway.) I suspect that this ease of implementation on x86, combined with the realization that programmers generally failed to understand and do the right thing, is what persuaded Microsoft to strengthen the semantics of volatile in VS 2005.
Another potential reason for the change was the growing importance of multi-threaded code. 2005 was around the time that Pentium 4 chips with HyperThreading were becoming popular, effectively bringing simultaneous multi-threading to every user's desktop. Probably not coincidentally, VS 2005 also removed the option to link against the single-threaded versions of the C run-time libraries. It is when you have multi-threaded code, with the possibility of executing on multiple processors, that you really have to start worrying about getting the memory-access semantics correct.
With VS 2005 and later, you could just mark a pointer parameter as volatile and get the desired acquire/release semantics. Volatility implied those semantics, which made that kind of multi-threaded code safe on multiprocessor systems. Prior to 2011, this was extremely important, since the C and C++ language standards had absolutely nothing to say about threading and gave you no portable way of writing correct code.
And this brings us right to the answer to your question. If your code assumes these extended semantics for volatile, then you need to pass the /volatile:ms switch to ensure that the compiler continues to apply them. If you have written C++11-style code that uses modern primitives for atomic, thread-safe operations, then you don't need volatile to have these extended semantics, and you are safe passing /volatile:iso. In other words, as manni66 quipped, if your code "misuses volatile as std::atomic", then you can see a difference in behavior and need /volatile:ms to guarantee that volatile still has the acquire/release effect you wanted from std::atomic.
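The C++11-style replacement for a "volatile-as-atomic" flag is a std::atomic with explicit memory orders, which behaves identically under /volatile:ms and /volatile:iso. As an illustration, here is a minimal spinlock sketch that takes the lock with acquire semantics and drops it with release semantics, mirroring the acquiring/releasing-a-resource description earlier. The names SpinLock and CountWithTwoThreads are hypothetical, and in real code std::mutex is usually the better choice, since a spinlock burns CPU while it waits.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical spinlock: the kind of guard older code often wrote as a
// "volatile bool" and relied on /volatile:ms to order correctly.
class SpinLock
{
public:
    void lock()
    {
        // Acquire: once exchange() flips the flag from false to true,
        // nothing in the critical section can be reordered above this point.
        while (locked_.exchange(true, std::memory_order_acquire))
        {
            // Spin until the current owner calls unlock().
        }
    }

    void unlock()
    {
        // Release: a thread whose exchange() later observes 'false' is
        // guaranteed to see every write made inside this critical section.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical usage: two threads increment a plain (non-atomic) counter
// that is protected only by the spinlock. Returns the final count.
int CountWithTwoThreads(int perThread)
{
    assert(perThread >= 0);
    SpinLock lock;
    int counter = 0;  // deliberately non-atomic: the lock protects it
    auto work = [&lock, &counter, perThread] {
        for (int i = 0; i < perThread; ++i)
        {
            std::lock_guard<SpinLock> guard(lock);  // lock() now, unlock() at scope exit
            ++counter;
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Because SpinLock provides lock() and unlock(), it works with std::lock_guard, so the release happens even if the protected code exits early.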
As it turns out, it has proven very difficult for me to find an example of a case where /volatile:iso actually changes the generated code compared to /volatile:ms. Microsoft's optimizer is very conservative about reordering instructions, and reordering is exactly what acquire/release semantics are meant to protect against.
Here's a simple example (where we're using a volatile global variable to guard a critical section, as you might find in a simplistic "lock-free" implementation) that should demonstrate the difference:
volatile bool CriticalSection;
int Data[100];
void FillData(int i)
{
Data[i] = 42; // fill data item at index 'i'
CriticalSection = false; // release critical section
}
If you compile this with GCC at -O2 (targeting 32-bit x86), it will generate the following machine code:
FillData(int):
mov eax, DWORD PTR [esp+4] // retrieve parameter 'i' from stack
mov BYTE PTR [CriticalSection], 0 // store '0' in 'CriticalSection'
mov DWORD PTR [Data+eax*4], 42 // store '42' at index 'i' in 'Data'
ret
Even if you aren't fluent in assembly language, you should be able to see that the optimizer has reordered the stores, such that the critical section is released (CriticalSection = false) before the data is filled in (Data[i] = 42), precisely the opposite of the order in which the statements appeared in the original C code. The volatile qualifier has no effect on this reordering, because GCC follows the ISO semantics, just like /volatile:iso will (in theory).
By the way, notice how…um…volatile :-) this ordering is. If we compile at -O1 in GCC, we get instructions that do everything in the same order as our original C code:
FillData(int):
mov eax, DWORD PTR [esp+4] // retrieve parameter 'i' from stack
mov DWORD PTR [Data+eax*4], 42 // store '42' at index 'i' in 'Data'
mov BYTE PTR [CriticalSection], 0 // store '0' in 'CriticalSection'
ret
When you start throwing more instructions in there for the compiler to rearrange, and especially if this code were to get inlined, you can imagine how unlikely it is that the original order is preserved.
But, as I said, MSVC is very conservative about reordering instructions. Regardless of whether I specify /volatile:ms or /volatile:iso, I get exactly the same machine code:
FillData, COMDAT PROC
mov eax, DWORD PTR [esp+4]
mov DWORD PTR [Data+eax*4], 42
mov BYTE PTR [CriticalSection], 0
ret
FillData ENDP
where the stores are done in the original order. I've played with all sorts of different permutations, introducing additional variables and operations, all without being able to find the magic sequence that causes MSVC to reorder the stores. So it is very likely that, currently, in practice, you won't see much of a difference with the /volatile:iso switch set when targeting x86 architectures. But that's a very loose guarantee, to say the least.
Note that this empirical observation is consistent with Alexander Gutenev's speculation that a difference in semantics is observed only on ARM, and that the whole reason these switches were introduced was to avoid paying a performance penalty on this newly supported platform. Meanwhile, on the x86 side, there have been no actual changes to the semantics in generated code, since there is essentially no cost. (Save for some extremely trivial optimization possibilities, but that would require that their optimizer have two completely separate schedulers, which probably isn't a good use of developer time.)
The point is that, with /volatile:iso, MSVC is permitted to act like GCC and reorder the stores. With /volatile:ms, you are guaranteed that it won't, because volatile implies acquire/release semantics for that variable.
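If you want the FillData example to be correct no matter which switch is used (or which compiler), one option is to make the guard a std::atomic<bool> and store to it with release semantics. The sketch below keeps the original names; the convention that true means "held", the reader function WaitAndRead, and the driver FillAndReadConcurrently are assumptions added for illustration. On x86, a release store typically compiles to the same plain mov that MSVC emitted above; the difference is that the compiler is no longer allowed to swap the two stores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Assumed convention for this sketch: true means "held", so the reader
// waits until FillData() releases the critical section.
std::atomic<bool> CriticalSection{true};
int Data[100];

void FillData(int i)
{
    assert(i >= 0 && i < 100);
    Data[i] = 42;  // fill data item at index 'i'
    // Release store: on any conforming compiler, and with either /volatile:
    // switch, the write to Data[i] above may not be moved after this store.
    CriticalSection.store(false, std::memory_order_release);  // release critical section
}

// Hypothetical reader: spins until the release is visible, then reads the item.
int WaitAndRead(int i)
{
    // Acquire load: pairs with the release store in FillData(), so once it
    // observes 'false', the write of 42 to Data[i] is guaranteed to be visible.
    while (CriticalSection.load(std::memory_order_acquire))
    {
    }
    return Data[i];
}

// Hypothetical driver: one writer and one reader; returns what the reader saw.
int FillAndReadConcurrently(int i)
{
    int seen = 0;
    std::thread reader([&seen, i] { seen = WaitAndRead(i); });
    FillData(i);
    reader.join();  // join() makes 'seen' safe to read here
    return seen;
}
```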
Bonus Reading: So, what is volatile supposed to be used for in strictly ISO-compliant code (i.e., when the /volatile:iso switch is used)? Well, volatile is basically meant for memory-mapped I/O. That's what it was originally meant for when it was first introduced, and it remains its principal purpose. I've heard it jokingly said that volatile is for reading/writing a tape drive. You qualify the data behind the pointer as volatile in order to prevent the compiler from optimizing reads and writes away. For example:
volatile char* pDeviceIOAddr = ...;
void Wait()
{
while (*pDeviceIOAddr)
{ }
}
Qualifying the pointed-to type with volatile prevents the compiler from assuming that subsequent reads return the same value, forcing it to perform a new read each time through the loop. In other words:
mov eax, DWORD PTR [pDeviceIOAddr] // get pointer
Wait:
cmp BYTE PTR [eax], 0 // dereference pointer, read 1 byte,
jnz Wait // and compare to 0
If the byte that pDeviceIOAddr points to weren't volatile, the entire loop could have been elided. Optimizers definitely do this in practice, including MSVC. Or, you could get the following pathological code:
mov eax, DWORD PTR [pDeviceIOAddr] // get pointer
mov al, BYTE PTR [eax] // dereference pointer, read 1 byte
Wait:
cmp al, 0 // compare it to 0
jnz Wait
where the pointer is dereferenced once, outside of the loop, caching the byte in a register. The instruction at the top of the loop just tests that enregistered value, creating either no loop or an infinite loop. Oops.
Notice, however, that this use of volatile in ISO-standard C++ does not obviate the need for critical sections, mutexes, or other types of locks. Even the correct version of the above code wouldn't work correctly if another thread could potentially modify pDeviceIOAddr, since the read of that address/pointer does not have acquire semantics. Acquire semantics would look like this:
Wait:
mov eax, DWORD PTR [pDeviceIOAddr] // get pointer (acquire semantics)
cmp BYTE PTR [eax], 0 // dereference pointer, read 1 byte,
jnz Wait // and compare to 0
and to get that, you would need C++11's std::atomic.
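A sketch of how the two might be combined: the pointer itself lives in a std::atomic, so each iteration reloads it with acquire semantics, while it still points at volatile memory, so each poll really reads the device byte. This assumes, for illustration, that some other thread publishes the address with a release store; the initial nullptr is also an assumption, because the original initializer is elided.

```cpp
#include <atomic>
#include <cassert>

// The pointer is shared between threads (std::atomic), but what it points
// to is device memory (volatile). Must be set before Wait() is called.
std::atomic<volatile char*> pDeviceIOAddr{nullptr};

void Wait()
{
    for (;;)
    {
        // Acquire load of the pointer on every iteration, so a new address
        // published by another thread (with a release store) is picked up.
        volatile char* p = pDeviceIOAddr.load(std::memory_order_acquire);
        assert(p != nullptr);
        // Volatile read of the device byte: it cannot be cached in a register.
        if (*p == 0)
            break;
    }
}
```

The two qualifiers solve different problems: std::atomic orders the pointer's publication between threads, while volatile keeps each read of the device byte from being optimized away.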