VC++, /volatile:ms on x86

I

2

5

The documentation on volatile says:

When the /volatile:ms compiler option is used—by default when architectures other than ARM are targeted—the compiler generates extra code to maintain ordering among references to volatile objects in addition to maintaining ordering to references to other global objects.

What exact code could compile differently with /volatile:ms and /volatile:iso?

Iminourea answered 5/6, 2017 at 17:40 Comment(2)

Code that misuses volatile as std::atomic. – Perfoliate 5/6, 2017 at 17:42

It is possible that currently nothing, so it's only affecting ARM as of now. So, they had to introduce this option to fix ARM not to generate extra instructions, but still they retain guaranties on x86-64, since on this architecture it costs nothing. – Schweinfurt 7/6, 2017 at 6:33

T

8

A complete understanding of this requires a bit of a history lesson. _{^{(And who doesn't like history?…says the guy who majored in history.)}} The /volatile:ms semantics were first added to the compiler with Visual Studio 2005. Starting with that version, variables marked volatile automatically imposed acquire semantics on reads, and release semantics on writes, through that variable.

What does this mean? It has to do with the memory model, and specifically, how aggressively the compiler is permitted to reorder memory-access operations. An operation that has acquire semantics prevents subsequent memory operations from being hoisted above it; an operation that has release semantics prevents preceding memory operations from being delayed until after it. As the names suggest, acquire semantics are typically used when you are acquiring a resource, whereas release semantics are typically used when you are releasing a resource. MSDN has a more complete description of acquire and release semantics; it says:

An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself. Consider the following code example:
a++;
b++;
c++;
From another processor's point of view, the preceding operations can appear to occur in any order. For example, the other processor might see the increment of b before the increment of a.

For example, the InterlockedIncrementAcquire routine uses acquire semantics to increment a variable. If you rewrote the preceding code example as follows:
InterlockedIncrementAcquire(&a);
b++;
c++;
other processors would always see the increment of a before the increments of b and c.

Likewise, the InterlockedIncrementRelease routine uses release semantics to increment a variable. If you rewrote the code example once again, as follows:
a++;
b++;
InterlockedIncrementRelease(&c);
other processors would always see the increments of a and b before the increment of c.

Now, like MSDN says, atomic operations have both acquire and release semantics. And, in fact, on x86, there is no way to give an instruction only acquire or release semantics, so achieving even one of these requires that the instruction is made atomic (which the compiler will generally do by emitting a LOCK CMPXCHG instruction).

Before Visual Studio 2005's enhancement of the volatile semantics, developers who wanted to write correct code needed to use the Interlocked* family of functions, as described in the MSDN article. Unfortunately, many developers failed to do this and got code that worked mostly by accident (or didn't work at all). But there was a good chance that it did work by accident, given the x86's relatively strict memory model. You often get the semantics you want for free since, on x86, most loads and stores already have acquire/release semantics, so you don't even need to make anything atomic. (Non-temporal stores are the obvious exception, but in this case, those wouldn't matter anyway.) I suspect this ease of implementation on x86 is what, combined with the realization that programmers generally failed to understand and do the right thing, persuaded Microsoft to strengthen the semantics of volatile in VS 2005.

Another potential reason for the change was the growing importance of multi-threaded code. 2005 was around the time that Pentium 4 chips with HyperThreading were beginning to become popular, effectively bringing simultaneous multi-threading to every users' desktop. Probably not coincidentally, VS 2005 also removed the option to link to single-threaded version of the C run-time libraries. It is when you have multi-threaded code—with the possibility of executing on multiple processors—that you really have to start worrying about getting the memory-access semantics correct.

With VS 2005 and later, you could just mark a pointer parameter as volatile and get the desired acquire semantics. The volatility implied/imposed the acquire semantics, which made multi-threaded code running in multi-processing environments safe. Prior to 2011, this was extremely important, since the C and C++ language standards had absolutely nothing to say about threading and gave you no portable way of writing correct code.

And this brings us right to the answer to your question. If your code assumes these extended semantics for volatile, then you need to pass the /volatile:ms switch to ensure that the compiler continues to apply them. If you have written C++11-style code that uses modern primitives for atomic, thread-safe operations, then don't need volatile to have these extended semantics and are safe passing /volatile:iso. In other words, as manni66 quipped, if your code "misuses volatile as std::atomic", then you will see a difference in behavior and need /volatile:ms to guarantee that volatile does have the same effect as std::atomic.

As it turns out, it has proven very difficult for me to find an example of a case where /volatile:iso actually changes the generated code, as compared to /volatile:ms. Microsoft's optimizer is actually very conservative with respect to reordering instructions, which is the type of thing that the acquire/release semantics are supposed to protect against.

Here's a simple example (where we're using a volatile global variable to guard a critical section, as you might find in a simplistic "lock-free" implementation) that should demonstrate the difference:

volatile bool CriticalSection;
int           Data[100];

void FillData(int i)
{
   Data[i] = 42;              // fill data item at index 'i'
   CriticalSection = false;   // release critical section
}

If you compile this with GCC at -O2, it will generate the following machine code:

FillData(int):
    mov     eax, DWORD PTR [esp+4]             // retrieve parameter 'i' from stack
    mov     BYTE PTR [CriticalSection], 0      // store '0' in 'CriticalSection'
    mov     DWORD PTR [Data+eax*4], 42         // store '42' at index 'i' in 'Data'
    ret

Even if you aren't fluent in assembly language, you should be able to see that the optimizer has re-ordered the stores, such that the critical section is released (CriticalSection = false) before the data is filled in (Data[i] = 42)—precisely the opposite of the order in which the statements appeared in the original C code. The volatile is having no effect on this re-ordering, because GCC follows the ISO semantics, just like /volatile:iso will (in theory).

By the way, notice how…um…volatile :-) this ordering is. If we compile at -O1 in GCC, we get instructions that do everything in the same order as our original C code:

FillData(int):
    mov     eax, DWORD PTR [esp+4]             // retrieve parameter 'i' from stack
    mov     DWORD PTR [Data+eax*4], 42         // store '42' at index 'i' in 'Data'
    mov     BYTE PTR [CriticalSection], 0      // store '0' in 'CriticalSection'
    ret

When you start throwing more instructions in there for the compiler to rearrange, and especially if this code were to get inlined, you can imagine how unlikely it is that the original order is preserved.

But, like I said, MSVC is actually very conservative with regard to re-ordering instructions. Regardless of whether I specify /volatile:ms or /volatile:iso, I get the exactly same machine code:

FillData, COMDAT PROC
    mov      eax, DWORD PTR [esp+4]
    mov      DWORD PTR [Data+eax*4], 42
    mov      BYTE PTR [CriticalSection], 0
    ret
FillData ENDP

where the stores are done in the original order. I've played with all sorts of different permutations, introducing additional variables and operations, all without being able to find the magic sequence that causes MSVC to re-order the stores. So, it is very likely that, currently, in practice, you won't see a very big difference with the /volatile:iso switch set when targeting x86 architectures. But that's a very loose guarantee, to say the least.

Note that this empirical observation is consistent with Alexander Gutenev's speculation that a difference in semantics is observed only on ARM, and that the whole reason these switches were introduced was to avoid paying a performance penalty on this newly-supported platform. Meanwhile, on the x86 side, there have been no actual changes to the semantics in generated code, since there is essentially no cost. (Save for some extremely trivial optimization possibilities, but that would require that their optimizer have two completely separate schedulers, which probably isn't a good use of developer time.)

The point is that, with /volatile:iso, MSVC is permitted to act like GCC and re-order the stores. With /volatile:ms, you are guaranteed that it won't because volatile implies acquire/release semantics for that variable.

Bonus Reading: So, what is volatile supposed to be used for, in strictly ISO-compliant code (i.e., when the /volatile:iso switch is used)? Well, volatile is basically meant for memory-mapped I/O. That's what it was originally meant for when it was first introduced, and it remains its principal purpose. I've heard it jokingly said that volatile is for reading/writing a tape drive. Basically, you mark the pointer volatile in order to prevent the compiler from optimizing reads and writes away. For example:

volatile char* pDeviceIOAddr = ...;

void Wait()
{
    while (*pDeviceIOAddr)
    { }
}

Qualifying the parameter's type with volatile prevents the compiler from assuming that subsequent reads return the same value, forcing it to do a new read each time through the loop. In other words:

  mov  eax, DWORD PTR [pDeviceIoAddr]  // get pointer
Wait:
  cmp  BYTE PTR [eax], 0               // dereference pointer, read 1 byte,
  jnz  Wait                            //  and compare to 0

If pDeviceIoAddr wasn't volatile, the entire loop could have been elided. Optimizers definitely do this in practice, including MSVC. Or, you could get the following pathological code:

  mov  eax, DWORD PTR [pDeviceIoAddr]  // get pointer
  mov  al, BYTE PTR [eax]              // dereference pointer, read 1 byte
Wait:
  cmp  al, 0                           // compare it to 0
  jnz  Wait

where the pointer is dereferenced once, outside of the loop, caching the byte in a register. The instruction at the top of the loop just tests that enregistered value, creating either no loop or an infinite loop. Oops.

Notice, however, that this use of volatile in ISO-standard C++ does not obviate the need for critical sections, mutexes, or other types of locks. Even the correct version of the above code wouldn't work correctly if another thread could potentially modify pDeviceIOAddr, since the read of that address/pointer does not have acquire semantics. Acquire semantics would look like this:

Wait:
  mov  eax, DWORD PTR [pDeviceIoAddr]  // get pointer (acquire semantics)
  cmp  BYTE PTR [eax], 0               // dereference pointer, read 1 byte,
  jnz  Wait                            //  and compare to 0

and to get that, you would need C++11's std::atomic.

Thebaid answered 23/6, 2017 at 15:22 Comment(0)

S

1

I suspect it might start matter from some point in time, if not yet already.

There's undocumented option /volatileMetadata- vs /volatileMetadata. When it is on (default), some metadata is generated for emulation of x86 on ARM.

The information is from my issue report, some links from there:

Visual Studio 2019 version 16.10 Preview 2 enabled volatile metadata by default when targeting x64 to improve emulation performance
ARM64EC Support in Visual Studio

So, at least hypothetically, /volatile:iso may matter for volatile metadata, and affect x86 code execution on ARM.

I have no confirmation that it happens. By compiling the same binary with and without /volatileMetadata- I at least confirm by binary size that the mentioned metadata does exist.

Schweinfurt answered 6/1, 2022 at 10:20 Comment(0)

Recommended topics

Hot tags