Is the order of writes to separate members of a volatile struct guaranteed to be preserved?
C

Suppose I have a struct like this:

volatile struct { int foo; int bar; } data;
data.foo = 1;
data.bar = 2;
data.foo = 3;
data.bar = 4;

Are the assignments all guaranteed not to be reordered?

For example, without volatile, the compiler would clearly be allowed to optimize it down to just two stores, emitted in a different order, like this:

data.bar = 4;
data.foo = 3;

But with volatile, is the compiler required not to do something like this?

data.foo = 1;
data.foo = 3;
data.bar = 2;
data.bar = 4;

(That is, treating the members as separate, unrelated volatile entities, and doing a reordering that I could imagine it attempting in order to improve locality of reference if, for example, foo and bar sit on either side of a page boundary.)

Also, is the answer consistent for current versions of both C and C++ standards?

Chesney answered 14/12, 2020 at 20:19 Comment(8)
I don't know, but I sure hope so, else the queue structs I use for interrupt comms may be in trouble:)Bield
Not reordered; full quote here for C++ (C may be different) - en.cppreference.com/w/cpp/language/cv "an object whose type is volatile-qualified, or a subobject of a volatile object" ... "Every access (read or write operation, member function call, etc.) made through a glvalue expression of volatile-qualified type is treated as a visible side-effect for the purposes of optimization"Cockalorum
If this is about C++ and "concurrency" per se (as the tag says), check out std::atomic. It has similar non-reordering guarantees.Baeda
@bloody: Unfortunately volatile std::atomic types have some counterintuitive behavior, at least on current compilers. For instance, here a load from a volatile std::atomic<int> is optimized out because its value is unused, even though it wouldn't be for a regular volatile int.Commando
@NateEldredge I never thought about joining std::atomic with volatile. If op exposes that structure for IO interaction then utilizing volatile is unquestionable. However op's tag suggests it's about concurrency (multithreaded program) in which case std::atomic is the right tool to use and not volatile. Perhaps this is just a loose style of tag naming.Baeda
@Baeda primarily I'm looking at C, but since there are often subtle differences between the languages (C++ seems to have long departed from the goal of being a superset), I'm curious about volatile in particular as it would apply to portability of C code to C++. Yes, C++ indeed has much better libraries for dealing with this sort of thing.Chesney
@NateEldredge That is required behaviour; it has to do with discarded-value expressions and what constitutes a read. On the other hand, you shouldn't use volatile std::atomic in the first place anyway.Claar
The compiler is not obliged to do anything, what constitutes a volatile access is implementation-defined, the standard just defines a certain ordering relation on accesses in terms of observable behaviour & the abstract machine, for implementation documentation to refer to. Code generation is not addressed by the standard.Abattoir

They will not be reordered.

C17 6.5.2.3(3) says:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member, 97) and is an lvalue if the first expression is an lvalue. If the first expression has qualified type, the result has the so-qualified version of the type of the designated member.

Since data has volatile-qualified type, so do data.bar and data.foo. Thus you are performing two assignments to volatile int objects. And by 6.7.3 footnote 136,

Actions on objects so declared [as volatile] shall not be “optimized out” by an implementation or reordered except as permitted by the rules for evaluating expressions.

A more subtle question is whether the compiler could assign them both with a single instruction, e.g., if they are contiguous 32-bit values, could it use a 64-bit store to set both? I would think not, and at least GCC and Clang don't attempt to.
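For illustration, here is a minimal snippet (the wrapper function name is made up) that can be compiled to assembly, e.g. with gcc -O2 -S, to check what a given compiler emits; as noted above, current GCC and Clang produce four separate stores in source order:

volatile struct { int foo; int bar; } data;

void write_members(void)   /* hypothetical name, just to give the stores a home */
{
    data.foo = 1;   /* four distinct volatile stores; GCC and Clang emit  */
    data.bar = 2;   /* them as four separate store instructions in source */
    data.foo = 3;   /* order rather than merging adjacent members into a  */
    data.bar = 4;   /* single 64-bit store                                */
}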

Commando answered 14/12, 2020 at 20:30 Comment(11)
Thanks for quoting the standard (as I happen to not have a copy). That seems to answer the question, but your text "you are assigning two volatile int objects" is misleading in that if they were not considered the same object the answer would be different, or there would need to be an additional restriction on the compiler to preserve the order of accesses that are volatile even if they are in unrelated objects. Maybe best to keep the quote and refine the answer text...Chesney
I think changing operations to be simultaneous (using one instruction for two assignments) ought to count as a reordering. If not by strict interpretation of the standard, then certainly by the spirit of the standard, the reason for such a restriction (which has a performance penalty) applies regardless of whether you get tricksy with the wording.Walrus
@TedShaneyfelt: Rephrased to "two assignments to volatile int objects".Commando
You mean two assignments to the same volatile int object? That would be satisfactory.Chesney
Note that they are different parts of the same object, not separate volatile objects, but the same object, as the first quote of the specification that you gave pointed out...Chesney
@Walrus has to be correct about simultaneous being reordering. Changing operations to be simultaneous would affect hardware, for example, setting data bits then toggling a strobe bit on memory mapped I/O is clearly something that would be optimized out if that were allowed.Chesney
It is implementation-defined what constitutes access to a volatile-qualified object. If the C implementation targets hardware on which the effects of a 64-bit write could be the same as two 32-bit writes (e.g., two 32-bit writes might be seen separately by other components sharing memory, but they could be seen as indistinguishable, so a 64-bit write that is necessarily simultaneous is indistinguishable from two 32-bit writes that happen to be effectively simultaneous), then it could be reasonable for the implementation to define “access” so that a 64-bit write can be used.Sunbow
@Eric Postpischil for that to be the case, they wouldn't truly be written simultaneously, even though they are optimized to a single instruction. Then it seems OK. But if they are distinguishable, which they would be if the strobe went active as the data were being written instead of afterward, then it would be incorrectly reordered to be simultaneous. The compiler would need to take into consideration whether or not the alignment is such that it could get away with a single write instruction being split into two data write accesses.Chesney
@TedShaneyfelt: The members of a structure type are themselves objects. 6.2.5 (20): "A structure type describes a sequentially allocated nonempty set of member objects". So we are indeed performing two accesses to volatile objects, and they happen to be different objects, albeit they are also both part of the object data. I changed the wording to make it clear that reordering would still be forbidden even for two accesses to the same object (which is not the case at hand).Commando
1. Yes, of course individual members of an object are themselves objects. 2. Yes, the standard in footnote 136 clearly does prohibit optimizing out accesses such as: data.bar=4; data.foo=3; would do. 3. The footnote could be interpreted as "Actions on [any of the] objects so declared [as volatile] shall not be “optimized out” by an implementation or reordered except as permitted by the rules for evaluating expressions. [but its relation to other such objects is not taken into account here], so the fact that they are part of the same object seems to be relevant.Chesney
Regarding preserving order of operations... open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.htmlChesney

If you want to use this in multiple threads, there is one significant gotcha.

While the compiler will not reorder the writes to volatile variables (as described in the answer by Nate Eldredge), there is one more point where write reordering can occur, and that is the CPU itself. This depends on the CPU architecture, and a few examples follow:

Intel 64

See Intel® 64 Architecture Memory Ordering White Paper.

While the store instructions themselves are not reordered (2.2):

  1. Stores are not reordered with other stores.

They may be visible to different CPUs in a different order (2.4):

Intel 64 memory ordering allows stores by two processors to be seen in different orders by those two processors

AMD 64

AMD 64 (which is the common x64) has similar behaviour in the specification:

Generally, out-of-order writes are not allowed. Write instructions executed out of order cannot commit (write) their result to memory until all previous instructions have completed in program order. The processor can, however, hold the result of an out-of-order write instruction in a private buffer (not visible to software) until that result can be committed to memory.

PowerPC

I remember having to be careful about this on the Xbox 360, which used a PowerPC CPU:

While the Xbox 360 CPU does not reorder instructions, it does rearrange write operations, which complete after the instructions themselves. This rearranging of writes is specifically allowed by the PowerPC memory model

To avoid CPU reordering in a portable way you need to use memory fences like C++11 std::atomic_thread_fence or C11 atomic_thread_fence. Without them, the order of writes as seen from another thread may be different.
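As a rough sketch of that in C11 (the publish/consume function names and the ready flag are inventions for illustration): the release fence orders the volatile stores before the flag store, and the matching acquire fence in the reader orders the flag load before the subsequent reads, so the reader sees the struct contents in the order they were written.

#include <stdatomic.h>
#include <stdbool.h>

volatile struct { int foo; int bar; } data;
atomic_bool ready;                /* static storage, so it starts out false */

/* Writer thread: fill in the struct, then publish it. */
void publish(void)
{
    data.foo = 3;
    data.bar = 4;
    atomic_thread_fence(memory_order_release);                  /* order the stores above... */
    atomic_store_explicit(&ready, true, memory_order_relaxed);  /* ...before the flag is set */
}

/* Reader thread: wait for the flag, then read the struct. */
void consume(int *foo, int *bar)
{
    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ; /* spin */
    atomic_thread_fence(memory_order_acquire);  /* pairs with the release fence in publish() */
    *foo = data.foo;
    *bar = data.bar;
}

Without the fences (or acquire/release ordering on the flag itself), the reader could observe ready == true before the writes to data.foo and data.bar become visible.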

See also "C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?"

This is also noted in the Wikipedia Memory barrier article:

Moreover, it is not guaranteed that volatile reads and writes will be seen in the same order by other processors or cores due to caching, cache coherence protocol and relaxed memory ordering, meaning volatile variables alone may not even work as inter-thread flags or mutexes.
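Concretely, a hedged sketch of the broken pattern that caveat warns about (the variable and function names are made up): the compiler keeps the two volatile stores in source order, but nothing stops a weakly ordered CPU from making them visible to another core out of order, and the plain cross-thread accesses are a data race in C11/C++11 terms anyway.

#include <stdbool.h>

volatile int  payload;          /* hypothetical shared data */
volatile bool ready = false;    /* hypothetical shared flag */

/* Writer thread */
void writer(void)
{
    payload = 42;    /* volatile store #1 */
    ready   = true;  /* volatile store #2: emitted after #1 by the compiler */
}

/* Reader thread */
int reader(void)
{
    while (!ready)
        ; /* spin: another core may see ready == true before payload == 42 */
    return payload;  /* so this can still return a stale value */
}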

Fite answered 15/12, 2020 at 9:31 Comment(3)
"This raises the issue of whether volatile should be given a real meaning that provides both atomicity and inter-thread visibility, roughly along the lines of Java volatiles. Although we believe that abstractly this provides a substantial improvement by giving semantics to something that currently has almost no portable semantics, there seem to be a number of practical obstacles driven by backward compatibility issues that lead us to at least hesitate." - Hans Boehm & Nick Maclaren open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html ...Chesney
Boehm & Maclaren's concern could perhaps have been addressed by the standards committee adding a syntactic construct within which volatiles would be forced to behave more along the spirit of volatility that they hesitate to require for reasons of backward compatibility. E.g. a new syntax: volatile { block } would be a sufficient addition to the language to allow backwards compatibility, but also allow a more intuitive, meaningful, and useful behavior of volatile objects within that block. Like namespace, it might be best to allow it to span multiple function definitions. As it is, it's goofy.Chesney
If you're using this from multiple threads, you have a data race and all bets are off. Unlike atomic types, volatile objects are not thread safe and do not avoid data races. About the only viable use for volatile these days is to access memory-mapped hardware devices, and in that case you will normally have the memory marked as "uncached" in some machine-specific fashion, which is supposed to inhibit CPU reordering and ensure that the device sees loads and stores in (assembly-level) program order.Commando
