Is a C compiler allowed to coalesce sequential assignments to volatile variables?

I have a theoretical (non-deterministic, hard to test, never seen in practice) hardware issue, reported by the hardware vendor, where a double-word write to certain memory ranges may corrupt future bus transfers.

While I don't have any double-word writes explicitly in C code, I'm worried the compiler is allowed (in current or future implementations) to coalesce multiple adjacent word assignments into a single double-word assignment.

The compiler is not allowed to reorder assignments to volatiles, but it is unclear (to me) whether coalescing counts as reordering. My gut says it does, but I've been corrected by language lawyers before!

Example:

typedef struct
{
   volatile unsigned reg0;
   volatile unsigned reg1;
} Module;

volatile Module* module = (volatile Module*)0xFF000000u;

// two word stores, or one double-word store?
module->reg0 = 1;
module->reg1 = 2;

(I'll ask my compiler vendor about this separately, but I'm curious what the canonical/community interpretation of the standard is.)

Theocracy answered 20/5, 2021 at 7:51 Comment(7)
Have you checked the assembly generated by the compiler to see whether it is doing this? - Monet
If the memory is mapped as "cacheable" or "write-combinable" then it could be the MMU combining the two single-word writes into a double-word write. - Gloss
@EricPostpischil Working on it. Making scripts to filter out possible occurrences. The project build system is resisting :-( - Theocracy
@Kaka Now it is looking like such writes do occur in the vendor API. - Theocracy
@IanAbbott The hardware vendor report explicitly states a list of instructions. Might be relevant for someone else, though. Good point. - Theocracy
@EricPostpischil Checked. The compiler does not coalesce such writes. Although, I have not considered LTO... The project toolchain and build system is not very supportive of getting assembler after linking, so I'm not going to test any further. I will trust community consensus and vendor support channels that it won't happen. - Theocracy

The behavior of volatile seems to be up to the implementation, partly because of a curious sentence which says: "What constitutes an access to an object that has volatile-qualified type is implementation-defined".

In ISO C 99, section 5.1.2.3, there is also:

3 In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

So although requirements are given that a volatile object must be treated in accordance with the abstract semantics (i.e. not optimized), curiously, the abstract semantics itself allows for the elimination of dead code and data flows, which are examples of optimizations!
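As a concrete illustration of that rule, here is a minimal sketch (using an ordinary global as a stand-in for a real hardware register, so it runs on plain memory):

```c
/* A minimal sketch of the rule quoted above, using an ordinary global as a
   stand-in for a hardware register. The compiler may delete the dead
   arithmetic, but every access to 'reg' is a side effect and must remain. */
volatile int reg;

int touch(void)
{
   int dead = 2 + 2;   /* value unused, no side effects: may be elided */
   (void)dead;
   reg = 1;            /* volatile store: must be performed */
   int got = reg;      /* volatile load: must be performed */
   reg = 2;            /* second store: must also be performed, separately */
   return got;         /* on ordinary memory this reads back the 1 */
}
```

The dead computation may vanish from the generated code, but the three volatile accesses must each appear, in source order.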

I'm afraid that to know what volatile will and will not do, you have to go by your compiler's documentation.

Dishonesty answered 21/5, 2021 at 22:24 Comment(2)
Made me look deeper into the vendor docs. Found this in the section describing implementation-defined behavior: "What constitutes an access to an object that has volatile-qualified type (6.7.3)." - Any reference to an object with volatile type results in an access. The order in which volatile objects are accessed is defined by the order expressed in the source code. References to non-volatile objects are scheduled in arbitrary order, within the constraints given by dependencies. (Followed by a paragraph on how passing a flag to the compiler makes any volatile access a memory barrier!) - Theocracy
@Theocracy About the original question: if you have a program which depends on writes to that structure not being coalesced, it is unlikely you are working in the realm of portable ISO C. On the other hand, there are a couple of required uses of volatile in strictly conforming programs: use of volatile sig_atomic_t in an asynchronous signal handler, and using volatile on the local variables of a function which are modified between the context being saved with setjmp and restored with longjmp. As far as standard C is concerned, we can regard volatile as existing for those situations. - Dishonesty

No, the compiler is absolutely not allowed to optimize those two writes into a single double word write. It's kind of hard to quote the standard since the part regarding optimizations and side effects is so fuzzily written. The relevant parts are found in C17 5.1.2.3:

The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

When you access part of a struct, that in itself is a side-effect, which may have consequences that the compiler can't determine. Suppose for example that your struct is a hardware register map and those registers need to be written in a certain order. Like for example some microcontroller documentation could be along the lines of: "reg0 enables the hardware peripheral and must be written to before you can configure the details in reg1".
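That ordering requirement can be sketched like this. The layout mirrors the question's struct; the enable/config roles and the function name are invented here for illustration:

```c
/* Hypothetical peripheral sketch: assume reg0 enables the block and must be
   written before reg1 configures it. The enable/config roles and the
   function name are invented for illustration. */
typedef struct
{
   volatile unsigned reg0; /* enable */
   volatile unsigned reg1; /* config */
} Module;

void module_init(volatile Module* m)
{
   m->reg0 = 1; /* enable first: one word store */
   m->reg1 = 2; /* configure second: a separate word store; a conforming
                   compiler may not merge these into one double-word store */
}
```

On such hardware, a merged double-word store would configure the peripheral before (or simultaneously with) enabling it, which is exactly the behavior the abstract machine forbids.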

A compiler that would merge the volatile object writes into a single one would be non-conforming and plain broken.

Kaka answered 20/5, 2021 at 8:1 Comment(3)
Ohhh, didn't think of the struct access. The pointer in this case should not be volatile then, leaving only the members volatile (and down the nested-volatile rabbit hole we go). Damn, C is hard. Happy to see you were able to look past that. The "real" code in question does not have that aspect, but it was too gnarly to make a good example from. - Theocracy
@Theocracy If the struct access is volatile, then member access will be volatile even if the members are not declared volatile. Same as for "const". - Gloss
This is wrong, though a very, very common misconception. The standard maps program texts to sequences of observable actions of the abstract machine. It does not say how those are reflected in reality. Moreover, it explicitly says that what constitutes a volatile access, what gets to be a volatile externally observed action, is implementation-defined. The standard says nothing about object code. - Tribasic

The compiler is not allowed to make two such assignments into a single memory write. There must be two independent writes from the core. The answer from @Lundin gives relevant references to the C standard.

However, be aware that a cache - if present - may trick you. The keyword volatile doesn't imply "uncached" memory. So besides using volatile, you also need to make sure that the address 0xFF000000 is mapped as uncached. If the address is mapped as cached, the cache HW may turn the two assignments into a single memory write. In other words - for cached memory, two core memory-write operations may end up as a single write operation on the system's memory interface.

Valenti answered 20/5, 2021 at 7:51 Comment(29)
volatile absolutely means uncached memory. A system that does pre-fetch reads of volatile-qualified variables is not compliant. volatile access has to be performed according to the sequence points placed around the variables. As CPUs have evolved, there have been attempts by hardware and/or compiler vendors to push this burden of memory-barrier-like behavior onto the application programmers. But C has never allowed speculative or out-of-order execution of volatile access. It's not the application programmer's fault if someone has released hardware which can't execute compliant C. - Kaka
@Kaka I'd like to see some reference for that claim, as I disagree. Also this little example ideone.com/U8Sq9n shows that the compiler doesn't map volatile variables any differently than ordinary variables. - Valenti
Arguments and references here: https://mcmap.net/q/18208/-does-quot-volatile-quot-guarantee-anything-at-all-in-portable-c-code-for-multi-core-systems. Obviously you can't use some compiler output as proof of anything, since the very problem is that the compilers don't care to implement volatile as a memory barrier, since they can only do so much about the underlying hardware. - Kaka
I've seen something in the datasheet where multiple double-words initiate a bigger burst/block transfer (multi-double-word?) over the bus. That would be an out-of-scope aspect though, since the hw vendor report states "these instructions sometimes won't work for writing to these addresses". Did I get that right, or did you mean something else? - Theocracy
@Theocracy HW implementations use a lot of optimizations that we don't see. One such optimization is to do both reads and writes in bigger chunks than requested by the processor core. It happens all the time. For cacheable data it's not a problem. As programmers we normally don't care, and we don't need to. The exception is when we write to (registers in) other HW devices. Here we need to be sure that all reads and writes happen in the order and with the exact size that our code says. One way of ensuring that is to make sure that the address space of that external device is mapped as uncached. - Valenti
@Kaka So your point is that (many) modern systems/compilers violate the C standard because they don't ensure that a variable defined using volatile will actually be written to memory - is that your point? - Valenti
@Kaka Re: "Obviously you can't use some compiler output as proof of anything..." No, and that wasn't the intention. The intention was just to show an example of one compiler/system that didn't map volatile variables any differently from other variables. - Valenti
@4386427 They violate the C standard because the point where the volatile variable is accessed isn't where "the semantics" specify it, in the form of sequence points. The access may happen between any two sequence points but not outside them. And naturally this matters a lot in the case of hardware registers or DMA buffers etc. If some compiler/system goes bananas and pre-fetches such variables into data cache, then it is not only non-compliant, it is also broken and useless. - Kaka
@Kaka hmm... I don't think I want to go into a discussion about whether the whole industry is building systems that ain't C standard compliant. I'm not sure I have the time. Can we at least agree that a) compilers are not allowed to optimize two writes to volatile variables into a single write, and b) mainstream compilers don't ensure that volatile variables are mapped to uncached memory areas? - Valenti
a) yes b) depends on what you mean by mainstream. x86, then yeah, all bets are probably off. ARM or PowerPC... then it depends on the core and the compiler. - Kaka
@Kaka okay, I'll settle for that. I have been using quite a few ARMs and PowerPCs in embedded systems. I don't recall any compiler doing anything special for volatiles, but then again - I haven't tried all cores/compilers, so I'm not going to claim that it's never handled by the compiler. - Valenti
@Theocracy To answer your concern differently: if you use the vendor's "faulty" ranges only as uncached areas, and you on top of that declare your variables as volatile, you should be safe as long as your code doesn't explicitly do a double write. Nothing in the system will change single writes to double writes. And... if the ranges are used for memory-mapped HW devices (as your question suggests), you'll have to do the first two things anyway. - Valenti
@Lundin: C has never allowed speculative or OoO execution of volatile access - that's different from "uncacheable". You seem to be talking about not hoisting loads/sinking stores out of loops in asm. But that's totally different from hardware prefetch on write-back cacheable memory regions. You can look at it as C guaranteeing that loads/stores to the cache-coherency domain are a visible side effect, not the true contents of DRAM. SW can't observe DRAM (except possibly via another mapping of the same physical address, or on a hypothetical system with non-coherent shared memory). - Belay
@Lundin: If you want MMIO accesses to work properly, you need to make sure the address range including the MMIO address is mapped uncacheable even if you're writing asm by hand; it's implausible and impractical for a C compiler to do this for you for a global volatile int foo;. - Belay
@PeterCordes But that's still the whole point: they've made hardware that's incompatible with the C language. There's not much compilers can do about that. Though I suppose toolchains could in theory create uncacheable memory segments and allocate volatile variables there. Though in the case of registers they have to be at very specific addresses, naturally, so a non-standard extension is required. Here C could have standardized something, like the commonly used non-standard extension @ 0x1234 operator for allocating something at a specific address. - Kaka
@Lundin: You seem to have decided that DRAM itself, not the cache-coherent view of memory that all cores share, is what the C standard means by "the execution environment". Yes, your argument would follow from that premise. But I don't see a good reason to choose that, and it makes very little sense to me in a C implementation for a system with coherent cache. Bypassing cache would make volatile unusably slow overkill for a lot of things, and make users look for some mechanism that wasn't horrible, e.g. for stuff like volatile sig_atomic_t, or for making sure stores to mmapped files happen. - Belay
@Lundin: The current de-facto agreement on what volatile means is quite useful, and in fact for years (before C11), and still in some code, it was used successfully for inter-thread communication, even before the language had a formal memory model. (Thanks to more de-facto standard behaviour in that case.) The use cases for wanting a volatile that truly bypassed cache are extremely small. I can see some merit in volatile implying sequential consistency of all volatile accesses, blocking runtime reordering of accesses (to at least cache). - Belay
@PeterCordes It's because timing matters. When you declare something volatile you want it accessed at the point when the code accessing the variable is executed. In theory I could declare a variable volatile and assume that a slow fetch from RAM will happen as part of my timing calculation. Not an issue on x86 perhaps, since they are rarely used for real-time systems. But in embedded real-time systems the timing might matter a lot. - Kaka
@Lundin: (cont. from previous comment.) In ISO C11, volatile doesn't bypass/avoid data-race UB the way _Atomic does, although one can argue that's only because ISO C11 doesn't require coherent caches except for performance of release / acquire. But unless you want to argue that volatile de facto is thread-safe with whatever semantics it has, ISO C11 chose not to give volatile any inter-thread semantics. - Belay
@Lundin: Linkers, and software to control memory-type attributes like making some range uncacheable, give you the tools to set up some uncacheable memory you can read from if that's what you want, when programming for a system that does have cache. I don't buy that timing argument at all. If you want something extra slow for a delay, do a volatile read from uncacheable memory, not just from any arbitrary variable. Having every volatile necessarily be slow sounds like a worse design that I wouldn't want. - Belay
@Kaka You can qualify automatic variables as volatile. Does that mean the compiler has to emit code to turn off caching for that section of the stack? Never seen that before, and it seems absurd. (An automatic variable qualified volatile is quite useful, e.g. if you single-step through the program and want to change it from a debugger.) - Curse
@PeterCordes: Right, nobody would "argue that volatile de-facto is thread-safe with whatever semantics it has" -- even on x86 with its Total Store Order memory model (stores by one thread will be seen in that order by all other threads/cores), volatile isn't enough to ensure thread-safe code. For those interested in the entertaining gory details of how cache coherency works (or fails to work without proper barriers) see Paul McKenney's freely available book arxiv.org/abs/1701.00854, Appendix C, sections C.3.2 Store Forwarding and C.3.3 Store Buffers and Memory Barriers. - Prophets
@PeterCordes: Would anything in the C Standard forbid an implementation from e.g. processing volatile reads and writes as calls to a function which was documented as e.g. using normal hardware reads and writes for all addresses other than 0x12340000 and 0x12340001, but would latch byte writes to 0x12340000 without forwarding them to the hardware, and convert byte writes of 0x12340001 into word writes that would bundle the last value code had written to 0x12340000? - Prudery
@amdn: When volatile was added to the language, I think it was expected to serve as a "catch-all" with the tightest semantics an application might need. Implementations where that would pose an unacceptable performance burden could extend the language with ways of requesting weaker semantics, but I think the keyword was intended to allow programs to ensure correct behavior--even if not optimal performance--without use of compiler-specific extensions. - Prudery
@fuz: I wouldn't expect a compiler to turn off caching, but an implementation intended for low-level programming should allow a programmer who is able to do so to exploit the resulting semantics. - Prudery
(Correcting an earlier comment: I suggested that another mapping of the same physical page could see actual DRAM. That's true, it could be uncacheable, and since caching/coherency is based on physical address, a read could force any dirty copies to write back first. I don't think most ISAs allow another mapping to bypass cache coherency and read (or write) DRAM while this or another core has a dirty copy of a line (or any valid for write). Some ISAs may not have cache-coherent DMA, although x86 always has since the first x86s with caches, for backward compat with existing OSes as always.) - Belay
@supercat: If any type is wider than char, a better design would be to just call a helper for volatile stores of those types if it needs combining. If volatile char stores would have some physical meaning (but "wrong" for driving that hardware), and volatile short or volatile int stores are also possible for at least that address, then I think the wording of volatile would require an implementation to let applications shoot themselves in the foot. But if there's no other way, yeah, it might be justifiable to play fast and loose with the meaning of "execution environment". - Belay
@PeterCordes: An implementation that performs such virtualization may be able to take code which was written for one hardware platform and generate machine code that will work on another, while allowing most non-machine-specific parts of the code to run at full speed. Virtualized I/O operations would obviously be much slower on the newer platform than direct I/O would have been, but if most of a program's time would be spent on things other than I/O, that might not matter too much. - Prudery
@PeterCordes: Incidentally, it may be worth noting that any read or write of a hardware register that does not store the value written in such a way as to be an "object" invokes Undefined Behavior, whether or not a volatile qualifier is specified. Compilers intended for low-level programming shouldn't use this as an excuse to behave uselessly, but may use it to justify any deviation from normal behavior that could be useful. - Prudery

The C Standard is agnostic to any relationship between operations on volatile objects and operations on the actual machine. While most implementations would specify that a construct like *(char volatile*)0x1234 = 0x56; would generate a byte store with value 0x56 to hardware address 0x1234, an implementation could, at its leisure, allocate space for e.g. an 8192-byte array and specify that *(char volatile*)0x1234 = 0x56; would immediately store 0x56 to element 0x1234 of that array, without ever doing anything with hardware address 0x1234. Alternatively, an implementation may include some process that periodically stores whatever happens to be in element 0x1234 of that array to hardware address 0x1234.

All that is required for conformance is that all operations on volatile objects within a single thread are, from the standpoint of the abstract machine, regarded as absolutely sequenced. From the point of view of the Standard, implementations can convert such accesses into real machine operations in whatever fashion they see fit.
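A minimal sketch of such a virtualized implementation, where volatile byte accesses become helper calls that latch values in a private array instead of touching any bus (the shadow array and the vstore8/vload8 names are invented here):

```c
#include <stddef.h>

/* Sketch of the hypothetical implementation described above: volatile byte
   accesses are lowered to calls into helpers that latch values in a private
   array rather than performing real hardware accesses. The names
   vstore8/vload8 and the array size are invented for illustration. */
static unsigned char shadow[8192];

static void vstore8(size_t addr, unsigned char val)
{
   shadow[addr % sizeof shadow] = val; /* latch; no real hardware access */
}

static unsigned char vload8(size_t addr)
{
   return shadow[addr % sizeof shadow];
}
```

As long as loads and stores routed through such helpers stay strictly sequenced within a thread, the abstract machine's requirements are satisfied, regardless of what (if anything) ever reaches address 0x1234 on a physical bus.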

Prudery answered 20/5, 2021 at 20:41 Comment(2)
Moreover, what constitutes a volatile access is implementation-defined. - Tribasic
@philipxy: Indeed so. Commercial compilers would generally treat a volatile write as forcing a compiler to effectively flush all "register cached" objects, allowing code for things like background I/O that was written for any such compiler on a particular platform to work with any other vendor's compiler that used similar semantics. Clang and gcc, however, refuse to support such semantics since they view such code as "broken". - Prudery

Coalescing the two assignments would change the observable behavior of the program, so the compiler is not allowed to do so.

Landward answered 20/5, 2021 at 8:24 Comment(1)
The sequence of actual hardware memory operations is only "observable" if an implementation chooses to specify it as such. Nothing would forbid an implementation from including its own virtual machine where volatile stores update the virtual machine state immediately, but such updates take a while to be translated into operations on real machine hardware. - Prudery

© 2022 - 2024 — McMap. All rights reserved.