Is it allowed for a compiler to optimize away a local volatile variable?

Is the compiler allowed to optimize this (according to the C++17 standard):

int fn() {
    volatile int x = 0;
    return x;
}

to this?

int fn() {
    return 0;
}

If yes, why? If not, why not?


Here's some thinking about this subject: current compilers compile fn() by putting the local variable on the stack and then returning it. For example, on x86-64, gcc generates this:

mov    DWORD PTR [rsp-0x4],0x0 // this is x
mov    eax,DWORD PTR [rsp-0x4] // eax is the return register
ret    

Now, as far as I know, the standard doesn't say that a local volatile variable has to be put on the stack, so this version would be equally good:

mov    edx,0x0 // this is x
mov    eax,edx // eax is the return
ret    

Here, edx stores x. But why stop there? Since edx and eax are both zero, we could just write:

xor    eax,eax // eax is the return, and x as well
ret    

And we have transformed fn() into the optimized version. Is this transformation valid? If not, which step is invalid?

Laughry answered 23/7, 2018 at 6:1 Comment(14)
Comments are not for extended discussion; this conversation has been moved to chat.Kong
Related: MCU programming - C++ O2 optimization breaks while loopLaky
@philipxy: It is not about "what an implementation could produce". It is about whether the transformation is allowed. Because, if it is not allowed, then it must not produce the transformed version.Laughry
The standard defines for a program a sequence of accesses to volatiles & other observables that an implementation must respect. But what access to a volatile means is implementation-defined. So it is pointless to ask what an implementation could produce--it produces what it is defined to produce. Given some description of implementation behaviour, you might seek another that you prefer. But you need one to start. Maybe you are actually interested in the standard's observable rules since code generation is irrelevant other than having to satisfy the rules of the standard & an implementation.Connolly
@philipxy: As I understand you, you say that the standard doesn't forbid this transformation, i.e. it is allowed?Laughry
(See my updated first comment.) I have no idea what you are trying to say. Your question & comment do not frame the question of code generation in a meaningful way. The standard maps a program to a sequence of observables. If a target architecture ignores volatile accesses, say because it has no relevant hardware accesses, then it can just ignore the keyword.Connolly
@philipxy: It is simple. Is the compiler allowed to emit a simple xor eax, eax; ret for fn() (which contains the volatile variable), or not? Is it allowed by the standard, or not? It is a yes-no question. If you think that it is allowed, then please write an answer, because the current most upvoted answer tells otherwise, as far as I understand it.Laughry
What a volatile access is, is implementation-defined. How does your implementation define it? (You are not framing the situation correctly; see my first comment. The standard has nothing to say about code generation; it defines a sequence of observables.)Connolly
@philipxy: If you say this is implementation defined, then the answer is "The standard allows it, it is up to the implementation to define this whether it is allowed or not.".Laughry
@philipxy: I'll clarify my question that it is about the standard. It is usually implied by these kind of questions. I'm interested in what the standard says.Laughry
PS All the answers so far erroneously ignore or misinterpret the fact that what constitutes a volatile access is implementation-defined. It does not make sense to say the implementation might not know what the consequences of an access are. (Learn about how program semantics are defined via the abstract machine & the as-if rule.)Connolly
@Connolly How does the so called "as-if rule" apply to volatile?Racing
@Racing I would expect any explanation of the as-if rule to include that.Connolly
You might be interested in Can compiler sometimes cache variable declared as volatile. The question mentions threads (so the answers do too), although they have nothing to do with volatile.Connolly

No. Access to volatile objects is considered observable behavior, just like I/O, with no particular distinction between locals and globals.

The least requirements on a conforming implementation are:

  • Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

[...]

These collectively are referred to as the observable behavior of the program.

N3690, [intro.execution], ¶8

How exactly this is observable is outside the scope of the standard, and falls squarely into implementation-specific territory, exactly as with I/O and access to global volatile objects. volatile means "you think you know everything going on here, but it's not like that; trust me and do this stuff without being too smart, because I'm in your program doing my secret stuff with your bytes". This is actually explained at [dcl.type.cv] ¶7:

[ Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. — end note ]
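
For concreteness, here is a minimal pair you could compare under optimization (e.g. on a compiler explorer); the function names are just illustrative. Current gcc and clang at -O2 keep the store and the load in the volatile version, while the non-volatile version collapses to a plain return 0:

int with_volatile() {
    volatile int x = 0; // the write and the read count as observable behavior
    return x;
}

int without_volatile() {
    int x = 0;          // no observable behavior here: may become "return 0"
    return x;
}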

Waistcloth answered 23/7, 2018 at 6:24 Comment(2)
Since this is the most upvoted answer, and the question got expanded by edit, it would be nice to have this answer edited to discuss the new optimization examples.Manciple
Correct is "yes". This answer does not clearly distinguish abstract machine observables from generated code. The latter is implementation-defined. Eg perhaps for use with a given debugger a volatile object is guaranteed to be in memory and/or register; eg typically under a relevant target architecture writes and/or reads for volatile objects at pragma-specified special memory locations are guaranteed. The implementation defines how accesses are reflected in code; it decides how & when object(s) "might be changed by means undetectable by an implementation". (See my comments on the question.)Connolly

This loop can be optimised away by the as-if rule because it has no observable behaviour:

for (unsigned i = 0; i < n; ++i) { bool looped = true; }

This one cannot:

for (unsigned i = 0; i < n; ++i) { volatile bool looped = true; }

The second loop does something on every iteration, which means the loop takes O(n) time. I have no idea what the constant is, but I can measure it and then I have a way of busy looping for a (more or less) known amount of time.

I can do that because the standard says that access to volatiles must happen, in order. If a compiler were to decide that in this case the standard didn't apply, I think I would have the right to file a bug report.

If the compiler chooses to put looped into a register, I suppose I have no good argument against that. But it still must set the value of that register to 1 for every loop iteration.
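
Here is a sketch of that measurement idea, assuming (as above) that the volatile store keeps the loop body from being elided; the iteration count and the resulting constant are machine-specific:

#include <chrono>
#include <iostream>

void spin(unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
        volatile bool looped = true; // one observable access per iteration
    }
}

int main() {
    auto t0 = std::chrono::steady_clock::now();
    spin(100000000);                 // calibrate this count for your machine
    auto t1 = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}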

Sadick answered 23/7, 2018 at 19:38 Comment(6)
So, are you saying the final xor ax, ax (where ax is considered to be volatile x) version in the question is valid, or invalid? IOW, what is your answer to the question?Manciple
@hyde: The question, as I read it, was "can the variable be eliminated" and my answer is "No". For the specific x86 implementation which raises the question of whether the volatile can be placed in a register, I'm not entirely sure. Even if it is reduced to xor ax, ax, though, that opcode cannot be eliminated even if it looks useless, nor can it be merged. In my loop example, the compiled code would have to execute xor ax, ax n times in order to satisfy the observable behaviour rule. Hopefully the edit answers your question.Sadick
Yeah, the question got expanded quite a bit by the edit, but since you answered after the edit, I thought this answer should cover the new part...Manciple
@hyde: In fact, I do use volatiles in that way in benchmarks in order to avoid having the compiler optimise away a loop which otherwise does nothing. So I really hope I'm right about this :=)Sadick
The Standard says that operations on volatile objects are--in and of themselves--a kind of side effect. An implementation could define their semantics in a way that would not require them to generate any actual CPU instructions, but a loop which accesses a volatile-qualified object has side effects and is thus not eligible for elision.Nilgai
But also the standard says "The semantics of an access through a volatile glvalue are implementation-defined."Connolly

I beg to dissent from the majority opinion, despite fully understanding that volatile means observable I/O.

If you have this code:

{
    volatile int x;
    x = 0;
}

I believe the compiler can optimize it away under the as-if rule, assuming that:

  1. The volatile variable is not otherwise made visible externally via e.g. pointers (which is obviously not a problem here since there is no such thing in the given scope)

  2. The compiler does not provide you with a mechanism for externally accessing that volatile

The rationale is simply that you couldn't observe the difference anyway, due to criterion #2.

However, in your compiler, criterion #2 may not be satisfied! The compiler may try to provide you with extra guarantees about observing volatile variables from the "outside", such as by analyzing the stack. In such situations, the behavior really is observable, so it cannot be optimized away.

Now the question is, is the following code any different than the above?

{
    volatile int x = 0;
}

I believe I've observed different behavior for this in Visual C++ with respect to optimization, but I'm not entirely sure on what grounds. Perhaps initialization does not count as an "access"? I'm not sure. This may be worth a separate question if you're interested, but otherwise I believe the answer is as I explained above.
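
For concreteness, here is a minimal pair you could compare in the generated code (e.g. with a compiler explorer) to check whether a given compiler treats initialization differently from a later assignment; the function names are mine:

void assign_after_decl() {
    volatile int x;      // no initializer
    x = 0;               // a write through a volatile glvalue
}

void init_at_decl() {
    volatile int x = 0;  // initialization of a volatile object
}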

Burgos answered 24/7, 2018 at 0:9 Comment(0)

I'm just going to add a detailed reference for the as-if rule and the volatile keyword. (At the bottom of these pages, follow the "see also" and "References" to trace back to the original specs, but I find cppreference.com much easier to read/understand.)

In particular, I want you to read this section:

volatile object - an object whose type is volatile-qualified, or a subobject of a volatile object, or a mutable subobject of a const-volatile object. Every access (read or write operation, member function call, etc.) made through a glvalue expression of volatile-qualified type is treated as a visible side-effect for the purposes of optimization (that is, within a single thread of execution, volatile accesses cannot be optimized out or reordered with another visible side effect that is sequenced-before or sequenced-after the volatile access. This makes volatile objects suitable for communication with a signal handler, but not with another thread of execution, see std::memory_order). Any attempt to refer to a volatile object through a non-volatile glvalue (e.g. through a reference or pointer to non-volatile type) results in undefined behavior.

So the volatile keyword specifically is about disabling compiler optimization on glvalues. The only thing here the volatile keyword can affect is possibly return x; the compiler can do whatever it wants with the rest of the function.

How much the compiler can optimize the return depends on how much the compiler is allowed to optimize the access to x in this case (since it isn't reordering anything and, strictly speaking, isn't removing the return expression; there is the access, but it is reading and writing to the stack, which it should be able to streamline). So as I read it, this is a grey area in how much the compiler is allowed to optimize, and it can easily be argued both ways.

Side note: In these cases, always assume the compiler will do the opposite of what you wanted/needed. You should either disable optimization (at least for this module), or try to find a more defined behavior for what you want, for example the sketch below. (This is also why unit testing is so important.) If you believe it is a defect, you should bring it up with the developers of C++.
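
One way people get "more defined behavior" in practice is the pattern used by benchmark libraries, sketched here as an assumption rather than anything this question requires; it relies on GCC/Clang extended asm, so it is not portable standard C++:

// An empty asm statement that claims to read `value`: the optimizer must
// materialize the value, so it cannot be discarded as unused.
template <typename T>
inline void do_not_optimize(T const& value) {
    asm volatile("" : : "r,m"(value) : "memory");
}

int main() {
    int sum = 0;
    for (int i = 0; i < 1000; ++i) sum += i;
    do_not_optimize(sum); // the result must be materialized, not thrown away
}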


All of this is still really hard to read, so I'm trying to include what I think is relevant so that you can read it yourself.

glvalue

A glvalue expression is either an lvalue or an xvalue.

Properties:

  • A glvalue may be implicitly converted to a prvalue with the lvalue-to-rvalue, array-to-pointer, or function-to-pointer implicit conversion.
  • A glvalue may be polymorphic: the dynamic type of the object it identifies is not necessarily the static type of the expression.
  • A glvalue can have incomplete type, where permitted by the expression.


xvalue

The following expressions are xvalue expressions:

  • a function call or an overloaded operator expression, whose return type is rvalue reference to object, such as std::move(x);
  • a[n], the built-in subscript expression, where one operand is an array rvalue;
  • a.m, the member of object expression, where a is an rvalue and m is a non-static data member of non-reference type;
  • a.*mp, the pointer to member of object expression, where a is an rvalue and mp is a pointer to data member;
  • a ? b : c, the ternary conditional expression for some b and c (see definition for detail);
  • a cast expression to rvalue reference to object type, such as static_cast<T&&>(x);
  • any expression that designates a temporary object, after temporary materialization. (since C++17)

Properties:

  • Same as rvalue (below).
  • Same as glvalue (below).

In particular, like all rvalues, xvalues bind to rvalue references, and like all glvalues, xvalues may be polymorphic, and non-class xvalues may be cv-qualified.


lvalue

The following expressions are lvalue expressions:

  • the name of a variable, a function, or a data member, regardless of type, such as std::cin or std::endl. Even if the variable's type is rvalue reference, the expression consisting of its name is an lvalue expression;
  • a function call or an overloaded operator expression, whose return type is lvalue reference, such as std::getline(std::cin, str), std::cout << 1, str1 = str2, or ++it;
  • a = b, a += b, a %= b, and all other built-in assignment and compound assignment expressions;
  • ++a and --a, the built-in pre-increment and pre-decrement expressions;
  • *p, the built-in indirection expression;
  • a[n] and p[n], the built-in subscript expressions, except where a is an array rvalue (since C++11);
  • a.m, the member of object expression, except where m is a member enumerator or a non-static member function, or where a is an rvalue and m is a non-static data member of non-reference type;
  • p->m, the built-in member of pointer expression, except where m is a member enumerator or a non-static member function;
  • a.*mp, the pointer to member of object expression, where a is an lvalue and mp is a pointer to data member;
  • p->*mp, the built-in pointer to member of pointer expression, where mp is a pointer to data member;
  • a, b, the built-in comma expression, where b is an lvalue;
  • a ? b : c, the ternary conditional expression for some b and c (e.g., when both are lvalues of the same type, but see definition for detail);
  • a string literal, such as "Hello, world!";
  • a cast expression to lvalue reference type, such as static_cast<T&>(x);
  • a function call or an overloaded operator expression, whose return type is rvalue reference to function;
  • a cast expression to rvalue reference to function type, such as static_cast<T (&&)()>(x). (since C++11)

Properties:

  • Same as glvalue (below).
  • The address of an lvalue may be taken: &++i1 and &std::endl are valid expressions.
  • A modifiable lvalue may be used as the left-hand operand of the built-in assignment and compound assignment operators.
  • An lvalue may be used to initialize an lvalue reference; this associates a new name with the object identified by the expression.


as-if rule

The C++ compiler is permitted to perform any changes to the program as long as the following remains true:

1) At every sequence point, the values of all volatile objects are stable (previous evaluations are complete, new evaluations not started). (until C++11)
1) Accesses (reads and writes) to volatile objects occur strictly according to the semantics of the expressions in which they occur. In particular, they are not reordered with respect to other volatile accesses on the same thread. (since C++11)
2) At program termination, data written to files is exactly as if the program was executed as written.
3) Prompting text which is sent to interactive devices will be shown before the program waits for input.
4) If the ISO C pragma #pragma STDC FENV_ACCESS is supported and is set to ON, the changes to the floating-point environment (floating-point exceptions and rounding modes) are guaranteed to be observed by the floating-point arithmetic operators and function calls as if executed as written, except that:
  • the result of any floating-point expression other than cast and assignment may have range and precision of a floating-point type different from the type of the expression (see FLT_EVAL_METHOD), and
  • notwithstanding the above, intermediate results of any floating-point expression may be calculated as if to infinite range and precision (unless #pragma STDC FP_CONTRACT is OFF).


If you want to read the specs, I believe these are the ones you need to read

References

C11 standard (ISO/IEC 9899:2011): 6.7.3 Type qualifiers (p: 121-123)

C99 standard (ISO/IEC 9899:1999): 6.7.3 Type qualifiers (p: 108-110)

C89/C90 standard (ISO/IEC 9899:1990): 3.5.3 Type qualifiers

Anile answered 23/7, 2018 at 20:59 Comment(8)
It may not be right according to the standard, but anyone relying on the stack to be touched by something else during execution should stop coding. I'd argue it's a standard defect.Melody
@meneldal: That's way too broad a claim. Using _AddressOfReturnAddress involves analyzing the stack, for example. People analyze the stack for valid reasons, and it isn't necessarily because the function itself relies on it for correctness.Burgos
glvalue is here: return x;Laughry
@Laughry Sorry, this is all hard to read. Is that a glvalue because x is a variable? Also, for "can't be optimized out", does that mean the compiler can't optimize at all, or that it can't optimize by changing the expression? (It reads like the compiler is still allowed to optimize here because there is no access order to maintain, and the expression is still getting resolved, just in a more optimized way.) I can see it being argued both ways without a higher understanding of the specs.Anile
Here's a quote from your own answer :) "The following expressions are lvalue expressions: the name of a variable ..."Laughry
@Laughry I got so caught up in the examples given, I missed that bit in my first read through. I updated my answer to support both yes and no because now I'm not sure. return x is minimal enough that I'm not sure if the compiler is actually violating anything by optimizing it. Depending on how strictly I read "optimize out".Anile
@Mehrdad they are maybe defensible uses, but writing on the stack that doesn't belong to you is frowned upon, and in this case expecting anything to potentially happen between the two instructions is asking for trouble.Melody
But also the standard says "The semantics of an access through a volatile glvalue are implementation-defined."Connolly

Theoretically, an interrupt handler could

  • check if the return address falls within the fn() function. It might access the symbol table or source line numbers via instrumentation or attached debug information.
  • then change the value of x, which would be stored at a predictable offset from the stack pointer.

… thus making fn() return a nonzero value.
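
The same point can be made with a debugger instead of an interrupt handler (an illustrative sketch; the gdb command is just one way to do it). Because the access to x is observable, both the store and the subsequent load have to happen, leaving a window in which the value can be changed from outside the program:

int fn() {
    volatile int x = 0; // the store to x must actually happen
    // breakpoint here: e.g. "set var x = 42" in gdb overwrites x
    return x;           // the load must happen too, and may now yield 42
}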

Isabelleisac answered 23/7, 2018 at 13:51 Comment(5)
Or you could more easily do this with a debugger by setting a breakpoint in fn(). Using volatile produces code-gen that's like gcc -O0 for that variable: spill/reload between every C statement. (-O0 can still combine multiple accesses within one statement without breaking debugger consistency, but volatile isn't allowed to do that.)Kevakevan
Or more easily, using a debugger :) But which standard says that the variable needs to be observable? I mean, one implementation can choose that it must be observable. Another one can say it is not observable. Does the latter one violate the standard? Maybe not. The standard does not specify how a local volatile variable can be observed at all.Laughry
Even then, what does "observable" mean? Should it be placed on the stack? What if a register holds x? What if, on x86-64, xor rax, rax holds the zero (I mean, the return-value register holds x), which of course can be observed/modified by a debugger easily (i.e., the debug symbol information records that x is stored in rax)? Does this violate the standard?Laughry
−1 Any call to fn() can be inlined. With MSVC 2017 and default release mode, it is. There is then no “within the fn() function”. Regardless, since the variable is automatic storage there is no “predictable offset”.Mogilev
@Cheersandhth.-Alf If it's inlined, then all instances of it are listed in the debug info. Try setting a breakpoint on return x, it'll work even if inlined. (Unless there are too many instances, when the debugger would complain that there are not enough hardware breakpoints available, but then it's still just a limitation of the debugger)Isabelleisac

I think I have never seen a local variable using volatile that wasn't a pointer to a volatile. As in:

int fn() {
    volatile int *x = (volatile int *)0xDEADBEEF;
    *x = 23;   // request data, 23 = temperature 
    return *x; // return temperature
}

The only other cases of volatile I know use a global that is written in a signal handler. No pointers involved there. Or access to symbols defined in a linker script to be at specific addresses relevant to the hardware.
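
For completeness, the signal-handler case mentioned above looks like this classic pattern (the names here are illustrative):

#include <csignal>

volatile std::sig_atomic_t stop_requested = 0; // written only by the handler

void on_sigint(int) {
    stop_requested = 1;
}

int main() {
    std::signal(SIGINT, on_sigint);
    while (!stop_requested) {   // each iteration must re-read the flag
        // do work
    }
}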

It's much easier to reason in those cases about why an optimization would alter the observable effects. But the same rule applies to your local volatile variable: the compiler has to behave as if the access to x is observable and can't optimize it away.

Micromho answered 23/7, 2018 at 12:49 Comment(10)
But that's not a local volatile variable, it's a local non-volatile pointer to a volatile int at a well-known address.Euchology
Which makes it easier to reason about the correct behavior. As said, the rules for accessing a volatile are the same for local variables and for pointers to volatile variables being dereferenced.Micromho
I'm just addressing the first sentence of your answer, which seems to suggest that x in your code is a "local volatile variable". It isn't.Euchology
I got mad when int fn(const volatile int argument) didn't compile.Adulterer
The edit makes your answer not wrong, but it simply doesn't answer the question. This is the textbook use-case for volatile, and has nothing to do with it being a local. It could just as well be static volatile int *const x = ... at global scope and everything you say would still be exactly the same. This is like extra background knowledge that's necessary to understand the question, which I guess maybe not everyone has, but it's not a real answer.Kevakevan
A volatile qualifier is required on automatic objects whose lifetime starts before a setjmp, whose value changes between that setjmp and the corresponding longjmp, and whose value is observed after the longjmp.Nilgai
@Nilgai Isn't setjmp/longjmp a function call with side effects and thus any change to the variable needs to be observable anyway?Micromho
@GoswinvonBrederlow: Normally, there's no way that an automatic object whose address has not been exposed to the outside world can be modified between the time a function is called and the time it returns. What makes setjmp weird is that it can return twice, and automatic objects don't have to be exposed to the outside world to change before the second time setjmp returns.Nilgai
@Nilgai Ahh, I get what you refer to now.Micromho
@GoswinvonBrederlow: Many platforms have some registers that called functions are required to leave undisturbed (either leave them alone, or else save them before disturbing them and restore them afterward). Automatic objects can thus safely be cached across function calls. A setjmp would need to save such registers in the jmp_buff and longjmp restore them, but that will be fine if the values don't change between the two returns from setjmp. Objects that might change between the two returns must not be cached across any function that might call longjmp.Nilgai
