The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.
Note: the following examples are only valid because x never has its address taken, so it is "register-like". They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires "wasting" at least one bit of storage, and it must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2^(n-1)-1) and 2^(n-1)-1 as a trap representation. See Jens Gustedt's answer.
Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment
unsigned x, y, z; /* 0 */
y = 0; /* 1 */
z = 4; /* 2 */
x = - x; /* 3 */
y = y + z; /* 4 */
x = y + 1; /* 5 */
When line 3 is evaluated, x is not initialized yet; therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:
r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1 + 1;
The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.
For a more elaborate example, consider the following code fragment:
unsigned i, x;
for (i = 0; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}
Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x, since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(); there's no need to test the condition. The compiler may compile this code as if you'd written
unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}
The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is used is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:
- During the first loop iteration, x is uninitialized by the time -x is evaluated.
- -x has undefined behavior, so its value is whatever-is-convenient.
- The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.
When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimization good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)
This hypothetical compiler maps every variable to a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization; it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.
Comments:

The code in the question is x -= x; the question is why accessing uninitialized values at all is UB. – Detrude

Given uint16_t foo(void) { uint16_t result; ... return result; }, where the body contains various statements, each of which may or may not write result, it may be helpful to have the compiler keep result in a 32-bit register and then return that. If anything stores a value to result, the compiler will ensure the value stored is 0..65535, but if nothing writes to result, keeping the return value within that range would require adding an extra instruction. – Wrench

Were the Standard to specify that uint16_t foo(void) { uint16_t result; result -= result; return result; } returns 0, this would not make conformant programs and libraries bigger. We don't worry about buggy code producing larger binaries. We do want the compiler to be able to optimize conformant programs by taking advantage of undefined behavior, and the added specification wouldn't change that. – Intestinal