TL;DR: You're right, such optimization is not forbidden for signed int, only for float/double, and not just because of exceptions in that case.
One reason for things to be UB is that some obscure machine could raise an exception. But hitting UB is not guaranteed to raise an exception on all machines (unless you're compiling with gcc -fsanitize=undefined, for the types of UB it or clang can reliably detect, or gcc -ftrapv to define the behaviour of signed int overflow as trapping). When a compiler treats UB as an optimization opportunity by assuming something won't happen, things are very different: UB is not a synonym for "fault" or "trap".
There are operations that might trap on normal CPUs, such as deref of unknown pointers, and integer division on some ISAs (such as x86 but not ARM). These would work as examples if you're looking for an operation that compilers potentially need to be careful of to avoid introducing an exception before side-effects that need to happen, or before a branch that might cause the abstract machine not to reach an undefined operation at all.
Signed integer overflow is UB so anything can happen at any point in the execution of a program where that happens (in C++, and according to some interpretations of the C standard), even when compiling for a machine with non-trapping add instructions (like all modern ISAs).
Some implementations might define the behaviour as raising an exception. If they define where that exception gets raised, then it would prevent the optimization; each addition needs to happen as written so it can trap there if that operation in the abstract machine overflows. But that would be defining the behaviour, the exact opposite of UB meaning absolutely zero guarantees about what your program actually does.
In C, if n3128 is accepted¹, any visible side-effects sequenced before the abstract machine encounters UB must happen. But after UB is encountered, literally anything is allowed, including doing I/O. UB doesn't have to fault and stop execution. If a compiler was compiling the += operations with MIPS signed-overflow-trapping add instructions instead of the usual addu, it would be legal to optimize to x+=12 after the intervening code even if it contained I/O operations or other visible side-effects (like a volatile read or write). Even if the x+=5 caused signed overflow UB in the abstract machine, it's fine if the actual behaviour is to trap later (for example when the abstract machine would have done the x+=7 part). As long as it's at or after the point where the abstract machine hit UB, literally anything is allowed.

(In C++, it would also be legal to do the possibly-trapping addi $s0, $s0, 12 even before a printf or something, due to the explicit lack of requirements on behaviour even before the first undefined operation, for an execution that does encounter UB. But only if the compiler can prove that printf definitely returns, so in practice this optimization can usually only happen for volatile accesses, if at all. But even without retroactive effects, we can either do x+=5 before and x+=7 after, or x+=12 after. Not faulting is valid behaviour for signed overflow, but the abstract machine has done an undefined operation, so anything that happens later, like printing and then trapping, or just having the addition wrap, is allowed.)
The compiler just has to avoid introducing exceptions on paths of execution that shouldn't have any. (Which isn't a problem for integer addition on mainstream ISAs; most don't even have a trapping signed-add instruction, and compilers targeting MIPS use addu even for signed math so they can optimize freely, and because historically programmers didn't want trapping on int math.)
Footnote 1: C vs. C++, and whether C UB should be "concrete" or "abstract"
See Does undefined behaviour retroactively mean that earlier visible side-effects aren't guaranteed? and n3128: Taming the Demons -- Undefined Behavior and Partial Program Correctness, a proposal to have ISO C clearly specify that visible side-effects (like I/O) before the abstract machine reaches an undefined operation must still happen. (Common interpretations of the current ISO C standard treat UB like in C++, where the C++ standard explicitly allows "breaking" stuff along an inevitable path to UB.)
Practical example of compilers doing this
Both int and unsigned can do this optimization; it's only FP types that can't, but that's (also) because of rounding, even if you compile with gcc -fno-trapping-math (an FP math option). See it in action on Godbolt with GCC 13 and Clang 16:
int sink; // volatile int sink doesn't make a difference
int foo_signed(int x) {
x += 5;
sink = 1;
x += 7;
return x;
}
// also unsigned and float versions
# GCC -O3 -fno-trapping-math
foo_signed: # input in EDI, retval in EAX
mov DWORD PTR sink[rip], 1
lea eax, [rdi+12] # x86 can use LEA as a copy-and-add
ret
foo_unsigned:
mov DWORD PTR sink[rip], 1
lea eax, [rdi+12]
ret
foo_float: # first arg and retval in XMM0
addss xmm0, DWORD PTR .LC0[rip] # add Scalar Single-precision
mov DWORD PTR sink[rip], 1
addss xmm0, DWORD PTR .LC1[rip] # two separate 5.0f and 7.0f adds
ret
Earlier version of an answer, making some different points for the same conclusion
You're correct; assuming x is a local variable so literally nothing can possibly use the x += 5 result, it's safe to optimize x+=5; ... ; x+=7 to x+=12 for both signed and unsigned integer types.
Unsigned integer math is of course fine.
Signed integer math has to produce the right result in any case where the abstract machine doesn't encounter UB. x+=12 does that. There's no guarantee that signed overflow raises an exception at any specific point in your program; that's the whole point of optimization in modern C based on the assumption that undefined behaviour won't happen. For an execution that will encounter UB, literally anything can happen anywhere before or after that point (but see footnote 1 above re: "breaking" stuff along an inevitable path to UB).
This optimization would be safe even for turning x-=5; x+=7 into x+=2, where the abstract machine could wrap twice (encountering UB) but the asm doesn't wrap, since "happens to work" is an allowed behaviour, and common in practice. (Even using MIPS trapping add instructions, for example.)
If you use compiler options like gcc -fwrapv, that defines the behaviour of signed integer math to be 2's complement wrapping, removing UB and making the situation identical to unsigned.
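As a quick illustration (the function names here are invented for this sketch): with -fwrapv, both orderings must produce the same wrapped value even when an intermediate addition overflows, exactly like the unsigned case.

#include <limits.h>
#include <stdio.h>

// Build with gcc -fwrapv: signed overflow is then defined as 2's complement
// wrapping, so these two functions must agree for every input.
int two_adds(int x) { x += 5; x += 7; return x; }
int one_add(int x)  { return x + 12; }

int main(void) {
    int x = INT_MAX - 8;   // both paths wrap past INT_MAX
    printf("%d %d\n", two_adds(x), one_add(x));   // same value with -fwrapv
}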
GCC does sometimes miss optimizations with signed integer math because of some reluctance for GCC internals to create signed overflow in a temporary in the asm where none would have existed in the abstract machine. This is a missed optimization when compiling for a machine that allows non-trapping integer math (i.e. all modern ISAs). For example, GCC will optimize a+b+c+d+e+f into (a+b)+(c+d)+(e+f) for unsigned int but not for signed int without -fwrapv. Clang does it for both on AArch64 and RISC-V, although it chooses not to for x86 (Godbolt). Again, this is a missed optimization due to GCC being over-cautious for some unknown reason; it's perfectly valid. 2's complement signed math is identical to unsigned binary math, so it's associative; the final result will be correct even in cases where the optimized computation wrapped back and forth but the abstract machine didn't. A compilable version of that expression is below if you want to try it yourself.
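(Function names here are mine, just for testing on Godbolt.) Compare the asm GCC emits at -O3 for the two versions, with and without -fwrapv:

// The unsigned version gets reassociated into (a+b)+(c+d)+(e+f);
// the signed version stays a serial chain of adds unless -fwrapv is used.
unsigned sum6u(unsigned a, unsigned b, unsigned c,
               unsigned d, unsigned e, unsigned f) {
    return a + b + c + d + e + f;
}
int sum6s(int a, int b, int c, int d, int e, int f) {
    return a + b + c + d + e + f;
}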
Signed overflow UB is only a thing in the abstract machine, not asm; most mainstream ISAs don't even have integer addition instructions that trap on overflow. (MIPS does, but compilers don't use them for int math, so they can do optimizations that produce values that didn't exist in the abstract machine.)
Semi-related: Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? (answers show that compilers do optimize to three multiplies for integer math, even for signed int.)
FP exceptions aren't the only issue for float/double
Floating-point math can't do this optimization because it could give a different result in non-overflowing cases, due to different rounding. Two smaller numbers could both round down, vs. one larger number overcoming the threshold.
e.g. for a number large enough that the nearest representable double values are 16 apart from each other, adding 8 would get half-way and round to nearest-even (assuming the default rounding mode). But anything less, like 7 or 5, will always round back down: x + 7 == x, so both the 5 and the 7 would be lost, but x+12 all in one go would get over the hump to the next representable float or double, producing x+16.
(The magnitude of 1 unit-in-the-last-place (of the mantissa) depends on the exponent of a float/double. For large enough FP values it's 1.0. For even larger values, e.g. double from 2^53 to 2^54, only even numbers are representable, and so on with larger exponents.)
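A concrete demonstration of that rounding effect, using 2^56, where neighbouring doubles are 16 apart:

#include <stdio.h>

int main(void) {
    double x = 72057594037927936.0;   // 2^56: representable doubles are 16 apart here
    double two_adds = x + 5 + 7;      // each add rounds back down, so the result is still x
    double one_add  = x + 12;         // 12 is past the halfway point of 8: rounds up to x + 16
    printf("%.1f\n%.1f\n", two_adds, one_add);   // prints two different values
}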
If you compile with GCC's buggy default of -ftrapping-math, it will try to respect FP exception semantics. It doesn't reliably generate 2 FP exceptions if overflow happens twice, so it might not care about that.
But yes, with #pragma STDC FENV_ACCESS ON, every separate FP operation is supposed to have an observable effect (https://en.cppreference.com/w/c/numeric/fenv). But if you don't call fetestexcept to actually observe the FP exception flags between the two operations, they could in theory still be optimized if we can prove that rounding would be the same, since I don't think even ISO C's FENV_ACCESS ON is supposed to support actually running exception / signal handlers for each trapping operation.
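For reference, observing the flags looks something like this (a minimal sketch; GCC's support for the FENV_ACCESS pragma is limited, so treat it as illustrative):

#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON

int main(void) {
    feclearexcept(FE_ALL_EXCEPT);
    volatile double x = 1e308;
    x *= 10.0;                        // overflows to +Inf, sets FE_OVERFLOW
    if (fetestexcept(FE_OVERFLOW))    // observing the flag between operations
        puts("FE_OVERFLOW is set");   // is what pins down each separate multiply
}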
For example, two identity operations like x *= 1.0; could be collapsed to one, which will raise exceptions on NaN. Or x *= 2; x *= 2; could be optimized to x *= 4; because multiplying by exact powers of 2 doesn't change the mantissa and thus doesn't cause rounding. It doesn't matter if the first or second multiply overflowed to +-Inf; that will still be the final result. (Unless Inf * 2 raises exception flags that an overflowing multiply wouldn't have already raised? I don't think so.)

And they're both changing the exponent in the same direction, unlike x *= 4; x *= 0.5;, which could overflow to +Inf for large numbers, so isn't equivalent to x *= 2. Also, if x *= 0.5; x *= 0.5; produces subnormal results, it actually could round twice when right-shifting the mantissa; IEEE FP has gradual underflow (subnormals with a special encoding for the exponent) but non-gradual overflow to +Inf.
Figuring out whether it's safe to optimize x * 0.5 * 0.5 to x *= 0.25 is beyond the scope of this answer. GCC and clang don't optimize x *= 2.0f; x *= 2.0f; into x *= 4.0f; even with -fno-trapping-math, but I think that's a missed optimization. Both functions are written out below if you want to check current compiler behaviour.
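(Function names below are mine.)

// GCC and Clang currently keep both multiplies in times4_twice even with
// -fno-trapping-math, although multiplying by exact powers of 2 never
// changes the mantissa.
float times4_once(float x)  { return x * 4.0f; }
float times4_twice(float x) { x *= 2.0f; x *= 2.0f; return x; }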