Compiler optimizations may cause integer overflow. Is that okay?

I have an int x. For simplicity, say ints occupy the range -2^31 to 2^31-1. I want to compute 2*x-1. I allow x to be any value 0 <= x <= 2^30. If I compute 2*(2^30), I get 2^31, which is an integer overflow.

One solution is to compute 2*(x-1)+1. That's one more operation than I'd like, but it shouldn't overflow. However, the compiler will optimize this to 2*x-1. Is this a problem for the source code? Is this a problem for the executable?
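
The original source isn't shown above, so here is a minimal reconstruction of the two variants being compared (a sketch: the signature is assumed from the func(int) label in the asm below, and func_careful is a made-up name):

int func(int x) {
    return 2 * x - 1;            // overflows (UB) for x > INT_MAX/2
}

int func_careful(int x) {
    return 2 * (x - 1) + 1;      // intended to stay in range for 0 <= x <= 2^30
}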

Here is the godbolt output for 2*x-1:

func(int):                               # @func(int)
        lea     eax, [rdi + rdi]
        dec     eax
        ret

Here is the godbolt output for 2*(x-1)+1:

func(int):                               # @func(int)
        lea     eax, [rdi + rdi]
        dec     eax
        ret
Ryals answered 17/10, 2022 at 20:5 Comment(8)
Unsigned integer overflow has well defined behaviour. It is only signed integer overflow that is UB.Tonry
@JesperJuhl Thanks, that satisfies my problem. I think the OP is still interesting in the case of ints, so I have edited the question.Ryals
It's not really wrong for the compiler to just let the multiplication overflow and then let the subtraction underflow back around, as long as such overflows are well-defined on the CPU architecture you're targeting.Kostroma
Related: Does undefined behavior apply to asm code? (no) / Is integer overflow undefined in inline x86 assembly?.Elonore
You are talking about "compiler optimization", but you need to be very specific about which compiler and which optimization. You can't assume an optimization will happen; that is bad practice. A better practice would be to work with types wide enough that your math can't overflow. - An exercise you can try is to run your function with different values and see what each compiler outputs.Haddock
@jamesdlin I'm curious why you removed the [integer-arithmetic] tag from this question. It seemed applicable. Although I'm not super clear what it's supposed to be about, it seems like it could get tagged very broadly, and this question is fine without it. So really just asking for insight on what that tag is about, and why it was worth an edit to remove it here.Elonore
@PeterCordes That is very strange. I did not intend to remove the [integer-arithmetic] tag; I removed the [dart] tag, but the edit history shows no record of this question ever having a [dart] tag. Seems like a StackOverflow bug...Westernism
@PeterCordes (And that bug is meta.stackoverflow.com/questions/421265/.)Westernism

The ISO C++ rules apply to your source code (always, regardless of the target machine). Not to the asm the compiler chooses to make, especially for targets where signed integer wrapping just works.

The "as if" rules requires that the asm implementation of the function produce the same result as the C++ abstract machine, for every input value where the abstract machine doesn't encounter signed integer overflow (or other undefined behaviour). It doesn't matter how the asm produces those results, that's the entire point of the as-if rule. In some cases, like yours, the most efficient implementation would wrap and unwrap for some values that the abstract machine wouldn't. (Or in general, not wrap where the abstract machine does for unsigned or gcc -fwrapv.)

One effect of signed integer overflow being UB in the C++ abstract machine is that it lets the compiler optimize an int loop counter to pointer width, not redoing sign-extension every time through the loop or things like that. Also, compilers can infer value-range restrictions. But that's totally separate from how they implement the logic into asm for some target machine. UB doesn't mean "required to fail", in fact just the opposite, unless you compile with -fsanitize=undefined. It's extra freedom for the optimizer to make asm that doesn't match the source if you interpreted the source with more guarantees than ISO C++ actually gives (plus any guarantees the implementation makes beyond that, like if you use gcc -fwrapv.)
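
As a minimal sketch of that loop-counter point (an added example, not from the original answer; assumes an x86-64 target): because a signed i is assumed never to wrap, the compiler may keep the index in a 64-bit register and fold it into addressing, instead of re-sign-extending a 32-bit value every iteration.

long long sum(const int *a, int n) {
    long long s = 0;
    for (int i = 0; i < n; ++i)   // signed i assumed never to overflow
        s += a[i];                // so the index can live in a pointer-width register
    return s;
}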

For an expression like x/2, every possible int x has well-defined behaviour. For 2*x, the compiler can assume that x >= INT_MIN/2 and x <= INT_MAX/2, because larger magnitudes would involve UB.

2*(x-1)+1 implies a legal value-range for x from (INT_MIN+1)/2 to (INT_MAX+1)/2, e.g. on a 32-bit 2's complement target, -1073741823 (0xc0000001) to 1073741824 (0x40000000). At the top of the range, 2*0x3fffffff doesn't overflow, and the +1 doesn't overflow either, because 2*(x-1) is even and at most INT_MAX - 1.

2*x - 1 implies a legal value-range for x from INT_MIN/2 + 1 to INT_MAX/2, e.g. on a 32-bit 2's complement target, -1073741823 (0xc0000001) to 1073741823 (0x3fffffff). So the largest value this expression can produce is INT_MAX - 2 (2^31 - 3 for 32-bit int), because INT_MAX is odd.
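
A quick sanity check of those boundary values (an added sketch, assuming a 32-bit 2's complement int):

#include <climits>
// All of these assume a 32-bit int; they fail to compile otherwise.
static_assert(INT_MIN / 2 + 1   == -1073741823, "lower bound for 2*x - 1");
static_assert(INT_MAX / 2       ==  1073741823, "upper bound for 2*x - 1");
static_assert((INT_MIN + 1) / 2 == -1073741823, "lower bound for 2*(x-1)+1");
static_assert(INT_MAX / 2 + 1   ==  1073741824, "upper bound for 2*(x-1)+1");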

In this case, the more complicated expression's legal value-range is a superset of the simpler expression, but in general that's not always the case.

They produce the same result for every x that's a well-defined input for both of them. And x86 asm (where wrapping is well-defined) that works like one or the other can implement either, producing correct results for all non-UB cases. So the compiler would be doing a bad job if it didn't make the same efficient asm for both.


In general, 2's complement and unsigned binary integer math is commutative and associative (for operations where that's mathematically true, like + and *), and compilers can and should take full advantage. e.g. rearranging a+b+c+d to (a+b)+(c+d) to shorten dependency chains. (See an answer on Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? for an example of GCC doing it with integer, but not FP.)
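
A tiny illustration of that reassociation freedom (an added example, not from the answer):

int sum4(int a, int b, int c, int d) {
    // Written as the serial chain ((a+b)+c)+d, but the compiler may evaluate it
    // as (a+b) + (c+d), shortening the dependency chain from three dependent
    // adds to two levels.
    return a + b + c + d;
}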

Unfortunately, GCC has sometimes been reluctant to do signed-int optimizations like that because its internals were treating signed integer math as non-associative, perhaps because of a misguided application of C++ UB rules to optimizing asm for the target machine. That's a GCC missed optimization; Clang didn't have that problem.


Further reading:

The whole situation is basically a mess, and the designers of C didn't anticipate the current sophistication of optimizing compilers. Languages like Rust are better suited to it: if you want wrapping, you can (and must) tell the compiler about it on a per-operation basis, for both signed and unsigned types. Like x.wrapping_add(1).
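
In C++ itself, the closest per-operation equivalent is to do the arithmetic in unsigned, where wrapping is well-defined, and convert back (an added sketch, not from the answer; the conversion back to int is implementation-defined before C++20 and modular since C++20):

int wrapping_add(int a, int b) {
    // Wraps modulo 2^32 instead of hitting signed-overflow UB.
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}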


Re: why does clang split up the 2*x and the -1 with lea/dec

Clang is optimizing for latency on Intel CPUs before Ice Lake, saving one cycle of latency at the cost of an extra uop of throughput. (Compilers often favour latency since modern CPUs are usually wide enough to chew through the extra throughput cost, although it does eat up space in the out-of-order exec window for hiding cache-miss latency.)

lea eax, [rdi + rdi - 1] has 3 cycle latency on Skylake, vs. 1 for the LEA it used. (See Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? for some details). On AMD Zen family, it's break-even for latency (a complex LEA only has 2c latency) while still costing an extra uop. On Ice Lake and later Intel, even a 3-component LEA is still only 1 cycle so it's pure downside there. See https://uops.info/, the entry for LEA_B_I_D8 (R32) (Base, Index, 8-bit displacement, with scale-factor = 1.)

This tuning decision is unrelated to integer overflow.

Elonore answered 18/10, 2022 at 13:42 Comment(2)
"That's a GCC missed optimization; Clang didn't have that problem." I don't know about the relative cost of instructions, but I assumed that a three-argument lea instruction is faster than a 2 argument lea + a decrement. Unfortunately, I've never been able to get those kinds of micro-benchmarks right.Ryals
@mbang: I wasn't talking about this case. Clang's optimizing for latency at the cost of an extra uop. lea eax, [rdi + rdi - 1] has 3 cycle latency on Skylake, vs. 1 for the LEA it used. (See Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly?). So it saves 1 cycle of latency at the cost of one extra uop. Somewhat questionable benefit, and it's not better on Zen or Ice Lake, in fact worse (the 3-component LEA has 1-cycle latency on ICL, 2 on Zen). uops.info, LEA_B_I_D8 (R32) entry.Elonore

As Miles hinted: the C++ source text is bound by the rules of the C++ language (integer overflow = bad), but the compiler's output is only bound by the rules of the CPU (overflow = ok). The compiler is allowed to make optimizations that your source code isn't allowed to express.

But don't take this as an excuse to get lazy. If you write undefined behavior, the compiler will take that as a hint and do other optimizations that result in your program doing the wrong thing.
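
A classic illustration of that (an added example, not from the answer): a hand-rolled overflow check written with signed int is itself UB on overflow, so the optimizer is entitled to fold it to a constant.

bool next_overflows(int x) {
    // The compiler may assume x + 1 never overflows, so gcc/clang at -O2
    // typically compile this whole function to "return false".
    return x + 1 < x;
}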

Kealey answered 17/10, 2022 at 20:51 Comment(15)
Your answer means that 2*x-1 violates the standard, but 2*(x-1)+1 does not, correct?Ryals
@Ryals consider a simpler example of x vs 2*x / 2. y = std::numeric_limits<int>::max() is ok, but y = (2* std::numeric_limits<int>::max()) / 2; isn't, and a compiler is free to replace it with 42 or bollocks.Volva
@463035818_is_not_a_number x isn't constexpr. I guarantee 0 <= x <= 2^30, but the compiler does not know that.Ryals
@Ryals doesn't really matter. Not saying that this happens, but in principle the compiler might branch on x and conditionally return 42;. It's not about what actually happens after optimization but about what you are guaranteed, and 2*x/2 is not guaranteed to be defined for every x in the range of int.Volva
@463035818_is_not_a_number So you agree with the statement: "As long as 0 <= x <= 2^30, 2*x-1 violates the standard, but 2*(x-1)+1 does not?"Ryals
@Ryals no, that statement uses slightly off terms. Neither 2*x-1 nor 2*(x-1)+1 "violates the standard". They just have different ranges of x for which the expression is defined. Optimizations a) will not result in expressions with a smaller "valid range" for x, and b) are not guaranteed to result in an expression with a larger "valid range" for x. This answer explains that a) holds even when at first sight it looks like it doesn't. b) means that you should not write 2*x-1 and expect it to be equivalent to 2*(x-1)+1 when x can be 2^30.Volva
@463035818_is_not_a_number I'm implying that these bound guarantees are provided by the code so that 2*(x-1)+1 could never run with x=2^30+1. I say that code that conditionally causes UB "violates the standard." Do you agree with my use of "violates the standard" now? If not, do only syntax errors "violate the standard"? (this is a genuine question, not an argument)Ryals
@mbang: No, that's an insane definition of "violates the standard". int foo(int x){ return x+1; } doesn't "violate the standard" on its own; only calling it with INT_MAX as an arg would be UB. You'd only say a program "violates the standard" if that actually happens during its execution. Even int x=INT_MAX; x++; isn't UB if that statement is never executed, e.g. if the function is never called or the if block is never taken. (The compiler can assume that, because executing it would be UB.) Most expressions involving a signed integer have UB with some input, except ones like x/2 that avoid signed-overflow UB for every possible value of int x.Elonore
A point which might help clarify "does a program have undefined behavior": The C++ abstract virtual machine really includes not just the program source but is also parameterized by a number of things including the program's inputs. Some code has undefined behavior based just on the source, no matter what the inputs could be. Some expressions cause UB if the expression is evaluated or only with certain values, meaning that some execution instances of the virtual machine do have UB and others might not.Kenelm
@mbang: From the C++ Standard: "Although this document states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs." It is not possible for C++ programs to violate the Standard, because the Standard only specifies requirements for C++ implementations.Ainsworth
@aschepler: There are many programs that need to perform tasks which some implementations can meaningfully accomplish, and others cannot. For an implementation to be suitable for tasks not anticipated by the Standard, it will need to meaningfully process programs beyond those mandated by the Standard. Support for such programs is outside the Standard's jurisdiction.Ainsworth
@Ainsworth I agree, but I'm not sure why you're telling me.Kenelm
"If you write undefined behavior, the compiler will take that as a hint and do other optimizations that result in your program doing the wrong thing." It is, for instance, allowed to assume you will never actually reach undefined behavior if it's possible to branch away from it. Say you have a container with compile-time known length, and you for-loop over it with some conditional return. If the loop index goes one too far, the compiler is allowed to simply assume that you will necessarily hit that conditional return, and Clang will actually use this to prune away everything after the loop.Fluoroscope
@Ryals Yes, I mean if you go past the bounds of the container. It's difficult to write code in a comment, but a function like bool contains(int k){ int a[3]{0,1,2}; for(int i=0; i<4; i++) {if (a[i]==k) {return true;}} return false;} may be (and is, by clang) optimised to just bool contains(int k){return true;}. Because it is standard-compliant to assume you never get to i==3, and therefore you must have returned earlier.Fluoroscope
@Arthur: The fact that the Standard allows implementations intended for tasks not involving some operation X, to assume programs won't do X, does not imply any judgment as to whether implementations making such assumptions should be viewed as particularly suitable for tasks involving X, nor that programs should jump through hoops to be compatible with implementations that aren't particularly suitable for use with them.Ainsworth

Just because signed integer overflow isn't well-defined at the C++ language level doesn't mean that's the case at the assembly level. It's up to the compiler to emit assembly code that is well-defined on the CPU architecture you're targeting.

I'm pretty sure every CPU made in this century has used two's complement signed integers, and overflow is perfectly well defined for those. That means there is no problem simply calculating 2*x, letting the result overflow, then subtracting 1 and letting the result underflow back around.
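
A worked instance of that wrap-and-unwrap for the question's worst case, x = 2^30 (an added sketch, done in uint32_t so the wraparound is also well-defined at the C++ level; two's complement hardware does the same thing with the signed bit patterns):

#include <cstdint>
uint32_t wrap_demo() {
    uint32_t x = 1u << 30;          // 0x40000000
    uint32_t doubled = 2 * x;       // 0x80000000, i.e. INT_MIN if viewed as signed
    return doubled - 1;             // 0x7FFFFFFF == 2^31 - 1, the correct 2*x - 1
}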

Many such C++ language-level rules exist to paper over different CPU architectures. In this case, signed integer overflow was made undefined so that compilers targeting CPUs that use e.g. one's complement or sign/magnitude representations of signed integers aren't forced to add extra instructions to conform to the overflow behavior of two's complement.

Don't assume, however, that you can use a construct that is well-defined on your target CPU but undefined in C++ and get the answer you expect. C++ compilers assume undefined behavior cannot happen when performing optimization, and so they can and will emit different code from what you were expecting if your code isn't well-defined C++.

Kostroma answered 17/10, 2022 at 21:13 Comment(6)
Signed integer overflow still yields undefined behavior in C++20, despite the mandate to use two's complement.Eyot
I wonder if there are any target architectures available on godbolt which do use one's complement, so we can compare the results.Finback
@kaya3: Pretty sure no. Certainly none of the ones using GCC, as it only supports 2's complement targets. gcc.gnu.org/onlinedocs/gcc/Integers-implementation.htmlElonore
"I'm pretty sure every CPU made in this century has used two's complement signed integers" Why is it that every time someone says "I'm pretty sure that..." I feel this urge to go down the rabbit hole of research and prove them wrong? Anyway, there seems to be a counterexample, mentioned here and here.Commander
@Commander Those links contain some very interesting information. Although I guess you can nitpick the definition of "made" since it appears the latest Dorado-based mainframes are based on hardware emulation on unnamed Intel chips. The marketing material uses the interesting phrase "emulated IOPs" to describe the performance.Foofaraw
@Heinzi: How about "every CPU for which a "good-faith-conforming" C99 compiler has ever been produced"? One would have to go to absurd lengths to design an architecture whose word size is smaller than 65 bits and which can efficiently support the unsigned long long type mandated by C99, without it also being able to efficiently process two's-complement signed arithmetic.Ainsworth

Signed integer overflow/underflow is undefined behavior precisely so that compilers may make optimizations such as this. Because the compiler is allowed to do anything in the case of overflow/underflow, it can do this, or whatever else is more optimal for the use cases it is required to care about.

If the behavior on signed overflow had been specified as “What the DEC PDP-8 did back in 1973,” compilers for other targets would need to insert instructions to check for overflow and, if it occurs, produce that result instead of whatever the CPU does natively.
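
For comparison, here is roughly what an explicit overflow check looks like in source form today, using the GCC/Clang __builtin_add_overflow builtin (an added sketch, not something this answer proposes; it assumes x >= 0 as in the question, so the final -1 can't underflow, and -1 is a made-up error value):

int twice_minus_one_checked(int x) {
    int doubled;
    if (__builtin_add_overflow(x, x, &doubled))  // true if 2*x overflowed int
        return -1;                               // hypothetical error value
    return doubled - 1;
}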

Ajani answered 18/10, 2022 at 22:15 Comment(18)
This optimization would be legal with unsigned integers, or with gcc -fwrapv, where signed wrap-around in the abstract machine is well-defined. (In GCC's case, as 2's complement wrapping. gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html). But on any machine that did any kind of wrapping (not saturating or trapping), 2*(x-1)+1 and 2*x-1 should always produce the same result. (Thus the mathematically correct result if that fits in an int).Elonore
It might not be the same result as a PDP-8 or PDP-11 for some inputs, but those two expressions should always be equivalent to each other, so if the rule was that signed wrapping is implementation-defined instead of UB, the optimization would still be legal. The standard allows 2's complement, 1's complement, and sign/magnitude, so mandating the exact semantics of PDP-8 or PDP-11 wouldn't make sense as an alternative to saying it's fully UB.Elonore
@PeterCordes My understanding was that there were some CPUs out there that weren’t two’s-complement and might even trap on overflow, hence making the behavior UB so compilers could still use the native instructions.Ajani
Yes, making signed-overflow be UB allows easy compilation for machines where the native instructions trap instead of wrap. But on such machines, optimizations like this would be forbidden, because they could introduce a trap where the C++ abstract machine didn't have one. So you'd need to sub/add/sub instead of add/sub. Which is basically the opposite of what you said, that it being UB allows this optimization (there or on normal modern machines?)Elonore
@PeterCordes Because signed overflow is UB, the compiler is entitled to trap, but it’s also entitled to correct the calculation so it doesn’t trap. The rewritten version that avoids UB works on either architecture, and ISAs that trap are the canonical example of an implementation that makes that necessary. In any case, making it UB enabled maximum optimization on all architectures.Ajani
The question is whether it's legal for a compiler to optimize 2*(x-1)+1 into asm that computes it as 2*x-1. On a machine with trapping signed overflow, such as MIPS, a compiler using add would introduce a trap for x=0x40000000 where the C++ abstract machine would avoid one. (Real compilers for MIPS use addu so they can do such optimizations, and because of historical sloppy codebases that sometimes have int overflows.) There's no reason a compiler would ever turn 2*x-1 into asm that computes it like 2*(x-1)+1; we have to do that manually to avoid UB.Elonore
@PeterCordes You seem to have gotten the idea, I don’t know how, that I was claiming that it’s always legal to make that optimization on all architectures?Ajani
1. Why does this UB have any relevance to the optimization being legal when targeting x86? Any implementation-defined wrapping at the target machine's INT_MAX/INT_MIN would also allow it. Your example of having to reproduce the exact wrapping behaviour of PDP-8 on a machine with a different width and representation of int feels like a straw-man. Of course a very specific required behaviour for int overflow would be constraining, but that wouldn't make sense given that C already allows 1's/2's complement or sign/mag.Elonore
2. If I'm missing something about some other types of machines where signed overflow UB might allow this optimization but a convenient choice of impl-defined wrapping wouldn't, a specific example would probably help, of a value that wouldn't work with the machine's wrapping. Like some sign/magnitude thing if wrapping just wraps the magnitude? Still not sure that would be a problem, but I haven't worked it through.Elonore
I didn't really think you were claiming it's always legal to make that optimization on all architectures, but I'm kind of grasping at straws trying to understand what you are arguing. Since it doesn't seem to help machines that have only trapping signed-addition instructions, and isn't necessary on machines that have well-defined wrapping/unwrapping and thus can implement the C semantics more simply. Assuming that the semantics would always produce the same value for every possible input, including those which wrap (by a lot, even), which I think is the case for most/all sane impl-def wrapElonore
If the authors of the Standard intended to invite even implementations targeting quiet-wraparound two's-complement hardware to treat integer overflow nonsensically, why did they say on page 44 of the published C99 Rationale document, when discussing whether short unsigned types should promote to int or unsigned int, that they expected that "most current implementations" would process a construct like uint1 = ushort1*ushort2; in a manner equivalent to using unsigned math, even in case of overflow?Ainsworth
@Ainsworth I think the point you are making is that, on a 16-bit implementation, unsigned short int promotes to a signed int, so that expression technically could have a signed overflow? As I said, the Standard makes this undefined behavior precisely so that an implementation is free to do the right thing, whatever that is on the architecture. There have been numerous cases of revisions saying that oddball implementation choices are no longer allowed. (NULL != (void*)0, sizeof(int) > sizeof(long), negative remainders, etc.)Ajani
@Davislor: When targeting 32-bit platforms, the gcc compiler will sometimes, by design, generate code for uint16-by-uint16 multiplies which will behave in gratuitously meaningless fashion causing arbitrary memory corruption in some cases where the product would exceed INT_MAX.Ainsworth
@Ainsworth If you think about it, if signed overflow on multiplication were not UB, implementations would have to do ushort1 * ushort2 by the book, widening them to signed int operands and then applying the signed int overflow rules. As you say, though, the authors of the Standard wanted to allow implementations to perform unsigned multiplication instead (which is allowed, because if signed overflow occurs, the implementation is entitled to return anything, including the result of interpreting the quantities as unsigned integers and multiplying them).Ajani
@Ainsworth Oh, okay. So, I’m not quite sure whether you disagree with something i said or not?Ajani
@Ainsworth GCC is quite notorious for performing unsafe optimizations that break programs, and insisting that this technically does not violate the standard because there’s some undefined behavior in the program, somewhere.Ajani
@Davislor: You thought I was talking about 16-bit implementations. I was not. I was talking about 32-bit ones. Personally, I think the phrase "Gratuitously Clever C" should be used to describe dialects such as the ones gcc and clang want to process, but I don't know what a good retronym would be to describe the language the Committee was chartered to describe.Ainsworth
@Davislor: I think a fundamental problem is that the authors of gcc refuse to accept that when the Standard says there is no difference in emphasis between the three ways of characterizing something as undefined behavior, it is meant to say that the Standard imposes no judgment on whether implementations intended for various platforms and purposes should be expected to define the behavior, and in no way implies that implementations that would been expected to define the behavior prior to the Standard shouldn't be expected to continue doing so.Ainsworth
