Can different optimization levels lead to functionally different code?

I am curious about the liberties that a compiler has when optimizing. Let's limit this question to GCC and C/C++ (any version, any flavour of standard):

Is it possible to write code which behaves differently depending on which optimization level it was compiled with?

The example I have in mind is printing different bits of text in various constructors in C++ and getting a difference depending on whether copies are elided (though I've not been able to make such a thing work).

Counting clock cycles is not permitted. If you have an example for a non-GCC compiler, I'd be curious, too, but I can't check it. Bonus points for an example in C. :-)

Edit: The example code should be standard compliant and not contain undefined behaviour from the outset.

Edit 2: Got some great answers already! Let me up the stakes a bit: The code must constitute a well-formed program and be standards-compliant, and it must compile to correct, deterministic programs in every optimization level. (That excludes things like race-conditions in ill-formed multithreaded code.) Also I appreciate that floating point rounding may be affected, but let's discount that.

I just hit 800 reputation, so I think I shall blow 50 reputation as bounty on the first complete example to conform to (the spirit) of those conditions; 25 if it involves abusing strict aliasing. (Subject to someone showing me how to send bounty to someone else.)

Ultimately answered 15/6, 2011 at 21:5 Comment(14)
Realize that, in the case of copy elision, the compiler is specifically allowed to change the observable behavior of the program.Eleonoraeleonore
@Rob: That's fine, I'd be very happy to see just a working example of that. I understand that all constructors are expected to yield semantically identical objects, so by putting print routines in them I'm purposefully introducing a discrimination that the compiler doesn't have to be concerned by. That would be an acceptable example, though!Ultimately
"I've not been able to make such a thing work" - maybe in GCC copy ctor elision is enabled even with no optimization?Bothersome
ideone.com/uvcoV - notice how Func1 doesn't invoke the copy constructor, but Func2 does.Eleonoraeleonore
@Rob: Yes, very nice, but I can't prevent the optimization for Func1 with compiler settings, can I? (Good to know that this sort of code optimizes nicely, though!)Ultimately
@Kerrek: aha! man to the rescue, yes you can, with -fno-elide-constructors. Assuming that you permit such a fine-grained option as a "different optimization level", I think that fits your requirements.Bothersome
@Steve: Ooh, will try that with Rob's code ... yes, that actually works! Doesn't seem to be part of any numeric optimization level; probably there's no case where this would be desirable.Ultimately
Unfortunately what is allowed by the standard and what is reality often differ. E.g. on x86-based architectures it is not a good idea to use std::set<double> and similar constructs, since, especially when optimization is enabled, those can behave erroneously (i.e. crash).Alcantara
PlasmaHH: How can a set<double> crash? I mean, I know that you cannot compare doubles by binary identity when you make computations, but when you just use them as a write-once-read-many key, what's the problem?Ultimately
@Kerrek: at a guess, you could have problem if the set tries to compare a value that has somehow survived in an 80-bit FPU register all the way to its use, with "the same value" that has been stored in memory along the way and hence got truncated. This could lead to an impossibility that the std::set implementation didn't anticipate. I'm not sure this is standard-compliant, mind, but PlasmaHH is talking about reality differing from the standard, i.e. a bug, so anything goes really :-)Bothersome
@Steve: that's definitely an interesting scenario! I wonder if that's been thought about...Ultimately
Also note that < is not necessarily a strict weak order on double, because 0 < NaN and NaN < 1 are both false, but 1 < 0 is also false. Or to put it another way, !(X < Y) && !(Y < X) is not an equivalence relation because it's not transitive for NaNs. So set<double> can crash if NaNs get involved even if it's bug-free.Bothersome
@Steve: Yeah, good point - you definitely cannot have NaN as a key. But as I said, we'd definitely never want to do arithmetic on the key values. At the very best, we consider them purely for storage. (So don't go storing NaNs!)Ultimately
@SteveJessop "Do not use NaN as a key" is a reasonable constraint. "Do not use numbers that cannot be represented exactly in the fp type" is not!Hak

The portion of the C++ standard that applies is §1.9 "Program execution". It reads, in part:

conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below. ...

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible execution sequences of the corresponding instance of the abstract machine with the same program and the same input. ...

The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions. ...

So, yes, code may behave differently at different optimization levels, but (assuming that all levels produce a conforming compiler) it cannot behave observably differently.

EDIT: Allow me to correct my conclusion: Yes, code may behave differently at different optimization levels as long as each behavior is observably identical to one of the behaviors of the standard's abstract machine.

Wife answered 15/6, 2011 at 21:22 Comment(15)
How does this square with the invocation of different constructors? If I write T x /* default */; x = T(2, 3);, would it be illegal to optimize this to T x(2, 3);?Ultimately
Note "one of the possible execution sequences". It doesn't have to be the same one at each optimization level. So for example the compiler is allowed to evaluate function arguments in one order at -O0 and a different order at -O4. A program whose output differs according to which order actually occurs is valid, although it isn't what the C standard calls "a strictly conforming program". Then again printf("%d\n", CHAR_BIT) isn't a strictly conforming program either since the output is implementation-defined.Bothersome
@Kerrek: that optimization is illegal for example if the assignment operator T::operator=(const T &) has observable side-effects. As you know there's a special rule to allow copy ctor elision even when the copy ctor has side-effects. There's no such rule for copy assignment. Same applies to the default construction, its observable behavior can't be elided either. But as long as nothing affects observable behavior, the compiler can leave out anything it wants (and if x is unused, construct no object at all).Bothersome
@Steve: I didn't actually know about the exception for the copyctor. Does that mean that the indeterminacy in how often something gets copied (or default-constructed) is the only thing that may affect visible behaviour according to the standard?Ultimately
@Kerrek: you did know, even if you didn't know that it was an explicit special case, since you mentioned copy ctor elision in the question :-). It's not the only thing: anything that is unspecified or implementation-defined can change. The language used in 12.8/15 is "the implementation is allowed to do this", but in effect what it's saying is the same as if it said "it is unspecified whether the implementation does this"Bothersome
Hmm... there've been many good answers, but I have to go with one of them. Let me accept this one, I hope that won't diminish the value of all the other ones! (And if you tell me how to forward reputation I will send you some for further examples!)Ultimately
@Rob: I never gave you that reputation. Have it now, with interest. (In 24h, that is.)Ultimately
@SteveJessop: My first thought on seeing the question was printf("Hey\n")+printf("There\n");, though I'm curious about how the standard would regard the conformance of printf("%d",fgetc(f1)+fgetc(f2)); If f1 and f2 happen to be pipe streams, one could construct scenarios where the sequence in which they are read would be observable, but in most practical circumstances a program which read f1 first would be equivalent to one which read f2 first.Woodbury
@supercat: In C terminology it is conforming but not strictly conforming. A strictly conforming program is one whose observable behavior is necessarily the same in all conforming implementations. Operations with unspecified order and side effects is just one of the many, many things you can do in C that conforms but doesn't strictly conform. printf("%d\n", CHAR_BIT) is another. CHAR_BIT would never vary with optimization level, of course. The examples with + might, but it's not obvious why they would given the symmetry involved.Bothersome
@Steve: If different functions had been called (instead of printf twice or fgetc twice), it's possible that a one sequence would require the return value of the first function to be saved while calling the other, but flipping them could avoid that step. Otherwise, how would "strictly conforming" be defined with things like volatile variables? Could a program that performs n=v1+v2; [v1 and v2 volatile] be strictly conforming, depending upon how v1 and v2 are used elsewhere? Also, does "conforming" imply that a program must work "correctly" in all implementations? What defines "correct"?Woodbury
@supercat: (1) order of access to volatile variables is observable, so sure v1+v2 is not strictly conforming if v1 and v2 are volatile. (2) nothing defines correct, a conforming program is one that's syntactically correct and satisfies semantic constraints (hence has no undefined behavior on any conforming implementation). But since its observable behavior may depend on things that are implementation-defined or unspecified, it doesn't have a single definitive "correct" behavior.Bothersome
@SteveJessop: When you say satisfies semantic constraints, do you mean "does whatever it's supposed to do"? For example, if the stated purpose of a program is to output every distinct line in the source file once each, in arbitrary order, would a program which might on some implementations output a line twice not be considered conforming even if it never hit any UB, while a program whose output sequence on different implementations could legally vary but which would always contain each distinct line exactly once would be conforming?Woodbury
@supercat: no, I mean the same thing that the standard means by "semantic constraints". For example "it is UB to dereference a null pointer" is a semantic constraint.Bothersome
@SteveJessop: Hmm... it would seem like it would be useful to have a term which specifies that "A program which is defined as performing some function will be considered ____ if, when fed to any standards-conforming compiler whose implementation limits are not too small, it will perform the indicated function." In some contexts a guarantee that a program is guaranteed not to invoke UB may be highly desirable, but a program whose only behavioral guarantee is that it won't invoke UB isn't very useful. Replacing any program with int main(void) { return 0;} would satisfy that.Woodbury
I don't think it's any of the standard's business what claims you make about what your code does, so it doesn't define anything like that. I think the term I'd use for that concept, to replace ______, is "correctly documented" :-) Looking from the other direction (the documentation is correct by definition because it's the functional requirements, the question is whether or not the code is right), you'd say "ready to ship". Or "portable".Bothersome

Floating point calculations are a ripe source for differences. Depending on how the individual operations are ordered, you can get more/less rounding errors.

Less than safe multi-threaded code can also have different results depending on how memory accesses are optimized, but that's essentially a bug in your code anyhow.

And as you mentioned, side effects in copy constructors can vanish when optimization levels change.

Blunder answered 15/6, 2011 at 21:12 Comment(3)
I'd like to exclude multithreading stuff like that because that's essentially just undefined behaviour -- you're only allowed a multithreading example if it synchronizes properly. I hadn't thought about floating point inaccuracies, let's discount those, too (though that's a fair point).Ultimately
Compilers aren't allowed to reorder floating point operations unless you use "unsafe" optimization options like -ffast-math. (They are allowed to "contract" a*b + c into fma(a,b,c) though, and do so in practice when targeting a CPU that has FMA instructions.)Chartography
Other than that, floating-point rounding differences are usually only a thing on implementations like 32-bit x86, using the x87 FP, where temporaries in register have more precision than double. i.e. FLT_EVAL_METHOD == 2, except even across statements (not just within one expression) unless you use gcc -ffloat-store or -O0. See also randomascii.wordpress.com/2012/03/21/…. Mostly that's a thing of the past, though, with SSE2 for scalar math on x86, like most other ISAs, having deterministic FP with FLT_EVAL_METHOD == 0.Chartography

Is it possible to write code which behaves differently depending on which optimization level it was compiled with?

Only if you trigger a compiler's bug.

EDIT

This example behaves differently on gcc 4.5.2:

void foo(int i) {
  foo(i+1);
}

int main() {
  foo(0);
}

Compiled with -O0 creates a program crashing with a segmentation fault.
Compiled with -O2 creates a program entering an endless loop.

Steffaniesteffen answered 15/6, 2011 at 21:6 Comment(9)
Especially in multi-threaded code, very subtle code bugs (that is, code where the result is not well-defined) will show different results with different optimization levels. It may not look like a bug not to mark a variable as volatile, and it may only cause problems at a specific optimization level-- but this is not a compiler bug.Calvities
OK, simple example: If I write T foo[] = { T(1), T(2), T(3) };, most compilers will construct the objects in place, but the standard perfectly allows temporary + copy. So if I put a print routine in the copy constructor, I already get different behaviour depending on how this is implemented, without any bugs at all, non? Now I just want something like that dependent on the optimization level.Ultimately
Your example is not a compiler bug though, it works fine with optimization because gcc does tail call optimization, it crashes without optimization because you run out of stack space.Kaslik
Haha, that's a cool example! Not quite what I had in mind (I'd really like two functioning programs), but that's great!Ultimately
@nos: At no point does the OP state that he is looking for compiler bugs.Blunder
The above is definitely not a compiler bug. It sounds like the code is working perfectly in both cases.Ferrite
For more info about this: #5494188Randellrandene
Not only if you trigger a bug. An example: int x = printf("hello") + printf("world"); It could print helloworld or worldhello, and which it does might depend on whether optimization is turned on. I.e. optimization could change things where the behavior is unspecified.Paranymph
I liked the example, as it actually shows a particular optimization, tail recursion, in action. In the first case the optimization is not applied and the program overflows the stack by recursively calling itself, while in the second the optimizer has converted the recursion into a loop... again, great example.Wallen

OK, my flagrant play for the bounty, by providing a concrete example. I'll put together the bits from other people's answers and my comments.

For the purpose of different behaviour at different optimizations levels, "optimization level A" shall denote gcc -O0 (I'm using version 4.3.4, but it doesn't matter much, I think any even vaguely recent version will show the difference I'm after), and "optimization level B" shall denote gcc -O0 -fno-elide-constructors.

Code is simple:

#include <iostream>

struct Foo {
    ~Foo() { std::cout << "~Foo\n"; }
};

int main() {
    Foo f = Foo();
}

Output at optimization level A:

~Foo

Output at optimization level B:

~Foo
~Foo

The code is totally legal, but the output is implementation-dependent because of copy constructor elision, and in particular it's sensitive to gcc's optimization flag that disables copy ctor elision.

Note that generally speaking, "optimization" refers to compiler transformations that can alter behavior that is undefined, unspecified or implementation-defined, but not behavior that is defined by the standard. So any example that satisfies your criteria necessarily is a program whose output is either unspecified or implementation-defined. In this case it's unspecified by the standard whether copy ctors are elided, I just happen to be lucky that GCC reliably elides them pretty much whenever allowed, but has an option to disable that.

Bothersome answered 17/8, 2011 at 18:42 Comment(8)
Nooo, wait, I really want to give the bounty to Rob, I just had to wait 24hrs with it, and I feel bad for not having done it back when I first asked the question... is that OK? I can give you bounty on some other question if you like! Copy-constructor elision is definitely one of the most interesting aspects of optimization, and that GCC option is a cool find. I would happily split the bounty if that's an option...Ultimately
Hmm, you've certainly made most of the constructive comments to this question, so you definitely deserve the moral bounty. Quite a dilemma, I wish there were a way to pay tribute rep points that are not limited to one user per question.Ultimately
@Kerrek: no worries, I'm sure you've upvoted plenty of my answers elsewhere and given me rep that way, we both hang around the C++ questions enough. Sorry, I didn't realise that you created the bounty because you had in mind to award it to Rob's answer. I just spotted the question under the "featured" tab and recognized it, so I knew I could give a bounty-hunting answer without effort. I totally don't "deserve" a bounty just for bothering to type out the code and run it to confirm the result I expected, on most questions I do that for normal upvotes :-)Bothersome
Cool, and I'm sure we'll come up with another opportunity before long :-) (I think I still have one unanswered question out, for that matter.)Ultimately
You do have an unanswered question, but I don't know the answer so that's no use. This is why I don't win bounties, I do enough hard work on my job without going beyond the low-hanging fruit and into real research and testing to answer SO questions ;-)Bothersome
Hehe, who doesn't? :-) Yeah, that question is a bit of a dud, I don't really expect a working solution from that. ("Detours" is a fine start, but I'd need to know further, disparate Windows internals to achieve my goal.) For that matter, I don't think any of the bountied questions on SO are pleasant, generally useful language questions, they're always about some ugly, overly specific three-library-plus-widget problem...Ultimately
Ah, I can award multiple bounties :-)Ultimately
I would argue that -fno-elide-constructors is not an optimization option, it's a debug option, or at a push a pessimization option! That option is only necessary for people who don't realize the standard allows copy elision (and so their program's "correctness" relies on side effects which may not happen) or who are trying to understand copy elision (and so want to experiment to see where it does/doesn't happen).Fro

For C, almost all operations are strictly defined in the abstract machine and optimizations are only allowed if the observable result is exactly that of the abstract machine. Exceptions to that rule that come to mind:

  • undefined behavior doesn't have to be consistent between different compiler runs or executions of the faulty code
  • floating point operations may cause different rounding
  • arguments to function calls can be evaluated in any order
  • expressions with volatile qualified type may or may not be evaluated just for their side effects
  • identical const-qualified compound literals may or may not be folded into one static memory location
Leiker answered 15/6, 2011 at 21:27 Comment(0)

Anything that is Undefined Behavior according to the standard can change its behavior depending on optimization level (or moon-phase, for that matter).

Pule answered 15/6, 2011 at 21:17 Comment(4)
Yes, true. Do let us exclude anything undefined. I would like a well-defined, standard-conforming program which compiles correctly in all optimization levels and gives correct, deterministic programs but which behave differently.Ultimately
@Kerrek SB: How about unspecified behavior? Given a function call like f(a(x), b(y)), we know that a(x) is evaluated either before or after b(y), but not concurrently.Cull
@David: Ah, evaluation order, I see. Hm... is that affectable by optimisation? I'd say let's discount it because it's already non-deterministic within the standard, but if you have a working example, let's see it.Ultimately
@Kerrek: the standard doesn't say anything about optimization levels or what they mean. The only room for code to behave differently at different optimization levels, is for the implementation to make different decisions about things that the standard doesn't wholly pin down. So any example, including the copy elision one, is bound to be an example of something that's either implementation-defined, unspecified, or undefined behavior.Bothersome

Since copy constructor calls can be optimized away, even if they have side effects, having copy constructors with side-effects will cause unoptimized and optimized code to behave differently.

Cherie answered 15/6, 2011 at 21:59 Comment(1)
Yeah, that was my idea to begin with, it's just that I never managed to actually get different behaviour by varying the optimization level. Steve Jessop discovered -fno-elide-constructors though, which does the trick.Ultimately

The -fstrict-aliasing option can easily cause changes in behavior if you have two pointers to the same block of memory. This is supposed to be invalid but is actually quite common.

Pause answered 15/6, 2011 at 21:16 Comment(4)
Yeah, I figured that... did that ever happen to you in reality, though?Ultimately
@Kerrek: it shouldn't be too hard to come up with an example, I think I've managed it before. Start with float f = 0; int &a = *(int*)(&f); int b = a; f = 1; int c = a; std::cout << a << " " << b << " " << c << "\n";, then play around with it until you can get the compiler to order the operations differently at different optimization levels, and/or entirely omit one or both assignments to f at some levels but not others. The optimizer is entitled to assume that f = 1; doesn't alter the referand of a, so you just need to catch it doing so.Bothersome
@Kerrek, sorry I don't remember a specific instance. And if I did run into it, it probably wasn't with gcc. There's gotta be a reason it sticks with me though.Pause
@Steve: Interesting. No luck with that yet, but I'll try and tweak the values.Ultimately

This C program invokes undefined behavior, but does display different results in different optimization levels:

#include <stdio.h>
/*
$ for i in 0 1 2 3 4 
    do echo -n "$i: " && gcc -O$i x.c && ./a.out 
  done
0: 5
1: 5
2: 5
3: -1
4: -1
*/

void f(int a) {
  int b;
  printf("%d\n", (int)(&a-&b));
}
int main() {
 f(0);
 return 0;
}
Wife answered 17/6, 2011 at 16:21 Comment(5)
If you invoke undefined behaviour, I would say the natural thing is to expect different outcomes. Note that the difference in the output does not have to be even due to the optimization level. By invoking undefined behaviour, each run of the program is allowed to produce a different output.Hurley
Granted, and worth mentioning. But so far no one has produced an example that varies according to optimization level and doesn't require UB.Eleonoraeleonore
So the idea here is to print addresses of variables? Interesting, that's certainly something that can vary... hadn't thought of that. Cheers.Ultimately
@Kerrek - No, the idea is to print the difference of addresses of varables. Specifically, it measures whether the function had a frame pointer, or perhaps was inlined.Eleonoraeleonore
+1 for this one. Although formally it's undefined behavior and not allowed under the stricter version of the question, in practice most implementations do pretty much define the results of pointer arithmetic on objects that aren't both subobjects of some common object. So that's almost, but not quite, implementation-defined, and then the actual relative position of a and b is unspecified. Anyway, it's a fairly ingenious answer that on pretty much any implementation isn't UB for that implementation, even though it's UB for the standard.Bothersome

gcc defines the __OPTIMIZE__ macro when a non-zero optimization level is used. You can use it like below:

#ifdef __OPTIMIZE__
printf("Code compiled with -O1 or higher\n");
#else
printf("Code compiled with -O0\n");
#endif
Wang answered 24/6, 2022 at 9:10 Comment(0)

Compile the same source code before and after enabling -finline-small-functions and compare the generated code: the call graphs differ.

-finline-small-functions is enabled as part of -O2/-O3.

Whenever answered 10/11, 2018 at 5:14 Comment(3)
Only focus on the main function and on the _Z1Av, _Z1Bv and _Z1Cv functions.Whenever
Firstly, the code should be included as text. (Because we can't copy code from a picture to compile it, and also because some of us rely on screen readers.)Wigwag
Can you add some explanations to you answer? I assume the second picture shows two call graphs? Is the observable behavior of the code with and without -finline-small-functions different?Wigwag

Two different C programs:

foo6.c

void p2(void);

int main() {
    p2();
    return 0;
}

bar6.c

#include <stdio.h>

char main;

void p2() {
    printf("0x%x\n", main);
}

When both modules are compiled into one executable at optimization levels one and zero, they print out two different values: 0x48 for -O1 and 0x55 for -O0.

Screenshot of terminal

Here is an example of it working in my environment

Khanna answered 20/3, 2019 at 18:17 Comment(3)
Is it undefined behavior? I don't think undefined behavior really count. If it isn't, then nice complementary answer.Renault
@GuillaumeRacicot I think it is defined behavior, check out the screenshot I just put in the post.Khanna
I don't think this is exactly how it works. Undefined behavior is specified in the standard. Your implementation could choose to act like you expect. Or it could act differently depending on the optimization level. Or it could crash. I don't know if your example is defined or undefined behavior. Simply running it is not enough. Only the rule can say at this point.Renault

a.c:

char *f1(void) { return "hello"; }

b.c:

#include <stdio.h>

char *f1(void);

int main()
{
    if (f1() == "hello") printf("yes\n");
        else printf("no\n");
}

Output depends on whether merge string constants optimization is enabled or disabled:

$ gcc a.c b.c -o a -fno-merge-constants; ./a
no
$ gcc a.c b.c -o a -fmerge-constants; ./a
yes

Sparkle answered 21/3, 2019 at 20:28 Comment(0)

Got an interesting example in my OS course today. We analyzed a software mutex that could be broken by optimization, because the compiler does not know about the parallel execution.

The compiler can reorder statements that do not operate on dependent data. As I already stated, in parallelized code this dependency is hidden from the compiler, so it could break the mutex. The example I gave would lead to some hard times in debugging, as thread safety is broken and your code behaves unpredictably because of OS scheduling issues and concurrent-access errors.

Xanthic answered 15/6, 2011 at 21:9 Comment(3)
Was it a bug or was it expected behaviour in light of the optimization steps?Ultimately
It is definitely a bug, as the mutex does not work any longer. But it is not the compiler's fault, as it is not designed to recognize parallel access. It's the programmer's responsibility to mark this for the compiler with the keyword volatile. Seen as such, I would not call it a bug anymore.Xanthic
volatile isn't enough. C++0x and C1x introduce atomic types because of that.Pule
