Is it legal for a C++ optimizer to reorder calls to clock()?
V

7

76

The C++ Programming Language, 4th edition, page 225 reads: "A compiler may reorder code to improve performance as long as the result is identical to that of the simple order of execution." Some compilers, e.g. Visual C++ in release mode, will reorder this code:

#include <time.h>
#include <iostream>
...
auto t0 = clock();
auto r  = veryLongComputation();
auto t1 = clock();

std::cout << r << "  time: " << t1-t0 << std::endl;

into this form:

auto t0 = clock();
auto t1 = clock();
auto r  = veryLongComputation();

std::cout << r << "  time: " << t1-t0 << std::endl;

which guarantees a different result from the original code (zero rather than a greater-than-zero time reported). See my other question for a detailed example. Is this behavior compliant with the C++ standard?

Virtuoso answered 4/10, 2014 at 6:37 Comment(22)
No it is not. The compiler should expect functions to have some side effects. Some compilers have language extensions for "pure constant" functions. You could ask your compiler to show the assembler code (e.g. g++ -O2 -S -fverbose-asm your-code.cc with GCC...)Burstone
"which guarantees different result than original code (zero vs. greater than zero time reported)" -- It doesn't guarantee that, as far as the standard is concerned, since the standard makes no mention of how long any particular operation should take (aside from calls to sleep functions). The call to veryLongComputation() could very well be instantaneous.Tael
@BenjaminLindley The call to veryLongComputation() could very well be instantaneous - I disagree. There is a great number of algorithms which, given sufficiently large data, are guaranteed not to complete before the end of the solar system on any computing hardware known at the time the C++ standard was written.Virtuoso
You could prevent this reordering (if we end up deciding that it is legal) by making r volatile.Ahmed
@MattMcNabb volatile was the solution suggested in the question he referenced.Lemmuela
@PaulJurczak: I fail to see how what you said disagrees with what I said. However, perhaps this will be an interesting read for you: stackoverflow.com/questions/3592557/…Tael
@BenjaminLindley It is an interesting read, but it illustrates a special rule for an empty, infinite loop. My case involves a function call with an I/O side effect. I agree that the standard doesn't define the time cost of computations, but it defines the concept of a system clock and related utilities. Additionally, it uses big-O algorithmic complexity in many places (see page 894). Executing my code example sequentially will produce an arbitrarily large number, depending on my choice of algorithm, which can always be much bigger than the one produced by executing the reordered version of the code.Virtuoso
@PaulJurczak: It's not a special rule for an infinite empty loop. It's a rule for any loop that satisfies certain conditions (which are listed in the question). Does your function call involve an I/O side effect? You didn't mention that in the question.Tael
@BenjaminLindley Call to veryLongComputation returns a value, which is sent to cout - I thought it qualified as an I/O.Virtuoso
@Paul what qualifies as I/O is the sending-to-cout part, not the computing-r part. What you are seeing is perfectly compliant. Whatever non-negative difference you see between the clock calls, possibly small enough to be rounded to 0, is fine. Imagine a program where the difference between the 2 calls to clock is such that you get 0 or 1 with roughly 50% probability, depending on when you start and what else is running. Should optimizations preserve this exact probability?Luker
@MarcGlisse Should optimizations preserve this exact probability - I agree that proper treatment of time as a side effect is a complex issue, but it can't be swept under the rug as if it doesn't exist. This behavior is user-unfriendly, to say the least. The intent expressed in the code is obvious, and the compiler contravenes it. It is not difficult for the optimizer to figure out that the value of t1-t0 depends on code order and leave it alone. That's what gcc and clang do. VC++ does it for some functions and reorders for others without an obvious pattern.Virtuoso
@Paul "that's what gcc and clang do". Is it? I believe that if they don't move code across clock, it isn't because they think it would be wrong, it is simply that their optimization heuristics didn't find any particular reason to do it in this particular testcase, but they would happily do it on small variations. Benchmarking is not the main goal of the language, so it isn't surprising that it requires some extra work.Luker
How could you tell the difference between the two versions from the standpoint of the abstract machine? There is no guarantee whatsoever how long a computation will take. (Practical example: the compiler could fully evaluate your computation.). This reordering would be invalid if your computation performed a side-effect with a guaranteed minimum duration such as a sleep. An IO is not enough because, again, you cannot tell the difference between the reordering and a really fast IO. The as-if rule applies.Fabyola
This is horrible! Is there any way to prevent the compiler from re-ordering certain calls or skipping seemingly redundant ones (such as the 2nd call to clock())? There is a volatile keyword for variables to prevent the compiler from "optimizing" variable access; is there an analogue of volatile for functions?Salespeople
@Salespeople volatile does not prevent reorderings. It prevents reorderings among volatile variables, but other computations can be rearranged. volatile is meant for side-effecting memory addresses.Fabyola
@Fabyola compiler could fully evaluate your computation - this is the best explanation/excuse for this behavior presented so far, thanks.Virtuoso
@MarcGlisse Benchmarking is not the main goal of the language I think that misses the point. It may be benchmarking in this case, it might be auto-tuning in another. Assuming benchmarking and then dismissing it as unimportant violates the principle of least surpriseLevi
@Basic: It's not very surprising at all, really. Neither function depends on any side effects of the other. They're candidates for reordering. What should disqualify them? How should the compiler know to keep the same order here, as opposed to everywhere else? Or would you propose outlawing a whole very useful class of optimizations so that benchmarks work better?Brashear
@Brashear outlawing a whole very useful class of optimizations so that benchmarks work better that's far from what he's suggesting. Only a small number of cases would be affected by not reordering code across a clock() call. This would require a minor constraint on the optimizer, easy to implement, as there are already constraints of this sort imposed for other reasons. This would remove the violation of the principle of least surprise. Better yet, make the C++ abstract machine aware of the non-zero time cost of non-elided computations.Virtuoso
@PaulJurczak: The constraints imposed are mostly, if not entirely, to keep code from breaking the rules about observable behavior. There's no such violation here, and the only reason it is even an issue is because you're timing something. How is the compiler supposed to know that? Special knowledge about clock() can only do so much -- there are probably a half dozen functions just in Win32 to get the time in various formats and precisions...and the built-in inability to use one to write one's own equivalent of clock() would be pretty freaking surprising too.Brashear
About the only worthwhile idea would be some keyword, pragma, or the like to disable certain optimizations in a section of code. Other solutions are too prone to weirdness, IMO.Brashear
And in fact, there kinda sorta almost is such a thing. #pragma optimize("", off) allegedly disables all optimizations in the function(s) following it. You can reenable them with #pragma optimize("", on).Brashear
J
17

The compiler cannot exchange the two clock calls. t1 must be set after t0. Both calls are observable side effects. The compiler may reorder anything between those observable effects, and even over an observable side effect, as long as the observations are consistent with possible observations of an abstract machine.

Since the C++ abstract machine is not formally restricted to finite speeds, it could execute veryLongComputation() in zero time. Execution time itself is not defined as an observable effect. Real implementations may match that.

Mind you, a lot of this answer depends on the C++ standard not imposing restrictions on compilers.

July answered 4/10, 2014 at 17:43 Comment(3)
Assignments to non-volatile variables are not observable side effects.I
Should have been more explicit: both calls are observable side effects.July
Here is a "how to prevent reordering question": stackoverflow.com/questions/37786547/…Roy
G
19

Well, there is something called Subclause 5.1.2.3 of the C Standard [ISO/IEC 9899:2011] which states:

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

Therefore I really suspect that this behaviour - the one you described - is compliant with the standard.

Furthermore - the reorganization indeed has an impact on the computation result, but if you look at it from the compiler's perspective - it lives in the int main() world, and when doing time measurements it peeps out, asks the kernel to give it the current time, and goes back into the main world, where the actual time of the outside world doesn't really matter. The clock() call itself won't affect the program and its variables, and program behaviour won't affect that clock() function.

The clock values are used to calculate the difference between them - that is what you asked for. Whatever goes on between the two measurements is not relevant from the compiler's perspective, since what you asked for was the clock difference, and the code between the measurements won't affect the measuring as a process.

This however doesn't change the fact that the described behaviour is very unpleasant.

Even though inaccurate measurements are unpleasant, it could get much worse and even dangerous.

Consider the following code taken from this site:

void GetData(char *MFAddr) {
    char pwd[64];
    if (GetPasswordFromUser(pwd, sizeof(pwd))) {
        if (ConnectToMainframe(MFAddr, pwd)) {
              // Interaction with mainframe
        }
    }
    memset(pwd, 0, sizeof(pwd));
}

When compiled normally, everything is OK, but if optimizations are applied, the memset call will be optimized out, which may result in a serious security flaw. Why does it get optimized out? It is very simple: the compiler again thinks in its main() world and considers the memset a dead store, since the variable pwd is not used afterwards and won't affect the program itself.

Grouping answered 4/10, 2014 at 7:51 Comment(10)
On the other hand it actually does affect the function. If it is called on a different place it returns a different value. And if that's not a side effect caused by the long computation I don't know what is. "code between the measuring won't affect the measuring as a process" that is true. But it very much affects the values. It's like saying "The value of variables does not change summation as a process".Accused
I think that summation is not the same. The variables you are summing are pretty much exact and deterministic. The clock time, on the other hand, is not. Actually, the result of measuring a time difference is pretty much random in some sense...Grouping
The clock time is random in the sense that it can vary from execution to execution. However it's also definitely not so random that we could just swap it with rand() calls. And it's quite easy to prove that "no needed side effects are produced" doesn't apply because there is a clear side-effect: The time difference is larger.Accused
C11 standard 5.1.2.3:2 states the things which are to be counted as a side effect. Accessing a volatile object is one of them. A clock function eventually either uses volatile (otherwise the compiler can just optimize it into a single read, thus time never changes) or it's handwritten asm which effectively does the same thing. Or it's a syscall (we can say it's equivalent to a function call) and eventually the kernel does the exact same thing, basically reads a volatile object. Unless the compiler can prove there is no access to a volatile ever in that chain it cannot change the order.Accused
@Accused "Unless the compiler can prove [...]" but that's exactly what the compiler is doing here. If you hide the body of veryLongComputation from it, I strongly doubt it reorders. But here it looks at veryLongComputation in details, proves that there are no side effects in it, and then reorders. And no, a different running time (waving arms to say it is very different) does not count as a side effect.Luker
memset will of course not be optimized out in the example if pwd is volatile. IOW, if you want the password clearing to be observable, you must make it so.July
@MSalters: If pwd is volatile, passing it to memset has undefined behavior. You cannot access a volatile object through a non-volatile-qualified lvalue, which is what memset would do.Mignon
Point. std::fill would work. A volatile pointer is still an iterator. So much for C++ being less predictable than C.July
"when compiled normally" is not how I'd describe an un-optimized build. Optimization is normal; you only make a braindead anti-optimized build with gcc -O0 or something when you need fully consistent debugging. gcc -O0 never keeps anything in registers between statements so you can modify variables at run-time with a debugger, or even jump to a new source line in GDB and have the program work as if you'd jumped in the C abstract machine. These extra stores/reloads have no purpose except consistent debugging, so it's far more than just "not optimizing".Funnel
But AFAIK there's not much middle ground where it keeps things in registers but doesn't throw away "dead stores". Maybe with gcc -Og ("optimize for debugging"). Not that it matters, because you want to compile with full optimization for production use.Funnel
C
8

Yes, it is legal - if the compiler can see the entirety of the code that occurs between the clock() calls.

Charmainecharmane answered 4/10, 2014 at 7:13 Comment(9)
Do you have a standard quote or reference that corroborates this?Glennglenna
@remyabel It's allowed by the quote in the original question. The key is recognizing that the call to clock is not what's being reordered - it's the computation that is. By the as-if rule, any transparent computation can be computed at any time.Charmainecharmane
More to the point, if the compiler can guarantee that veryLongComputation() causes no observable behaviour.Ahmed
@Charmainecharmane It's allowed by the quote in the original question - but the result is different, which is a common sense contradiction to that quote. Are time related quantities exempt from this rule?Virtuoso
@MattMcNabb if the compiler can guarantee that veryLongComputation() causes no observable behaviour - so computation time is not an observable quantity?Virtuoso
@PaulJurczak that's correct; observable behaviour is writes of volatile variables, and calls to library functionsAhmed
@MattMcNabb: I don't even see where calls to library functions count. The minimum requirements listed in C++11 are accesses to volatile variables, file I/O, and that in interactive cases, that output is flushed before input can begin.Brashear
@Brashear any call to a function that is not fully seen must be assumed to be that though.Charmainecharmane
@o11c: "Must" is a pretty strong word. :) All the compiler would theoretically need in order to optimize a library call away, is the code of the library being used (and, of course, proof that the call wouldn't affect observable behavior). The biggest reason it doesn't happen in most cases, is that it's pretty complicated to do.Brashear
Z
4

If veryLongComputation() internally performs any opaque function call, then no, because the compiler cannot guarantee that its side effects would be interchangeable with those of clock().

Otherwise, yes, it is interchangeable.
This is the price you pay for using a language in which time isn't a first-class entity.

Note that memory allocation (such as new) can fall in this category, as allocation function can be defined in a different translation unit and not compiled until the current translation unit is already compiled. So, if you merely allocate memory, the compiler is forced to treat the allocation and deallocation as worst-case barriers for everything -- clock(), memory barriers, and everything else -- unless it already has the code for the memory allocator and can prove that this is not necessary. In practice I don't think any compiler actually looks at the allocator code to try to prove this, so these types of function calls serve as barriers in practice.

Zela answered 4/10, 2014 at 20:53 Comment(2)
Why is functions not being first-class entities the problem?Picky
@harold: I said nothing about functions being first-class entities.Zela
L
2

At least by my reading, no, this is not allowed. The requirement from the standard is (§1.9/14):

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

The degree to which the compiler is free to reorder beyond that is defined by the "as-if" rule (§1.9/1):

This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

That leaves the question of whether the behavior in question (the output written by cout) is officially observable behavior. The short answer is that yes, it is (§1.9/8):

The least requirements on a conforming implementation are:
[...]
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.

At least as I read it, that means the calls to clock could be rearranged compared to the execution of your long computation if and only if it still produced identical output to executing the calls in order.

If, however, you wanted to take extra steps to ensure correct behavior, you could take advantage of one other provision (also §1.9/8):

— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

To take advantage of this, you'd modify your code slightly to become something like:

auto volatile t0 = clock();
auto volatile r  = veryLongComputation();
auto volatile t1 = clock();

Now, instead of having to base the conclusion on three separate sections of the standard, and still having only a fairly certain answer, we can look at exactly one sentence and have an absolutely certain answer: with this code, re-ordering uses of clock vs. the long computation is clearly prohibited.

Lindell answered 4/10, 2014 at 15:7 Comment(24)
By this logic, even optimising 1 + 1 to 2 is invalid, because code might be timing the calculation. I don't think the logic is correct.Hymanhymen
You seem to be reading incorrectly. "Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated." That says precisely nothing about constant folding.Lindell
Your answer claims that if code is timing the calculation, and printing the time, the precise printed time must not change because of optimisations, because the precise printed time is observable.Hymanhymen
Not so. My answer claims that it can make some changes, such as changing the order or evaluation of subexpressions within an expression, but not others such as the order of evaluation of full expressions.Lindell
This just doesn't make sense. The abstract semantics allow any amount of time to elapse for any statement (short of a sleep of course). The abstract machine sets the range of allowed outcomes, not some "unoptimized" build.July
@MSalters: Who said anything about an "'unoptimized' build?"Lindell
Unoptimized in the sense that no calls have been reordered, which is an action performed by the optimizer.July
@MSalters: So your position is that: "Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated." ...is basically non-binding advice, not really a requirement at all.Lindell
@JerryCoffin That's merely what makes i = 1; i = 2; have the net effect of setting i to 2, instead of allowing the two stores to overlap and setting i to some garbage value.Hymanhymen
@hvd: Quite the contrary--it's much more than that. It means that when you call clock it has to give you the value from the clock after preceding full expressions, and before subsequent ones.Lindell
@JerryCoffin: You still have to show that there's a link between order of observable effects and sequencing. While that may appear to be intuitively right, nobody has come up with the relevant standardese proving that.July
@MSalters: No, I don't. You seem to be mistaking "side effects" in §1.9/14 with "observable side effects" in §1.9/1 and §1.9/8. As you seem to be interpreting things, clock_t clock() { return (clock_t)rand(); } would apparently be a legitimate definition of clock(), because there's no circumstance under which you could depend on its output being predictable at all.Lindell
As explained in the comments of the earlier question, your final fix (adding volatile on each line) is not sufficient. auto volatile r=fun() is equivalent to auto tmp=fun(); auto volatile r=tmp; and the compiler may move the first part before clock.Luker
@MarcGlisse: Citation needed.Lindell
@JerryCoffin: My reading of C++11 §1.9/1 (and the intent as well, considering the footnote) is that the compiler can ignore every rule in the standard except for §1.9/8 (ie: as long as observable behavior is maintained). Since veryLongComputation has no side effects that are visible to clock (and vice versa), sequencing need not be respected -- the compiler can call functions wherever it pleases -- as long as any I/O, and the reads/writes to t0, r, and t1, happen exactly as the abstract machine could have done them.Brashear
@cHao: So that puts us back at clock being equivalent to rand()--all output from it is basically random numbers.Lindell
@JerryCoffin: Inasmuch as it relates to timing the execution of code with no visible side effects? Basically, yeah. Can't be trusted.Brashear
IMO, that leads to only one reasonable conclusion: you're overreading the permissiveness of the "as-if" rule to allow a great deal that it really doesn't, and was never intended to either. Don't get me wrong: games like this can be fun, and 20 years ago (or so) I'd probably have been playing along. I won't try to say whether my growing reluctance is wisdom or boredom, but they don't strike me as nearly as useful or fun any more.Lindell
I'm reading it as written -- and as expressly stated in the footnotes, which read: "This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program."Brashear
@cHao: I don't think you are. It seems clear to me that in this case the observable behavior is affected.Lindell
It's not, though. From C++'s perspective, time is not observable behavior. If veryLongComputation actually did something -- say, if it wrote to a volatile global variable itself -- you could more or less ensure that stuff ran in the sequence it shows in the code. But since it doesn't actually affect anything, C++ is free to run it whenever it's convenient and write the result later.Brashear
As for the language-lawyer games, i'll admit i'm having fun looking for loopholes. :) But questions like this one are precisely why you pretty much have to read the standard with a lawyer mindset.Brashear
@cHao: "And yet it moves."Lindell
@JerryCoffin At any rate, this isn't a typical language lawyer issue: this isn't about wording where some hypothetical implementation might possibly misuse the improper wording, this is where a real-world compiler optimises based on the exact wording of the standard, and where pretty much every optimising compiler there is has similar optimisations even if they don't show for this particular test program. Even if you conclude (whether rightly or wrongly) that the standard doesn't intend to allow these optimisations, code that expects to be portable to current implementations must deal with it.Hymanhymen
B
1

Let's suppose the sequence is in a loop, and veryLongComputation() randomly throws an exception. Then how many t0s and t1s will be calculated? Does the compiler pre-calculate the random variables and reorder based on that precalculation - sometimes reordering and sometimes not?

Is the compiler smart enough to know that a plain memory read may be a read from shared memory? The read could be a measure of how far the control rods have moved in a nuclear reactor, with the clock calls used to control the speed at which they are moved.

Or maybe the timing is controlling the grinding of a Hubble telescope mirror. LOL

Moving clock calls around seems too dangerous to leave to the decisions of compiler writers. So if it is legal, perhaps the standard is flawed.

IMO.

Bonnibelle answered 8/10, 2014 at 3:1 Comment(1)
If the timing is important, you should be using a more trustworthy mechanism than timing do-nothing code. And/or disable all optimizations -- most of what the optimizer does involves reordering instructions.Brashear
S
-1

It is certainly not allowed, since it changes, as you have noted, the observable behavior (different output) of the program. (I won't go into the hypothetical case that veryLongComputation() might not consume any measurable time; given the function's name, that is presumably not the case. But even if it were, it wouldn't really matter.) You wouldn't expect it to be allowable to reorder fopen and fwrite, would you?

Both t0 and t1 are used in outputting t1-t0. Therefore, the initializer expressions for both t0 and t1 must be executed, and doing so must follow all the standard's rules. The result of the function is used, so the function call cannot be optimized out. It doesn't directly depend on t1, nor vice versa, so one might naively be inclined to think it legal to move the call around - say, to after the initialization of t1, which doesn't depend on the calculation.
Indirectly, however, the result of t1 does of course depend on side effects of veryLongComputation() (notably, the computation taking time, if nothing else), which is exactly one of the reasons there exists such a thing as a "sequence point".

There are three "end of expression" sequence points (plus three "end of function" and "end of initializer" SPs), and at every sequence point it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
There is no way you can keep this promise if you move the three statements around, since the possible side effects of all the functions called are not known. The compiler is only allowed to optimize if it can guarantee that it will keep that promise. It can't, since the library functions are opaque: their code isn't available (nor is the code within veryLongComputation necessarily known in that translation unit).

Compilers do however sometimes have "special knowledge" about library functions, such as some functions will not return or may return twice (think exit or setjmp).
However, since every non-empty, non-trivial function (and veryLongComputation is quite non-trivial from its name) will consume time, a compiler having "special knowledge" about the otherwise opaque clock library function would in fact have to be explicitly disallowed from reordering calls around this one, knowing that doing so not only may, but will affect the results.

Now the interesting question is: why does the compiler do this anyway? I can think of two possibilities. Maybe your code triggers a "looks like a benchmark" heuristic and the compiler is trying to cheat, who knows. It wouldn't be the first time (think SPEC2000/179.art, or SunSpider, for two historic examples). The other possibility would be that somewhere inside veryLongComputation(), you inadvertently invoke undefined behavior. In that case, the compiler's behavior would even be legal.

Spermous answered 4/10, 2014 at 16:3 Comment(4)
I think you're mistakenly assuming that MSVC does not see the body of veryLongComputation. MSVC has Link Time Code Generation, which essentially means it can see that body.July
@MSalters: That may be so, but it is irrelevant insofar as it does not see the code of clock. In order for the optimization to be legal, it must be able to prove that there are no side effects occurring in any of the functions, and it can't do that (since it needs to see each and every function for that). Also, it is necessary that the observable behavior is identical. Which, too, is demonstrably not the case.Spermous
The call to clock is observable (all library calls are). No need to prove anything further. And no, the behavior does not need to be identical to the non-optimized version. The standard merely requires that the outcome is identical to one of the possible outcomes of the abstract machine, and running within one unit of time_t is definitely possible and thus a valid outcome.July
Nitpick: Library I/O calls are side effects. (§1.9/12) Library calls that don't do I/O aren't. And even the I/O functions don't necessarily have to be called, unless the call affects observable behavior.Brashear
