Why compilers no longer optimize this UB with strict aliasing

One of the first Google results for strict aliasing is this article: http://dbp-consulting.com/tutorials/StrictAliasing.html
One interesting thing I noticed there is this example: http://goo.gl/lPtIa5

#include <stdint.h>

uint32_t swaphalves(uint32_t a) {
  uint32_t acopy = a;
  uint16_t* ptr = (uint16_t*)&acopy;
  uint16_t tmp = ptr[0];
  ptr[0] = ptr[1];
  ptr[1] = tmp;
  return acopy;
}

is compiled to

swaphalves(unsigned int):
        mov     eax, edi
        ret

by GCC 4.4.7. No compiler newer than that (4.4 is the version mentioned in the article, so the article is not wrong) implements the function the way it could by exploiting strict aliasing. What is the reason for this? Was it in fact a bug in GCC, did GCC decide to drop the optimization because so much code was written in a way that produces UB, or is it just a compiler regression that has lasted for years? Clang does not optimize it either.

Traitor answered 30/12, 2015 at 10:34 Comment(19)
Can you explain in more detail what is significant in the assembly output? – Textual
There's a lot less distraction if you delete main from the example. – Textual
clang implements the function in one instruction (rol edi, 16) ... gcc doesn't seem to know about that. – Textual
Why worry about optimization of UB code? – Birdman
I've taken the liberty of including the C function and its output in your question to hopefully make it easier to understand. Please double-check to make sure I didn't misunderstand your question. – Discovery
@Textual - gcc 4.4.7 implements it as an identity function, which is obviously not the intent of the writer of the code. gcc apparently does that because strict aliasing allows it. I wonder why they stopped doing this optimization in newer versions. – Traitor
Interestingly, UB aside, these are all local variables; you would think that the compiler can easily reason about them... – Maje
@Birdman - a) as I said, is it UB or is the article wrong? b) if it is UB, why did compilers stop using strict aliasing for optimizations? It can be a big perf difference in some cases. – Traitor
The article explains the strict-aliasing rule and the specific behaviour of one implementation. The implementation may have changed to better handle other (valid) cases. They may also have identified that the code breaks strict aliasing but emit correct code anyway... I would say it would be a regression if the code were correct, but that is not the case here. – Birdman
@KarolyHorvath you would - I'd speculate, similarly to the OP, that they deliberately don't optimize this case because there is a lot of "real" code in the wild that commits the same error but the developers don't realize it or don't want to fix the code. There are already enough people who get snotty about code like if ( !p->x || !p ) being removed. – Textual
@M.M: I know that the Linux kernel developers are really pissed about it and they have their own flags for handling it (and honestly, I don't blame them). As for the example you've provided... it's just silly (I know, the problem sometimes doesn't look this obvious, but still...). – Maje
How is the code compiled? – Photocopy
You have the flags in the link, but basically -O2/-O3. – Traitor
What do you mean by "wrt"? – Backhand
The only way to know is to bisect the commits until you find which one changed the behavior. Then you can read the corresponding email discussion on gcc-patches and see why it happened. – Concision
@Textual but what is that if intended to do? If !p, it'll crash anyway if not optimized. Or is the null page accessible in the kernel, so that this code relies on that? – Dovetail
@Dovetail it's intended to be if ( !p || !p->x ) and the coder didn't realize there was a difference. – Textual
Unless the implementers choose to document how they treat specific cases of undefined behavior, it is hazardous to try to reason about it. UB by definition allows all behavior and allows that behavior to change between versions, and that is completely valid UB. You can even find that varying the code could lead to UB being exploited in one case but not another. A lot of good reads in my answer here. Note that in the UB canaries link in my answer, gcc has shifted its exploitation wrt strict aliasing over time. – Slothful
@Jarod42: The fact that the authors of C89 did not mandate behavioral guarantees which might be difficult to honor on some platforms was never intended to suggest that platforms that could support such guarantees shouldn't do so. It is impossible to write any well-formed program which would be guaranteed not to invoke UB on an obtuse-but-compliant implementation, since it would be possible to have a pair of obtuse-but-compliant implementations such that any well-formed program that didn't invoke UB on one would be guaranteed to invoke UB on the other. – Pulmonary

The GCC developers put some effort into making the compiler behave "as expected" in these cases. (I wish I could give you a proper reference for this - I remember it coming up on a mailing list or some such at some point.)

At any rate, something you say:

... does not implement the function as it could using strict aliasing

... implies perhaps a slight misunderstanding of what the strict aliasing rules are for. Your code sample invokes undefined behavior - so any compilation is technically valid, including just a plain ret, generation of a trap instruction, or even nothing at all (it's legitimate to assume the function can never be called). That newer versions of GCC produce longer/slower code is hardly a deficiency, since producing code that does any particular thing at all would not violate the standard. In fact, the newer versions improve the situation by producing code that does what the programmer probably intended, instead of silently doing something different.

What would you rather - that the compiler produces fast code that doesn't do what you want, or slightly slower code that does do what you want?

That being said, I firmly believe that you should not write code that breaks the strict aliasing rules. Relying on the compiler doing the "right" thing when it is "obvious" what is intended is walking a tightrope. Optimisation is hard enough already, without the compiler having to guess at - and make allowances for - what the programmer intended. Further, it's possible to write code which obeys the rules and which can be turned into very efficient object code by the compiler (a sketch of one such rewrite follows at the end of this answer). Indeed, a further question can be raised:

Why did earlier versions of GCC behave the way they did, and "optimize" the function by relying on adherence to the strict aliasing rules?

That's a little bit complicated, but is interesting for this discussion (especially in light of suggestions that the compiler is going to some lengths just to break code). Strict aliasing is a component of (or rather, a rule which assists) a process called alias analysis. This process decides whether two pointers alias or not. There are, essentially, 3 possible conditions between any two pointers:

  • They MUST NOT ALIAS (the strict aliasing rule makes it easy to deduce this condition, though it can sometimes be deduced in other ways).
  • They MUST ALIAS (this requires analysis; value propagation might detect this condition for instance)
  • They MAY ALIAS. This is the default condition when neither of the other two conditions can be established.
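
For a rough illustration of the default MAY ALIAS case, consider a minimal sketch of my own (not from the question): two pointers of the same type can always refer to the same object, so the compiler must assume the worst.

int may_alias(int *a, int *b) {
  *a = 1;
  *b = 2;      /* b might point at the same int as a... */
  return *a;   /* ...so *a must be reloaded; the compiler cannot simply return 1 */
}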

In the case of the code in your question, strict aliasing implies a MUST NOT ALIAS condition between &acopy and ptr (it is trivial to make this determination, because the two values have incompatible types which are not allowed to alias). This condition allows for the optimisation that you then see: all the manipulation of *ptr values can be discarded, because they cannot in theory affect the value of acopy and they do not otherwise escape the function (which can be determined via escape analysis).

It takes further effort to determine the MUST ALIAS condition between the two pointers. Furthermore, in doing so the compiler would need to ignore (at least temporarily) the previously ascertained MUST NOT ALIAS condition, which means it must spend time attempting to ascertain the truth of a condition which, if everything is as it should be, must be false.

When both MUST NOT ALIAS and MUST ALIAS conditions are determined, we have a case where the code must be invoking undefined behaviour (and we can issue a warning). We then have to decide which condition to keep and which to discard. Because MUST NOT ALIAS, in this case, comes from a constraint which can be (and indeed has been) broken by the user, it is the best option to discard.

So, the older versions of GCC either do not do the requisite analysis to determine the MUST ALIAS condition (perhaps because the opposite MUST NOT ALIAS condition has already been established), or alternatively, the older GCC version opts to discard the MUST ALIAS condition in preference to the MUST NOT ALIAS condition, which leads to faster code which does not do what the programmer most likely intended. In either case, it seems that the newer versions offer an improvement.
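
As promised above, here is a sketch of one way to write the half-swap without breaking the aliasing rules (the function name and the remark about generated code are mine, not part of the original answer): copying through memcpy is always well-defined, and recent GCC and Clang typically reduce this to a single rotate instruction, although that is of course not guaranteed.

#include <stdint.h>
#include <string.h>

uint32_t swaphalves_welldefined(uint32_t a) {
  uint16_t halves[2];
  memcpy(halves, &a, sizeof a);   /* inspect the object representation legally */
  uint16_t tmp = halves[0];
  halves[0] = halves[1];
  halves[1] = tmp;
  memcpy(&a, halves, sizeof a);   /* write the swapped halves back */
  return a;
}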

Lapse answered 5/1, 2016 at 17:57 Comment(2)
"What would you rather - that the compiler produces fast code that doesn't do what you want, or slightly slower code that does do what you want?" If what I wrote is UB, I'd rather it be the former, and I'd want the code it produces to be as wrong as possible, so that I can find and fix the UB early in testing.Hospitalization
@JosephSible-ReinstateMonica I would rather than "as wrong as possible" compilers produced code which immediately errored out. But I'm afraid that would prevent a lot of software I currently used from working as intended. If that lead to such software being fixed, of course, that would be a good thing... I'm afraid there's just too much legacy code which is badly behaved, though.Lapse

In this other related question, there is a comment by @DanMoulding. Let me plagiarize it:

The intent of the standard's strict aliasing rules are to allow the compiler to optimize in situations where it doesn't and cannot know whether an object is being aliased. The rules permit the optimizer to not make worst-case aliasing assumptions in those situations. However, when it is clear from the context that an object is being aliased, then the compiler should treat the object as being aliased, no matter what types are being used to access it. Doing otherwise is not in line with the intent of the language's aliasing rules.

In your code, the aliasing of *ptr and acopy is obvious, as both are local variables, so any sane compiler should treat them as aliased. From this point of view the GCC 4.4 behaviour, although in line with a strict reading of the standard, will be considered a bug by most real-world programmers.

You have to consider why there are aliasing rules in the first place. They exist so that the compiler may take advantage of optimizations in situations where there might be aliasing, but most likely there is none. So the language forbids that aliasing and the compiler is free to optimize. For example:

void foo(int *idx, float *data)
{ /* idx and data do not overlap */ }
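
To make the benefit concrete, here is a small sketch of my own (the function name is made up, not part of the answer) of the kind of code the rule is meant to help:

void zero_data(int *idx, float *data)
{
    /* Strict aliasing: the float stores below cannot legally modify an int
       object, so the compiler may load *idx once and keep it in a register
       for the whole loop.  With -fno-strict-aliasing it would generally
       have to reload *idx on every iteration in case data aliases idx. */
    for (int i = 0; i < *idx; i++)
        data[i] = 0.0f;
}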

However, when the aliasing involves local variables, there are no lost optimizations:

void foo()
{
    uint32_t x;
    uint16_t *p = (uint16_t *)&x; //x and p do overlap!
}

The compiler is trying to do its job as well as possible, not trying to find some UB somewhere as an excuse to format your hard drive!

There is a lot of code that is technically UB but whose UB is ignored by all compilers. For example, what would you think of a compiler that treats this as an empty file, just because the identifier _FOO_H_ (a leading underscore followed by an uppercase letter) is reserved for the implementation:

#ifndef _FOO_H_
#define _FOO_H_
void foo(void);
#endif

Or what about a compiler that ignores this macro:

#define new DEBUG_NEW

simply because the standard allows it to do so?

Antalya answered 5/1, 2016 at 17:34 Comment(14)
Well, there was a time when "the compiler people would say 'nyaah, nyaah, the standards people said we can do this'", in Linus' own famous words. Perhaps they have been bugged long enough, or there is a generation change, which often is the only thing facilitating progress? – Inculpable
That's a rare example of C compiler people actually deoptimizing code in the name of "nobody actually profits from this, but lots of people get caught by unexpected behavior". Actually I'm perfectly surprised; what happened to the gcc people that they're starting to make rational decisions based on real-life code instead of blindly following the standard? – Collapse
@PeterA.Schneider: Well, certainly times have changed. Probably the threat of Clang has something to do with it. – Antalya
And: I think your last #define is perfectly legal (as opposed to #define private public; not the define as such, but the resulting code if the original contains private). – Inculpable
@PeterA.Schneider: As far as I remember (I may be mistaken), #defining a keyword is UB, so it would be UB in C++ but legal (though not so useful) in C. – Antalya
True -- "A translation unit shall not #define or #undef names lexically identical to keywords,..." (17.6.4.3.1 in the 2011 std) – Inculpable
@PeterA.Schneider: Yes, I have just found it in the C++ std. Curiously enough, I found no such statement in the C std, only that you must not #define a keyword before #including a standard header. – Antalya
The dangers of DWIM. I really hope the compiler also warns about it... – Raycher
Regarding your header example: obviously, there's a _FOO_H_ define in use by the implementation or some other header then. Tough luck, and have fun figuring that one out. – Raycher
@Deduplicator: That's why some versions of VC++ automatically write the include guards with a GUID: #define AFX_STDAFX_H__A9DB83DB_A9FD_11D0_BFD1_444553540000__INCLUDED_. Nice try, no starting underscore, but there is a double underscore, I can still format your hard drive! – Antalya
Where's the intent of the standard documented? – Dovetail
@Ruslan: In the Apocrypha ;-). – Inculpable
@Ruslan: The authors of the Standard published rationale documents. The authors of the Standard stated that they did not wish to break existing code, and further claimed that they regarded the most serious silent-breaking change as the mandated promotion of short unsigned types to signed int. If one presumes they were being honest about such things, they must have intended that quality compilers for target platforms and fields where programmers routinely relied upon behaviors not mandated by the Standard should continue to support those behaviors whether or not the Standard mandated them. – Pulmonary
@Ruslan: There is a huge difference between saying that compilers should not be required to generate code that allows for the possibility of aliasing in cases where, even in the absence of rules against it, there would be no evidence whatsoever to suggest that it might occur, versus saying that compilers should be willfully blind to patterns where aliasing would be obvious. – Pulmonary

The goal for a compiler should generally be matching the intent of the code as closely as possible. In this case, the code invokes UB, but the intent should be pretty clear. My guess is that more recent compilers have focused on matching that intent rather than on taking advantage of UB for optimization purposes.

Strict aliasing is essentially an assumption that code isn't trying to subvert the type system, which as noted by @rodrigo, gives the compiler more information it can use to optimize. If the compiler can't assume strict aliasing, it precludes a number of non-trivial optimizations, which is why C even added a restrict qualifier (C99).
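
As a rough illustrative sketch (the function and comments are mine, not from the answer): restrict addresses the case that strict aliasing cannot help with, namely two pointers of the same type, by letting the programmer promise that they do not overlap.

void saxpy(float *restrict y, const float *restrict x, float a, int n) {
  /* Both pointers have the same type, so strict aliasing cannot rule out
     overlap.  The restrict qualifiers promise there is none, which lets the
     compiler vectorize without emitting runtime overlap checks. */
  for (int i = 0; i < n; i++)
    y[i] += a * x[i];
}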

Breaking strict aliasing isn't necessary for any optimizations I can think of. In fact, in this specific case, depending on what the original intent was, you can get correct/optimized code without invoking UB...

#include <stdint.h>

uint32_t wswap(uint32_t ws) {
  return (ws << 16) | (ws >> 16);
}

compiles to...

wswap:                                  # @wswap
    .cfi_startproc
# BB#0:
    roll    $16, %edi
    movl    %edi, %eax
    retq
Heliograph answered 6/1, 2016 at 18:44 Comment(3)
A lot better than before. +1. On a side note: rematerialization is one of the optimizations that take advantage of strict aliasing. If you can prove a segment of memory was not changed, you don't have to reload the value from memory but can use a cached value (from a register) or compute it again. Strict aliasing was meant to help reduce the cost of memory analysis. – Wilhoit
I don't like the phraseology "do what I mean". Better would be to say that rules which exist for the sole purpose of allowing optimization, and do not define any useful meaning, should be interpreted in such a fashion as to avoid breaking code which would be meaningful and useful in their absence. Pessimistically regarding all casts from T* to U* as potential accesses of type T, except in cases where they could be proven not to be even without using aliasing rules, would not block many useful optimizations, but would instead reduce the fraction of useful programs... – Pulmonary
...which would otherwise require disabling all type-based aliasing "optimizations". – Pulmonary
