Why doesn't GCC and Clang do this aliasing-optimization?

Asked 19/6, 2013 at 14:14 Answered 3/5, 2017 at 16:21

c++gcc optimization compiler-optimization strict-aliasing

I have a case where a friend casts a non-base class object of type "Base" to a class type object "Derived", where "Derived" is a derived class of "Base" and only adds functions, but no data. In the below code, I did add a data member x to the derived class

struct A {
  int a;
};

struct B : A {
  // int x;
  int x;
};

A a;

int g(B *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

With strict alias analysis on, GCC (also Clang) always returns 10, and not 11, because b can never point to a in a well-defined code. However, if I remove B::x (as is actually the case in the code of my friend), GCC's output assembler code does not optimize the return access of a.a and reloads the value from memory. So the code of my friend that calls g "works" on GCC (as he intended) even though I think it still has undefined behavior

g((B*)&a);

So in essentially the same two cases, GCC optimizes one case and doesn't optimize the other case. Is it because b can then legally point to a? Or is it because GCC just wants to not break real-world code?

I tested the answer that states

If you remove B::x, then B meets the requirements in 9p7 for a standard-layout class and the access becomes perfectly well-defined because the two types are layout-compatible, 9.2p17.

With two layout compatible enums

enum A : int { X, Y };
enum B : int { Z };

A a;

int g(B *b) {
   a = Y;
   *b = Z;
   return a;
}

The assembler output for g returns 1, not 0, even though A and B are layout compatible (7.2p8).

So my further question is (quoting an answer): "two classes with exactly the same layout may be considered "almost the same" and they are left out of the optimization.". Can someone provide a proof of this for GCC or Clang?

Mindszenty answered 19/6, 2013 at 14:14 Comment(29)

Where is g() called? With g(&a)? I don't think using a global and pointer to the same global variable within the same function is anything but "undefined behaviour". And as we know, undefined behaviour can lead to all manner of things, including "what you expected". – Rumery 19/6, 2013 at 14:21

@MatsPetersson g is called like g((B*)&a). In my test snippet, it is not called (I only needed the assembler output for g) – Mindszenty 19/6, 2013 at 14:23

Do you have "chapter and verse" on it being undefined behaviour? – Abhor 19/6, 2013 at 14:23

Can you elaborate on the return access elision, eg by posting relevant assembly outputs? – Annulet 19/6, 2013 at 14:28

@doctorlove: It's accessing an object via an expression of incompatible type. a may be accessed via expression of type A, but not type B. – Gessner 19/6, 2013 at 14:28

Can you clarify that (whether?) your question is spcifically asking why g++ behaves the way it does in the face of this undefined behavior? In other words, is this a compiler question or a language question? – Uboat 19/6, 2013 at 14:28

@MarkB it's both, in a way. if you find out that it is not undefined behavior, I'm all eager to hear it. – Mindszenty 19/6, 2013 at 14:32

Your question should be directed at the GCC mailing list. Why does some implementation do this in an undefined behavior case? Because the implementation's details add up to this through some internal representation. Also, if this is a GCC-ism it might disappear by using -std=c++11 instead of -std=gnu++11. – Downwards 19/6, 2013 at 14:39

@Abhor my understanding is that none of the bullets of 3.10p10 match – Mindszenty 19/6, 2013 at 14:39

I dislike the mentality of some when questions about undefined behavior arise. Undefined behavior does not mean that compiler developers shut off their brain and play dice. And it neither means that they don't care. – Mindszenty 19/6, 2013 at 14:41

@Johannes But why they should care? – Hectic 19/6, 2013 at 14:48

I agree, 3.10p10 was what I found - unfortunately, my graphics driver decided to stop working. And I agree, undefined just means that it's up to the compiler vendor to do something as meaningful as possible - but some situations are terribly hard to do something with, and for the compiler to "realize" that a.a and b->a are the same here [or not realize, but play it safe], they would have to store and load the data unnecessarily, just in case your b object that isn't supposed to be a pointer to an a object is actually the same a.a. People prefer fast code for when it is written correctly. – Rumery 19/6, 2013 at 14:48

@MatsPetersson i am sorry, I think i had a confusing wording in my question. I reworded it, and hope it becomes clearer. Of course the function itself has no undefined behavior. – Mindszenty 19/6, 2013 at 15:1

Maybe because the two cases end up in two different paths in "analyze for possible aliasing", and the compiler thinks "well, these objects have the same structure, so could be that they are actually used interchangably", where "in this case, clearly B can't he used for an A object, so I'll we don't need to 'fix' this". Now, I didn't write a single line in gcc/g++ (I did send a small patch to someone working on the x86 variant, but I think it was reworked), so I'm just surmising the logic involved. – Rumery 19/6, 2013 at 15:6

@JohannesSchaub-litb Undefined behavior does mean that some compiler writers (at least with g++) will try to make the code fail at runtime. (I don't think that this is an issue here, but I've seen it in other cases.) – Broch 19/6, 2013 at 15:11

@JohannesSchaub-litb With regards to why the compiler does two different things: Ben Voigt pointed out one possible reason in his response. I'm not sure I agree with the logic, but the developers of g++ might. In which case, if B is layout compatible with A, they consider that b in g might alias global a, and if it isn't they don't. (And of course, since their is clearly undefined behavior in C++03, they don't need to do this conditionally.) – Broch 19/6, 2013 at 15:15

@Mats: In the latter case, where B has standard-layout, it's allowed to alias things, and therefore the compiler has to actually generate accesses. – Blastula 19/6, 2013 at 15:25

@BenVoigt: So, indeed, it takes two different paths down the "alias analyzer" based on "being same layout" vs. "not the same layout". Good to know. – Rumery 19/6, 2013 at 15:28

Asking why an optimization has not been applied is a complicated question. The basic answer is that the implementation of that particular compiler does not perform that optimization in that particular case, it could be that the heuristics did not detect this, or that it was never considered or... The more interesting question is just the opposite: is it guaranteed that it will always reload from memory? And I don't believe that to be the case, but I think you agree with me here, right? – Salim 30/6, 2013 at 20:38

@JohannesSchaub-litb: Have you reproduced this while preventing the inlining of g? If not, try doing that. (One way to do it is to store g's address in a volatile function pointer variable, then call it from there.) – Billiards 1/8, 2013 at 4:21

@Mehrdad in my testcase, I did not call g. I have just compiled it separately without a main function. – Mindszenty 4/2, 2017 at 19:17

@JohannesSchaub-litb With which compiler versions is this problem happening? '13 was four years ago, after all. – Ics 6/2, 2017 at 0:26

I tested it on GCC, clang, ICC, all produces the same assembly, as describes by JohannesSchaub-litb. I also try different versions of these compilers. They all 3 agree. If B does not declare any non static data member or any virtual function, then the compiler supposes that *b can be an alias for a. Maybe an early optimization replaces derived class that does not add any data (including vptr) by the base? – Mold 6/2, 2017 at 22:55

@Mold it might be case of two struct\classes being different by last members only as detected by compiler. E..g. it is legal to use a pair such in union and access shared members of each, regardless which struct was written last. We may see strict-aliasing failing because compiler had met an edge case – Epicycloid 7/2, 2017 at 6:34

@Swift, I made new tests, If A is polymorphic, and B derives from A, whatever you declare inside B, the compiler never do strict aliasing optimization, whatever the compiler and whatever its version. So this is definitively not related with deepness of layout compatibility. I have read your answer. I am still not 100% satisfied, I was also expecting an answer to: Why all compilers seem that pessimistic when performing strict aliasing optimization? – Mold 7/2, 2017 at 9:58

Great question because it's unique and interesting. To get a good answer, I think you should post it on the gcc mailing list by sending an email to [email protected]. Then if you get a satisfying answer from them, you can post it here. – Bennington 9/2, 2017 at 13:27

@Oliv, was the field in question introduced in A or in its base class? – Epicycloid 10/2, 2017 at 5:45

@Swift, You will find the code by following the link below (I hope this link will work when you click on it!). There are to function g and gr. gr pointer argument is "restricted" which forces strict aliasing optimization, you can thus compare the assembly generated with or without strict aliasing: [godbolt.org/g/JQvuOP] – Mold 10/2, 2017 at 9:54

@Mold I don't have enough time currently, but I just checked your code using clang++ 4. The difference in the translation occurs in the InstCombine-Transformation (you can compare ./opt test.bc -S against ./opt -instcombine test.bc -S). This step removes the bitcast and the load and directly starts returning 10. You will find this file in the clang source lib/Transforms/InstCombine/InstructionCombining.cpp. In the few seconds I looked at the file I could not find the place where aliasing matters. – Fax 10/2, 2017 at 19:43

If you remove B::x, then B meets the requirements in 9p7 for a standard-layout class and the access becomes perfectly well-defined because the two types are layout-compatible, 9.2p17 and the members both have the same type.

A standard-layout class is a class that:

has no non-static data members of type non-standard-layout class (or array of such types) or reference,

has no virtual functions (10.3) and no virtual base classes (10.1),

has the same access control (Clause 11) for all non-static data members,

has no non-standard-layout base classes,

either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and

has no base classes of the same type as the first non-static data member.

Two standard-layout struct types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types.

Blastula answered 19/6, 2013 at 14:57 Comment(19)

Oh wait, 9.2p19 isn't the right clause because it talks about unions. But the rule IS there. Still looking. – Blastula 19/6, 2013 at 15:0

Ok, I think 9.2p17 and 5.2.10p7 are relevant here. – Blastula 19/6, 2013 at 15:6

OK they are layout compatible, but that still does not seem to allow to access one using an lvalue of another. 3.10p10 does not mention layout compatibility. Is there anywhere else something related to layout compatibility that makes it defined? – Mindszenty 19/6, 2013 at 15:7

That's C++11. In C++03, the behavior is undefined. (But of course, one of the reasons the standard authors accepted defining it is because it actually worked in all implementations.) – Broch 19/6, 2013 at 15:9

@James: But I believe this question concerns Standard C++, which is to say, C++11. That's the assumption unless another version is specifically mentioned. – Blastula 19/6, 2013 at 15:18

@JohannesSchaub-litb: You're accessing an object whose dynamic type is int via an lvalue of type int& via a structure type that contains an int. 3.10p10 says that is ok. – Blastula 19/6, 2013 at 15:23

@BenVoigt I don't know. He specifically asked why g++ might be doing different things in the two cases. And g++ doesn't really implement C++11 yet (although this effect might be due to the experimental preliminary implementation). – Broch 19/6, 2013 at 15:23

@James: I answered why the behavior is different. When B has non-static data members, it ceases to be a standard-layout class and therefore is no longer layout-compatible with A or anything else. Which permits greater optimization. – Blastula 19/6, 2013 at 15:24

@BenVoigt I made a test with enums, and GCC does alias optimize it. Question edited. – Mindszenty 19/6, 2013 at 15:32

@JohannesSchaub-litb: Edited my answer to explicitly state the requirement that both members have the same type (which is required by the strict aliasing rule). I thought that was obvious, and I referred to it in my prior comment. – Blastula 19/6, 2013 at 16:44

@BenVoigt then what difference is left compared to the enum case? We have two different types in both cases (B and A in one case are different classes and in another case are different enums). The difference does not matter because they are layout compatible (according to you, ?). Then in the struct case we must additionally check that the member has the same type. In the enum case, we do not have another type to check. So if one of these works, I would expect the enum case to work. Why is it the other way around? – Mindszenty 19/6, 2013 at 16:48

@BenVoigt that bullet is not intended to apply to such cases. Otherwise, you could use an arbitrary struct to access the int if only the struct would also contain that int. That bullet is intended to prevent caching of a whole struct if you modify one of its member by using a pointer to that member (and vice versa). See the C rationale document. – Mindszenty 19/6, 2013 at 17:54

Even so, let's assume that bullet applies, but then layout compatibility would be entirely irrelevant - after all the struct contains an int member! – Mindszenty 19/6, 2013 at 18:2

@JohannesSchaub-litb: Strict aliasing requires that you use the type (here int&) that matches the object in the location. Layout compatibility guarantees that you're looking at the right location, i.e. p->*&A::x and p->*&B::x resolve to the same object, same alignment. In the enum case, layout compatibility again guarantees the right location and right alignment, but you have the wrong type. – Blastula 20/6, 2013 at 6:21

@BenVoigt please bear with me, but I still do not understand why in the class case B and A do not have an alias conflict. Apparently you are saying that alias analysis does not check class types for matching, but only the members that are accessed. But if that is the case, why do the bullets list classes? It says "a type that is a (possibly cv-qualified) base class type of the dynamic type of the object" - this clearly concerns class types in an access. Or are you saying that a a.b does not constitute accessing the value of an object by an lvalue of type decltype(a)? – Mindszenty 20/6, 2013 at 13:22

In other words, are you saying that this would be an aliasing violation: *b = B() (assuming my function's body of the question and that b points actually to an A). Then b->x++ is valid (because the final lvalue is of type int, and the object expression has no bearing on aliasing analysis), but *b = B() is not (because the final lvalue is of type B)? I am sorry, but I do not understand what you are saying at all, yet. – Mindszenty 20/6, 2013 at 13:24

@JohannesSchaub-litb the bullet list is one of the greatest mystery of C++. – Impersonal 6/7, 2013 at 0:5

The strict aliasing rule does not have an exception for layout-compatible classes, so it's not clear to me how you think this answers the question. Would suggest editing some more explanation into the answer. – Oppressive 11/2, 2017 at 2:25

@Oppressive in addition, Swifts answer shows that even layout compatible structs are "alias optimized" if they are not in an inheritance relationship. So this answer appears to be incorrect (?) – Mindszenty 11/2, 2017 at 16:24

Undefined behavior includes case that it does work, even if it shouldn't.

According to standard usage of this union allows to access type and size fields of either header or data members:

union Packet {
   struct Header {
   short type;
   short size;  
   } header;
   struct Data {
   short type;
   short size;  
   unsigned char data[MAX_DATA_SIZE];
   } data;
}

This is strictly limited to unions, but many compilers support that as kind of extension, provided the "incomplete" type would be ended with array of undefined size. If you delete extra static non-member from child class it indeed becomes trivial and layout compatible, which allows aliasing?

struct A {
  int a;
};

struct B  {
  int a;
  //int x;
};

A a;

int g(B *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

Still performs aliasing optimization, though. In your case with same amount of non-static members the most derived class is assumed to be same as the base class. Let's invert the order:

#include <vector>
#include <iostream>

struct A {
  int a;
};

struct B : A  {
  int x;
};

B a;

int g(A *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

int main()
{
    std::cout << g((A*)&a);
}

This returns 11 as expected, as a B is clearly also an A, unlike the original attempt. Let's play further

struct A {
  int a;
};

struct B : A {
    int  foo() { return a;}
};

Would not cause aliasing optimization, unless foo() is virtual. Adding non-static or const member to B would result in "10" answer, adding non-trivial constructor or a static doesn't.

PS. In second example

enum A : int { X, Y };
enum B : int { Z };

Layout compatibleness between those two here is defined by C++14, and they aren't compatible with underlying type (but convertible). although something like

 enum A a = Y;
 enum B b = (B*)a;

may produce undefined behavior, just like if you try to map float with arbitrary 32bit value.

Epicycloid answered 7/2, 2017 at 7:0 Comment(3)

This confuses me: "Still performs aliasing optimization, though. ". So when A and B are unrelated classes that are layout compatible (actually, structucally equivalent!), then it does the optimization. But if everything is equal, but there's just a base-derived relationship like in my testcase, then it does not do the optimization. This indicates, that it's not a mere thing of "layout compatibility" or structural equivalence, but something to do with inheritance, right? Or am I misunderstanding the answer? – Mindszenty 11/2, 2017 at 16:19

Rewarding to this, since it was the most helpful answer so far. – Mindszenty 11/2, 2017 at 17:55

@Johannes Schaub - litb Ironically, my students, when challenged with this code, pointed at the (B*)&A and said "That's wrong", basing their understanding exactly on relation of the base and derived class (and they know about underlying address mechanism) If standard doesn't regulate it (or we didn't found that), it still kind of makes sense from the point of strict aliasing rule if we interpret "different type" in polymorphism manner? "A is not B" if they are unrelated, "B is also A" but not "equal" and "pointer to A might be pointer to B" when A is base class of B, – Epicycloid 12/2, 2017 at 9:32

I think your code is UB, as you are dereferencing a pointer that comes from a casting that violates the type aliasing rules.

Now, if you activate the strict aliasing flag, you are allowing the compiler to optimize the code for UB. How to use this UB is up to the compiler. You can see the answers for this question.

Regarding gcc, the documentation for -fstrict-aliasing shows it can optimize based on:

(...) an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same.

I have not been able to find a definition for "almost the same", but two classes with exactly the same layout may be considered "almost the same" and they are left out of the optimization.

Shimmery answered 11/2, 2017 at 1:18 Comment(2)

The function "g" in isolation does not have undefined behavior, so this cannot explain the difference in optimizer outputs of the two compilers. – Mindszenty 11/2, 2017 at 17:11

When you activate the -fstrict-alisiang flag, you are telling the compiler that it can optimize assuming that you are not doing anything that leads to UB due to aliasing. You can only get to b == &a through UB. So, the optimizer notices that b != &a and g() will always return 10. So I though that the real question was why the optimizer is not always using this optimization, depending on the B declaration. I could not find anything on the standard but I found out that the documentation talks about the non-standard concept of "almost the same types". – Shimmery 11/2, 2017 at 23:53

I believe the following is legal C++ (without invoking UB):

#include <new>

struct A {
  int a;
};

struct B : A {
  // int x;
};

static A a;

int g(B *b);
int g(B *b) {
   a.a = 10;
   b->a++;
   return a.a;
}

int f();
int f() {
  auto p = new (&a) B{};
  return g(p);
}

because (the global) a always refers to an object of type A (even though it's subobject of a B-object after a call to f()) and p points to an object of type B.

If you mark a to have static storage duration (as I did above), all compilers I've tested will happily apply strict aliasing and optimize to return 10.

On the other hand, if you mark g() with __attribute__((noinline)) or add a function h() that returns a pointer to a

A* h();
A* h() { return &a; }

the compilers I've tested assume that &a and the parameter b can alias and reload the value.

Seattle answered 3/5, 2017 at 16:21 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags