Strict aliasing and overlay inheritance

Asked 20/2, 2017 at 19:20 Answered 2/3, 2017 at 0:18

Solved c struct language-lawyer strict-aliasing

Consider this code example:

#include <stdio.h>

typedef struct A A;

struct A {
   int x;
   int y;
};

typedef struct B B;

struct B {
   int x;
   int y;
   int z;
};

int main()
{
    B b = {1,2,3};
    A *ap = (A*)&b;

    *ap = (A){100,200};      //a clear http://port70.net/~nsz/c/c11/n1570.html#6.5p7 violation

    ap->x = 10;  ap->y = 20; // lvalues of type int at the right addresses, ergo correct?

    printf("%d %d %d\n", b.x, b.y, b.z);
}

I used to think that something like casting B* to A* and using the A* to manipulate the B object was a strict aliasing violation. But then I realized the standard really only requires that:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 1) a type compatible with the effective type of the object, (...)

and expressions such as ap->x do have the correct type and address, and the type of ap shouldn't really matter there (or does it?). This would, in my mind, imply that this type of overlay inheritance is correct as long as the substructure isn't manipulated as a whole.

Is this interpretation flawed or ostensibly at odds with what the authors of the standard intended?

Oaxaca answered 20/2, 2017 at 19:20 Comment(0)

The line with *ap = is a strict aliasing violation: an object of type B is written using an lvalue expression of type A.
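If the aim of that assignment is simply to overwrite the leading x and y in one step, one option (not from the question, and only a sketch) is to copy bytes with memcpy, which involves no lvalue of type A on the B object. The helper names copy_prefix and demo_copy_prefix are made up for illustration, and the code assumes x and y sit at the same offsets in both structs, which the asserts check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct A { int x; int y; } A;
typedef struct B { int x; int y; int z; } B;

/* Hypothetical helper: overwrite the leading x and y of *dst with those of *src. */
void copy_prefix(B *dst, const A *src)
{
    /* Layout assumption, checked at run time. */
    assert(offsetof(A, x) == offsetof(B, x));
    assert(offsetof(A, y) == offsetof(B, y));

    /* Copy only the bytes up to the end of y, so z is never touched. */
    memcpy(dst, src, offsetof(A, y) + sizeof src->y);
}

/* Hypothetical driver: returns b.x + b.y + b.z after the copy. */
int demo_copy_prefix(void)
{
    B b = {1, 2, 3};
    A a = {100, 200};

    copy_prefix(&b, &a);
    return b.x + b.y + b.z;
}
```

Here b keeps its declared type B throughout; only its bytes change, so b.z stays 3 while b.x and b.y take the values from a.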

Suppose the *ap = line were not present and we moved on to ap->x = 10; ap->y = 20;. In this case an lvalue of type int is used to write an object of type int.

There is disagreement about whether this is a strict aliasing violation or not. I think that the letter of the Standard says that it is not, but others (including gcc and clang developers) consider ap->x as implying that *ap was accessed. Most agree that the standard's definition of strict aliasing is too vague and needs improvement.

Sample code using your struct definitions:

void f(A* ap, B* bp)
{
  ap->x = 213;
  ++bp->x;
  ap->x = 213;
  ++bp->x;
}

int main()
{
   B b = { 0 };
   f( (A *)&b, &b );
   printf("%d\n", b.x);
}

For me this outputs 214 at -O2, and 2 at -O3, with gcc. The generated assembly on godbolt for gcc 6.3 was:

f:
    movl    (%rsi), %eax
    movl    $213, (%rdi)
    addl    $2, %eax
    movl    %eax, (%rsi)
    ret

which shows that the compiler has rearranged the function to:

int temp = bp->x + 2;
ap->x = 213;
bp->x = temp;

and therefore the compiler must be considering that ap->x may not alias bp->x.
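To see where the two numbers come from, here is a small sketch that replays both orderings on a single plain int, so nothing in it depends on the aliasing rules. The function names are hypothetical; in_emitted_order mirrors the rearranged code above, called with both pointers aimed at the same int, just as ap and bp both point at b in main.

```c
#include <assert.h>

/* The four statements of f() in source order, applied to one int. */
int in_source_order(int *x)
{
    *x = 213;
    ++*x;
    *x = 213;
    ++*x;
    return *x;
}

/* The order the compiler emitted: load, add 2, store 213, store the sum. */
int in_emitted_order(int *ap_x, int *bp_x)
{
    int temp = *bp_x + 2;
    *ap_x = 213;
    *bp_x = temp;
    return *bp_x;
}

/* Hypothetical self-check: both orderings start from 0, as b.x does. */
void check_orders(void)
{
    int x = 0;
    assert(in_source_order(&x) == 214);

    x = 0;
    assert(in_emitted_order(&x, &x) == 2);
}
```

The two results differ only because the emitted order reads bp->x before the store through ap, which is a valid transformation only if the two never refer to the same object.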

Ate answered 20/2, 2017 at 20:52 Comment(7)

Excellent answer. Thanks. – Oaxaca 20/2, 2017 at 21:2

Good answer. "I think that the letter of the Standard says that it is not, but others (including gcc and clang developers) consider ap->x as implying that *ap was accessed." I can 'prove' to you that the intention of the Standard is not what you consider it to be, because of this: "When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. 51)" Thus the access ap->x clearly isn't the same as bp->x. – Headsman 21/2, 2017 at 2:4

@Headsman comments aren't for extended discussion - I don't want to have a debate about interpretation of strict aliasing here. Arguments that interpret ambiguities in the standard by purporting what the intention must have been have never been settled in past comment debates, and this one would be no different. Such debate would be better suited to a discussion forum, such as usenet or reddit. Perhaps you could post an answer laying out your "proof" in detail, then other readers can form their own conclusion. – Ate 21/2, 2017 at 3:11

It's interesting that using the structs only to get pointers to their members and then using those pointers for access suppresses this behavior godbolt.org/g/rJT58z – Oaxaca 22/2, 2017 at 16:0

@PSkocik I guess that shows gcc doesn't propagate its TBAA across pointer assignment. Not in this case anyway. I'm sure the actual TBAA code in the compiler is somewhat complicated, because they don't want to break certain patterns which violate the rule but are in common usage (e.g. receiving a char buffer via network and then aliasing it as integers). – Ate 22/2, 2017 at 20:36

@PSkocik: If the address is taken and stored into a pointer object, and the pointer is then used in a later operation, gcc may not break things yet, but gcc will break code which would take the address of a struct member and then apply the indirection operator to the resulting pointer. GCC should be viewed as two compilers--a conforming but not terribly efficient one which is enabled via -fno-strict-aliasing, and a non-conforming one which may be useful for high-end number crunching and maybe some other purposes but is unsuitable for any purpose which would require the ability to... – Carilyn 2/3, 2017 at 0:25

...use storage as multiple types, including system or embedded programming. – Carilyn 2/3, 2017 at 0:25

When C89 was written, it would have been impractical for a compiler to uphold the Common Initial Sequence (CIS) guarantees for unions without also upholding them for struct pointers. By contrast, specifying the CIS guarantees for struct pointers would not imply that unions would exhibit similar behavior if their address was not taken. The CIS guarantees have been applicable to struct pointers since January 1974, even before the union keyword was added to the language, and a lot of code had for years relied upon such behavior in circumstances which could not plausibly involve objects of union type. The authors of C89 were also more interested in making the Standard concise than in making it "language-lawyer-proof". I would therefore suggest that C89's specification of the CIS rule in terms of unions rather than struct pointers was almost certainly motivated by a desire to avoid redundancy, rather than a desire to allow compilers the freedom to go out of their way to violate 15+ years of precedent in applying CIS guarantees to structure pointers.

The authors of C99 recognized that in some cases applying the CIS rule to structure pointers might impair what would otherwise be useful optimizations, and specified that if a pointer of one structure type is used to inspect a CIS member of another, the CIS guarantee won't hold unless a definition of a complete union type containing both structures is in scope. Thus, for your example to be compatible with C99, it would need to contain a definition of a union type containing both of your structures. This rule appears to have been motivated by a desire to allow compilers to limit application of the CIS to cases where they would have reason to expect that two types might be used in related fashion, and to allow code to indicate that types are related without having to add a new language construct for that purpose.

The authors of gcc seem to think that because it would be unusual for code to receive a pointer to a member of a union and then want to access another member of that union, the mere visibility of a complete union type definition should not be sufficient to force a compiler to uphold CIS guarantees, even though most uses of the CIS had always revolved around structure pointers rather than unions. Consequently, the authors of gcc refuse to support constructs like yours even in cases where the C99 Standard would require it.
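For contrast, here is a minimal sketch of the one form that is not in dispute: the object really is a union, and every access goes through the union's own members rather than through a converted struct pointer. The union name AB and the function name are hypothetical, and this illustrates only the union case, not the pointer-based overlay from the question.

```c
#include <assert.h>

typedef struct A { int x; int y; } A;
typedef struct B { int x; int y; int z; } B;

/* Hypothetical union; its only job is to hold either struct. */
typedef union AB { A a; B b; } AB;

/* Stores an A in the union, then inspects the common initial part via B. */
int sum_common_part(void)
{
    AB u;

    u.a = (A){10, 20};        /* the union now contains an A */
    assert(u.b.x == 10);      /* x is in the common initial sequence */
    return u.b.x + u.b.y;     /* so is y; z is deliberately never read */
}
```

Only x and y are read through u.b, because those are the members the two structs share at the front; z was never stored and is left alone.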

Carilyn answered 2/3, 2017 at 0:18 Comment(0)
