Strict-aliasing and pointer to union fields

I have a question about the strict-aliasing rules, unions, and the C standard. Assume we have the following code:

#include <stdio.h>

union
{
    int f1;
    short f2;
} u = {0x1};

int     * a = &u.f1;
short   * b = &u.f2;

int main()
{
    u.f1 = 1;
    *a += 1;
    u.f2 = 2;
    *b *= 2;

    printf( "%d %hd\n", *a, *b);

    return 0;
}

Now let's look at how it behaves:

$ gcc-5.1.0-x86_64 t.c -O3 -Wall && ./a.out 
2 4
$ gcc-5.1.0-x86_64 t.c -O3 -Wall -fno-strict-aliasing && ./a.out 
4 4

We can see that the strict-aliasing optimization breaks the dependency between the accesses. Moreover, this seems to be correct code that does not violate the strict-aliasing rule.

  1. Does it follow that, for union fields, the object lying at that address is compatible with the types of all union members?
  2. If 1 is true, what should the compiler do with pointers to union members? Is it a problem in the standard that it allows such compiler behavior? If not, why?
  3. Generally speaking, a compiler behaving differently on correct code is inadmissible in any case. So this seems to be a compiler bug too (especially since, when the addresses of the union fields are taken inside a function, strict aliasing does not break the dependency).
Fellowman answered 12/8, 2015 at 14:48 Comment(2)
No, it is not correct code. Strict aliasing is a requirement of the standard, not the compiler. It is a courtesy of gcc to disable this for broken code.Larondalarosa
@Olaf: Some implementations define the behavior of the above code, and some don't. Code that is limited to compilers that support behaviors beyond what the Standard requires (i.e. most useful programs) might not be completely portable, but that doesn't mean it is in any sense "broken". By contrast, gcc's behavior in many cases (though admittedly not the above) would be downright broken unless it either (1) applies the "One Program Rule" in a way that would render the Standard meaningless, or (2) documents that conformance requires -fno-strict-aliasing or -O0.Bryophyte

The C standard says that aliasing via unions is explicitly permitted.

However, consider the following code:

#include <stdio.h>

void func(int *a, short *b)
{
    *a = 1;
    printf("%hd\n", *b);  /* *b is a short, so %hd rather than %f */
}

The intent of the strict aliasing rule is that a and b should be assumed not to alias. However, you could call func(&u.f1, &u.f2);.

To resolve this dilemma, a common-sense solution is to say that the 'bypass permit' that lets unions avoid the strict aliasing rule applies only when the union members are accessed by name.

The Standard doesn't explicitly state this. It could be argued that "If the member used..." (6.5.2.3) actually is specifying that the 'bypass' only occurs when accessing the member by name, but it's not 100% clear.

However, it is hard to come up with any alternative and self-consistent interpretation. One possible alternative interpretation is that writing func(&u.f1, &u.f2) causes UB because overlapping objects were passed to a function that 'knows' it does not receive overlapping objects, somewhat like a restrict violation.

If we apply this first interpretation to your example, we would say that the *a in your printf causes UB because the current object stored at that location is a short, and 6.5.2.3 doesn't kick in because we are not using the union member by name.

I'd guess based on your posted results that gcc is using the same interpretation.

This has been discussed before here but I can't find the thread right now.

Lightproof answered 12/8, 2015 at 14:59 Comment(4)
I think a major key to understanding the rule is to recognize that the authors of the Standard perceived zero need to define a language suitable for systems-programming purposes, since anyone seeking to make an implementation be suitable for such purpose could specify that the aliasing rules don't apply (even though, oddly, I think I've seen more compilers whose authors pretend the rule doesn't exist than explicitly renounce its use for optimization). Too bad the Committee never defined a "type aliasing barrier" since that would have made the language suitable for systems programming...Bryophyte
...and eliminated the need for the "character-type" exception to the aliasing rules.Bryophyte
@Bryophyte well, compilers have implemented such a thing (-fno-strict-aliasing)Lightproof
Is there any standard way by which a program can indicate that it needs to be able to ensure that all writes to a block of memory using any type before a certain point need to precede all writes of that memory using any type after that point? Many programs that require no-strict-aliasing would in fact work just fine with 99% aliasing optimizations enabled if there were a way that a programmer could block the 1% that would cause trouble.Bryophyte

C99 Technical Corrigendum 3 clarifies type punning through unions by stating, in a footnote to section 6.5.2.3:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning").

See here from 1042 through 1044

Walsh answered 12/8, 2015 at 15:0 Comment(8)
That provision is of rather limited utility given that the only types of union members that can be accessed without UB are char, signed char, and unsigned char.Bryophyte
@Bryophyte could you please elaborate?Wolpert
@JerryJeremiah: According to N1570 6.5p7, an object of an aggregate type may only be accessed using an lvalue of its own type, a character type, or an enclosing aggregate type. There is no rule that allows an object of aggregate type to be accessed using an lvalue of member type. The authors of the Standard likely figured that compiler writers would recognize that given something simple like int *p = &unionPtr->intMember; *p = 1; a quality compiler would recognize the association between *p and the union whether the Standard would require it or not, but...Bryophyte
...gcc and clang are more "clever" and will generate more "efficient" code that can't reliably handle the possibility that the union might also be accessed as some other type.Bryophyte
@JerryJeremiah: While compiler writers seem to recognize most expressions that use the form unionPtr->intMember directly, they cannot reliably handle a sequence like unionPtr->member = expression1; memberType *p = &unionPtr->member; *p = expression2; memberType result = unionPtr->member;, though I'm not sure how the & operator can meaningfully be said to yield a pointer of the member type if a compiler can't reliably handle a straightforward usage case like that.Bryophyte
@JerryJeremiah: Our resident "I-know-everything-better" is misleading once more. He forgot to read the footnote which clarifies this: port70.net/~nsz/c/c11/n1570.html#note95 As I wrote, type punning through a union is the only strictly legal way. Note that this still leaves the interpretation of the data implementation-defined, and on some implementations it can result in UB. But this has to be specified by the implementation, e.g. by stating that integers can have a trap representation, so it can be verified in advance and guaranteed to work for a specific implementation.Larondalarosa
@JerryJeremiah: And, yes, this part of the standard is mandatory for every C compiler. I'd be very surprised if gcc or clang failed at this in particular. If they do, though, it would be a bug and should be reported (if not done already). It wouldn't be the first time the optimiser runs amok. The example in the second comment does not even use a union. It's a common misconception that a pointer to a union member is a pointer to a union. At this point 6.5p6 ff. are indeed in effect. But that's a completely different subject.Larondalarosa
@toohonestforthissite: The way N1570 6.5p7 is written, only a tiny minority of programs that use structs or unions actually have defined behavior; the ability to run them is a Quality of Implementation issue. Implementations that make a bona fide effort not to be obtuse will support most such programs, however, including many which obtuse compilers' optimizers will break.Bryophyte
