Exception to strict aliasing rule in C from 6.5.2.3 Structure and union members

Quote from the C99 standard:

6.5.2.3

5 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

The standard gives this example for the case:

// The following code is not a valid fragment because
// the union type is not visible within the function f.

struct t1 { int m; };
struct t2 { int m; };

int f(struct t1 *p1, struct t2 *p2)
{
    if (p1->m < 0)
        p2->m = -p2->m;
    return p1->m;
}

int g()
{
    union
    {
        struct t1 s1;
        struct t2 s2;
    } u;

    /* ... */
    return f(&u.s1, &u.s2);
}

I made a few changes:

#include <stdio.h>

struct t1 { int m; };
struct t2 { int m; };

union u
{
    struct t1 s1;
    struct t2 s2;
};

int foo(struct t1 *p1, struct t2 *p2)
{
    if (p1->m)
        p2->m = 2;
    return p1->m;
}

int main(void)
{
    union u u;
    u.s1.m = 1;
    printf("%d\n", foo(&u.s1, &u.s2));
}

As you can see, I moved the union declaration to file scope so that it is visible in foo(). According to the comment in the standard's example, this should have made my code correct, but strict aliasing still appears to break it with clang 3.4 and gcc 4.8.2.

Output with -O0:

2

Output with -O2:

1

for both compilers.

So my question is:

Does C really rely on the visibility of a union declaration to decide whether some structures are an exception to the strict aliasing rule? Or do both gcc and clang have a bug?

It seems really broken to me, because even if the function and the union are both declared in the same header, that does not guarantee that the union is visible in the translation unit that contains the body of the function.

Bultman answered 17/1, 2014 at 0:44 Comment(6)
Output is 2 with gcc 4.4.7. - Oakie
Check my non-answer in another thread, especially the link to the GCC mailing list discussion: https://mcmap.net/q/247360/-are-c-structs-with-the-same-members-types-guaranteed-to-have-the-same-layout-in-memory I'd really like to see this clarified too. - Globigerina
The example you gave first: is that from the standard? It does not look like an example of what the standard says. I think the standard says that if you do "u.s1.m = 5" in "main", then the compiler MUST assume that "u.s2.m" has changed; but if you pass pointers to "u.s1" and "u.s2" around, then that's a completely different matter. - Loyceloyd
The language of the Standard is clear; the existence of two structures within a union declaration should be deemed as giving notice to the compiler that code will overlay the two types, and thus members of the common initial sequence may alias. The authors of gcc don't like what the Standard says, and choose to ignore it, but I can imagine no other reason for the Standard to have referred to the complete type being visible if such visibility were not considered adequate notice that aliasing was likely. - Solicit
More exploration of this concept and the large uncertainty around it - including its deliberate absence in C++ due to its Committee IMO rightly considering this proviso to be absurd - can be found in this thread: union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'? - Climatology
This answer may apply. - Cookson

The most important point is that your change (moving the union up) does not change the definition of the function foo at all. It is still a function that receives unrelated pointers. In your example the passed pointers are related, while elsewhere this might be different. The goal of the compiler is to serve the most general case. The body of the function is different after the change and it is not clear why.

Your question is really about how carefully optimization is implemented in your particular compiler under certain command-line options. It has nothing to do with memory layout. With a correct compiler the result should be the same: the compiler should handle the case where two different pointers in fact point to the same place in memory.

Diophantus answered 17/1, 2014 at 2:13 Comment(17)
I am more concerned about the standard claiming that union declaration visibility breaks the code. - Bultman
The standard speaks about unions and nothing else. Your foo does not involve any union... These things are unrelated. - Diophantus
@Ivan I understand how difficult it may be to comprehend the reason for the problem. What Kirill means is that even if the union is visible, the compiler cannot guarantee that structure objects p1 and p2 are necessarily a part of the union for all calls to foo. In other words, as far as the compiler can tell from the declaration of foo, the two structure objects passed as formal parameters are completely unrelated. Had you passed the union object itself to the function, it would turn out differently. In my case, the compiler optimizes the function call away completely with your code. - Infer
Oh no, it is not difficult at all; I am well aware of strict aliasing in C. It is just the wording that made me think that strict aliasing should be disabled if different structs can be contained in one union, which is determined by the existence of such unions. - Bultman
@ChronoKitsune: Nearly all pre-C89 implementations of the language designed by Dennis Ritchie allowed a programmer to define multiple structure types with a common initial sequence, and use a pointer to any of them to inspect members of any other. I would not regard any language which does not provide a means of doing that as being the language Dennis Ritchie designed. Allowing a union declaration to serve as notice to a compiler that a pointer of one type may be used to access members of another may not be a particularly good way of supporting the semantics, but I know of nothing else...
...in the Standard that would define a means by which that could be accomplished. To be sure, it would also be helpful if the Standard were to explicitly recognize certain patterns the authors of C89 would likely have regarded as too obvious to be worth mentioning, which would allow most common-initial-sequence issues--as well as many others--to be solved more efficiently, but presently gcc ignores those common patterns. - Solicit
@Solicit I have absolutely no problem with what you said. GCC ignores them because it's unclear whether access must be with a union object or whether the declaration of the union type containing those structures is sufficient. That is, different interpretations yield different results. N1520 seeks to clarify the problem and introduce a need to clarify the solutions in such cases. - Infer
The only distinction between the parameters of f lies in the address passed to the function. Since the address is the same and struct t1 and struct t2 are distinct types, GCC thinks strict aliasing is violated, and there's no way to prove that they're a part of a union object. Without a way to prove both came from a union object, the fact that the union type is visible or not simply doesn't matter. Modifying the code to pass a pointer to the union object, however, results in the expected behavior. - Infer
You can force the required behavior by additionally passing a pointer to the union object and manually comparing &up->s1 == p1, conditionally executing the if (p1->m) block (or just do if (&up->s1 == p1 && p1->m) p2->m = 2;) and returning p1->m at the end. This forces GCC to understand that you fully intended to write to up->s1.m using p2->m since the addresses are the same. Of course, that defeats the purpose of not passing the union object to f, right? That means the only way to reliably do this without redundancy is to pass &u by itself to a new function int g(union u *up). - Infer
@ChronoKitsune: Given: struct tiny {uint16_t x; uint8_t y,z;} my_small_things[1000]; struct bigger {uint16_t x; uint8_t y,z[5]; uint64_t q[1000];} my_bigger_things[2]; what type could be used to access fields x and y of any member of my_small_things or my_bigger_things? Further, if an in-line function is supposed to accept pointers to structures which will be designed in future (though the design of the common portion is locked), how could that function possibly accept a pointer to a union whose member types don't even exist yet? - Solicit
@ChronoKitsune: If a function accepts a foo_header*, saying that it may be passed a pointer to any structure type which appears in a union type whose complete declaration is visible at any point the latter type is used would make the semantics workable. If one translation unit uses types foo_fancy and foo_header and defines a union type containing those, and another uses foo_plain and foo_header, and defines a union type containing those, that should give the compiler all the information it needs even though it wouldn't be practical to have a single union type containing all. - Solicit
(Setting aside the usual tedious grumbling about "the language designed by Dennis Ritchie",) more exploration of this can be found via union 'punning' structs w/ "common initial sequence": Why does C (99+), but not C++, stipulate a 'visible declaration of the union type'? - Climatology
@underscore_d: The first C manual (around 1974) defined the behavior of "." and "->" operators in terms of the underlying storage. One of the most useful features stemming from that was the ability to have functions which could use pointers to operate on the common parts of many kinds of structures without having to know the exact types of the structures involved. Each version of the C Standard has been advertised as being essentially upwardly-compatible with the one before, and treating the existence of a complete union type as a warning of possible aliasing would...
...make it possible to make a lot of existing code which relies upon C's traditional support for the Common Initial Sequence rule work on newer compilers by adding appropriate "union" declarations, without requiring the rest of the code to be reworked. I think there would have been better ways of achieving the same objective, but if the authors of the Standard intended to break existing code in ways that could not readily be fixed, that would imply they were lying about upward compatibility. Are they liars, or is gcc misinterpreting them? - Solicit
@Solicit There is also a common theme (not an official description of course) that C is "portable assembly". Some parts of C are specified in terms of the low-level view (like "trap representation"). The rest of the language is specified from the angle of the high-level abstract machine semantics. Two worlds collide: it's very hard to keep a straight story with two angles. - Gastrula
@curiousguy: The authors of the Standard explicitly state in the Rationale that they do not wish to preclude such usage. The Standard also explicitly states that it is common for implementations to process UB in a documented manner characteristic of the environment. Judging from the rationale, they viewed the question of whether a particular implementation behaves in a fashion characteristic of the target environment as a Quality-of-Implementation issue, but failed to make adequately clear that the Standard makes no effort whatsoever to mandate everything necessary...
...to make something be a quality implementation that is suitable for any particular purpose. - Solicit

The set of circumstances in which a compiler recognizes that an access to an aggregate member is an access to the aggregate itself is purely a Quality of Implementation issue, and the Standard makes no effort to recognize any cases where use of a non-character lvalue of the form aggregate.member or pointerToAggregate->member would not violate 6.5p7. A compiler which couldn't handle at least some cases as defined would be of such low quality as to be pretty useless, but the Standard makes no effort to forbid conforming-but-useless implementations.

If a common initial sequence member has a character type, then 6.5p7 would define the behavior of accessing it, regardless of whether it is a member of a common initial sequence of a union whose complete declaration is visible. If it doesn't have a character type, then access would only be defined under 6.5p7 if performed through an lvalue of character type or memcpy/memmove, or in cases where the destination has heap duration and the ultimate type used for a read matches the type used for a write.

There are a number of indications a quality compiler should recognize that would suggest a pointer to one structure type might be used to access a CIS member of another. A compiler that is unable to recognize any of the other indications might benefit from treating the existence of a complete union declaration containing both types as such an indication. Doing so might needlessly block some otherwise-useful optimizations, but would still allow more optimizations than disabling type-based aliasing analysis altogether.

Solicit answered 13/6, 2018 at 20:12 Comment(0)
