Union of structs with common first member

I am unsure whether the casts in the asserts below violate the pointer aliasing rules (or raise other standard-conformance issues). It seems that a pointer to the union type should be convertible to a pointer to the first member, and since the union is composed only of these two structs, I think a cast to the first member should work, but I'm not sure whether this is correct or whether I'm glossing over padding details in the process. Are unions required to pad the upper bits?

Is this unspecified behavior? Does anyone have any insight as to whether this is supported? I know there is a standard alternative, which is to use a struct with an enum type field and a union container_storage member, but that seems like a waste of space considering that this information is already in struct contained.

Compilation command on Linux: gcc -std=c99 -Wextra -pedantic -fstrict-aliasing test.c && ./a.out && echo $? (prints 0)

#include <stdlib.h>
#include <assert.h>

enum type {type_a = 1, type_b = 2};

struct contained {
    int some_other_field;
    enum type type;
};

struct container_a {
    struct contained contained;
    int test;
};


struct container_b {
    struct contained contained;
    char test;
};

union container_storage {
    struct container_a container_a;
    struct container_b container_b;
};

int
main(int argc, char **argv)
{
    union container_storage a =
        {.container_a = {.contained = {.type = type_a}, .test = 42}};
    union container_storage b =
        {.container_b = {.contained = {.type = type_b}, .test = 'b'}};

    assert(((struct contained *)&a)->type == type_a);
    assert(((struct contained *)&b)->type == type_b);

    return EXIT_SUCCESS;
}

References:

[1] gcc, strict-aliasing, and casting through a union

[2] What is the strict aliasing rule?

Docile answered 23/12, 2013 at 22:30 Comment(0)

That should be fine. C11, 6.5.2.3/6 ("Structure and union members") says:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

(C++ makes the same guarantee (C++11, 9.2/18) for standard-layout unions.)
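To make the quoted guarantee concrete, here is a minimal sketch that reuses the question's types. The helper storage_type is hypothetical (it is not in the question). It reads the tag through the container_a member even when container_b is the member that was stored, and it does so through the union type, which is the access pattern the common initial sequence rule describes.

```c
#include <assert.h>
#include <stddef.h>

enum type { type_a = 1, type_b = 2 };

struct contained {
    int some_other_field;
    enum type type;
};

struct container_a {
    struct contained contained;
    int test;
};

struct container_b {
    struct contained contained;
    char test;
};

union container_storage {
    struct container_a container_a;
    struct container_b container_b;
};

/* Hypothetical helper: inspects the shared header through container_a's
 * view. `struct contained contained` is the common initial sequence of
 * both structs, and the complete union type is visible here. */
static enum type
storage_type(const union container_storage *s)
{
    assert(s != NULL);
    return s->container_a.contained.type;
}
```

With a union initialized through .container_b and .contained.type = type_b, storage_type should return type_b, because both structs begin with the same struct contained.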

Boehike answered 23/12, 2013 at 22:39 Comment(3)
That's great to hear. I was trying to dig through the C99 spec, but I couldn't seem to find this anywhere. Thanks!Docile
Sorry, the conclusion is correct (the cast the question asks about is well-defined), but I'm downvoting because it seems to me that this points at the wrong part of the standard. The most relevant parts are worded something like: "A pointer to a structure object, suitably cast, points to its initial member ([pedantry about bit fields]), and vice versa" and "A pointer to a union object, suitably cast, points to each of its members ([pedantry about bit fields]), and vice versa."Alarick
The question is asking about the cast in the assert: if the cast to struct contained * is well-defined, then the ->type is well-defined, but if the cast is not well-defined, then the common initial sequence rule doesn't help. The common initial sequence rule makes both ->container_a.contained.type and ->container_b.contained.type valid accesses on both a and b (but notably, this guarantee does not prevent, for example, the union having leading padding; the separate guarantee about casting their pointers is what does that).Alarick

Unions don't put padding in front of their members; they just overlay them, so every member starts at the address of the union itself. The first member of any struct is likewise guaranteed to start at offset zero, without leading padding. In general, structs that start with the same members of the same types are guaranteed to have the same layout for that initial part.
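The layout claims above can be checked directly. The sketch below reuses the question's types. The two helper functions are hypothetical names introduced only for illustration. One uses offsetof to confirm the shared header sits at offset zero, and the other compares addresses to confirm that the union, both structs, and both headers start at the same place.

```c
#include <assert.h>
#include <stddef.h>

enum type { type_a = 1, type_b = 2 };

struct contained { int some_other_field; enum type type; };
struct container_a { struct contained contained; int test; };
struct container_b { struct contained contained; char test; };

union container_storage {
    struct container_a container_a;
    struct container_b container_b;
};

/* Hypothetical helper: the first member of each struct is at offset 0. */
static int
header_offsets_are_zero(void)
{
    return offsetof(struct container_a, contained) == 0
        && offsetof(struct container_b, contained) == 0;
}

/* Hypothetical helper: returns 1 if the union, each struct member, and
 * each shared header all begin at the same address. */
static int
starts_coincide(union container_storage *s)
{
    assert(s != NULL);
    void *u  = s;
    void *sa = &s->container_a;
    void *sb = &s->container_b;
    void *ha = &s->container_a.contained;
    void *hb = &s->container_b.contained;
    return u == sa && u == sb && u == ha && u == hb;
}
```

Both helpers should return 1 for any union container_storage object. Note that this only demonstrates where the members start; a union or struct may still have trailing padding.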

Dimitri answered 23/12, 2013 at 22:39 Comment(0)

Under C89, a pointer of structure type which identifies a member of a union may be used to inspect any member which is part of a Common Initial Sequence shared with the type of data stored therein. This in turn generally implies that a pointer to any structure type could be used to inspect any member of the Common Initial Sequence shared with any other type (such behavior would have been unambiguously defined if the object happened to be a member of a declared union object, and the only practical way for a compiler to yield the required behavior in those cases would be to uphold it for all).

C99 added a requirement that the CIS guarantee applies only where a complete union type containing both structures is visible, which some compiler writers seem to think means it applies only to accesses performed directly through union types. The authors of such compilers seem to think that code which needs to handle structures with a common header, like:

struct smallThing { void *next; uint16_t length; uint8_t dat[2]; };
struct bigThing { void *next; uint16_t length; uint8_t dat[65528]; };

should instead factor the header out into its own struct, like:

struct uHeader {  void *next; uint16_t length; };
struct smallThing { struct uHeader head; uint8_t dat[2]; };
struct bigThing { struct uHeader head; uint8_t dat[15994]; };

or use union-type objects for everything, even though using uHeader would increase the size of struct smallThing by 50% (and totally break any code that had been reliant upon its layout), and using unions for everything when most objects only need to be small would increase memory usage a thousandfold.
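As a rough illustration of the memory cost of "unions for everything", the sketch below wraps the original smallThing and bigThing in a union and compares sizes. The names anyThing and union_overhead_factor are hypothetical, and the exact ratio depends on the platform's pointer size and alignment; on common 32-bit and 64-bit ABIs it runs into the thousands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct smallThing { void *next; uint16_t length; uint8_t dat[2]; };
struct bigThing { void *next; uint16_t length; uint8_t dat[65528]; };

/* Hypothetical union, used only to measure the cost of storing every
 * object as a union of both structure types. */
union anyThing {
    struct smallThing small;
    struct bigThing big;
};

/* Hypothetical helper: how many smallThing objects would fit in the
 * space occupied by a single union object. */
static size_t
union_overhead_factor(void)
{
    /* A union is at least as large as its largest member. */
    assert(sizeof(union anyThing) >= sizeof(struct bigThing));
    return sizeof(union anyThing) / sizeof(struct smallThing);
}
```

Printing union_overhead_factor() on a given target shows how much larger each small object becomes if it has to be stored as a union anyThing.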

If one needs code to be compatible with compilers that essentially ignore the Common Initial Sequence rule, one should regard the Common Initial Sequence rule as essentially useless. Personally, I think it would be better to document that only compilers that honor the CIS should be considered suitable for use with one's code, rather than bending over backward to accommodate unsuitable compilers, but I think it's important to be aware that compilers like the latter ones exist.

So far as I can tell, clang and gcc do not honor the CIS rule in any useful way except when the -fno-strict-aliasing flag is set. I don't know about other compilers.

Traumatism answered 3/5, 2017 at 20:18 Comment(12)
This is a great answer. I think compiling a list of non-conforming compilers in it would be a good idea.Docile
@backscattered: I don't know how many compilers have followed gcc and clang in deciding that the only way to force adherence to the Common Initial Sequence guarantee should be to disable all type-based optimization, but IMHO the purpose of the guarantee was clear and the costs of upholding it should be far less than the costs of making pessimistic presumptions on all pointer accesses.Traumatism
@Traumatism have you checked if casting through a union pointer and back forces GCC and Clang to recognize that aliasing could happen? Well, first I'd try just adding a union smolBigg { struct smallThing smol; struct bigThing bigg; } in every translation unit that the structs are defined in, and then see if that alone makes them recognize aliasing could happen - but then if that fails, perhaps casting the struct pointer to that union and then back might work? ((struct smallThing *)(union smolBigg * )foo)->next, and maybe also cast to the union before any conversions to void *.Alarick
@mtraceur: Even using an lvalue expression of the form (someUnion.oneArrayOfStructs+index)->cisMember is insufficient to make them accommodate the possibility that code might be accessing someUnion.otherArrayOfStructs[index].cisMember. The only way the language in the Standard would make any sense would be if it was meant as a compromise to allow for programmers relying upon the CIS rule to add union-type declarations to indicate such reliance, but for whatever reason the authors of clang and gcc interpret such compromises in bad faith.Traumatism
@Traumatism sorry are we talking about the same thing? Two different arrays within a union don't fall under either of the relevant rules I have in mind (except perhaps the pointer cast being allowed might apply if index is the literal zero), because at that point it's not a union of structs, it's a union of arrays of structs which the standard is not generous enough to mention (not even in the special case where we know that both arrays have same-sized items).Alarick
@mtraceur: The Standard specifies that the syntax structOrUnion.arrayMember[index] is syntactic sugar for *(structOrUnion.arrayMember+index), and also that the addresses of both someUnion.oneArrayOfStructs[0] and someUnion.otherArrayOfStruct[0] coincide with that of someUnion, implying they match each other.Traumatism
@mtraceur: My point with the example is that the whole purpose of the Strict Aliasing Rule is to avoid requiring that compilers refrain from making optimizations that would usually be correct, and would only be incorrect [the Rationale uses that term] in type punning scenarios where there is no evidence of any relationship between objects. Further, I can't think of any definition of "complete union type" and "visible" that would not regard a complete union type definition as being visible in the context where the above lvalues could be evaluated.Traumatism
@Traumatism You're still using an example that doesn't fit the letter of the common initial sequence rule, not even in C89 before the "visible" requirement... the common initial sequence rule carves out an exception that only mentions struct members of unions - it does not cover members who are arrays of structs; it does not reach through arrays to also cover common initial sequences of different structs whose arrays happen to be in the same union.Alarick
@Traumatism However, I suspect that if you defined union great_success { typeof(someUnion.oneArrayOfStructs[0]) a; typeof(someUnion.otherArrayofStructs[0]) b; } (or replace unportable typeof with the actual type text), then any of the following four should be recognized by GCC and Clang as legal aliasings ((union great_success *)someUnion.oneArrayOfStructs)->a.cisMember, ((union great_success *)someUnion.oneArrayOfStructs)->b.cisMember, ((union great_success *)someUnion.otherArrayOfStructs)->a.cisMember, and ((union great_success *)someUnion.otherArrayOfStructs)->b.cisMember.Alarick
@mtraceur: In all documented versions of C prior to C89, if p1 and p2 were pointer expressions of types struct s1 * and struct s2 *, and they had the same address, CIS guarantees would apply to accesses performed via p1 and p2. Nothing in the C89 nor even C99 Rationale implies any intention not to support such constructs. Further, C99 TC3 added a note specifying that union accesses are performed in a manner consistent with the representation of the types involved.Traumatism
@Traumatism yes I know that was normal to rely on and the C standard took that away. But it seems like we're talking past each other. I proposed a very specific workaround (and initially asked if you had ever tried it). Anyway, I've now had time to do a little of my own testing in GCC and Clang, and my union smolBigg suggestion needs to be fixed to either ((union smolBigg * )foo)->smol.next or ((union smolBigg * )foo)->bigg.next, but after that fix it does work, it language-lawyers GCC and Clang into respecting possible aliasing.Alarick
@mtraceur: Even under the most outrageously stretched interpretation of the C Standard, it doesn't "take anything away" from what can be done by programs that are conforming but not strictly conforming. All it does is say, in essence, "If your customers would allow you to process this construct incorrectly, the Standard won't get in the way". The notion that "non-portable or erroneous" should be viewed as synonymous with "broken" directly contradicts the documented intention of the C89 and C99 authors.Traumatism
