Type punning a struct in C and C++ via a union

4

11

I've compiled this with both gcc and g++ using -pedantic, and I don't get a warning from either one:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct a {
    struct a *next;
    int i;
};

struct b {
    struct b *next;
    int i;
};

struct c {
    int x, x2, x3;
    union {
        struct a a;
        struct b b;
    } u;
};

void foo(struct b *bar) {
    bar->next->i = 9;
    return;
}

int main(int argc, char *argv[]) {
    struct c c;
    memset(&c, 0, sizeof c);
    c.u.a.next = (struct a *)calloc(1, sizeof(struct a));
    foo(&c.u.b);
    printf("%d\n", c.u.a.next->i);
    return 0;
}

Is this legal to do in C and C++? I've read about type punning but I don't fully understand it. Is foo(&c.u.b) any different from foo((struct b *)&c.u.a)? Wouldn't they be exactly the same? This exception for structs in a union (from C89, section 3.3.2.3) says:

If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them. Two structures share a common initial sequence if corresponding members have compatible types for a sequence of one or more initial members.

In the union, the first member of struct a is struct a *next, and the first member of struct b is struct b *next. As you can see, a value is stored through struct a *next, and then in foo it is read back through struct b *next. Are they compatible types? They're both pointers to a struct, and pointers to any struct should be the same size, so they should be compatible and the layout should be the same, right? Is it OK to write i through one struct and read it through the other? Am I committing any kind of aliasing or type-punning violation?

Parlin answered 14/2, 2015 at 22:52 Comment(1)
I suggest changing one of the tags to "language-lawyer" so that the experts in this topic are more likely to see the question. Section 6.7.6.1 of the C11 draft says, "For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types." but that still leaves the question of whether struct a is compatible with struct b. - Slovak
5

In C:

struct a and struct b are not compatible types. Even in

typedef struct s1 { int x; } t1, *tp1;
typedef struct s2 { int x; } t2, *tp2;

s1 and s2 are not compatible types. (See example in 6.7.8/p5.) An easy way to identify non-compatible structs is that if two struct types are compatible, then something of one type can be assigned to something of the other type. If you would expect the compiler to complain when you try to do that, then they are not compatible types.
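As a rough illustration of the point above (a sketch, not something from the standard's text): C11's _Generic selects on type compatibility and does not allow two associations with compatible types, so a selection listing both t1 and t2 only compiles because they are distinct, incompatible types. The TYPE_ID macro and the helper functions are names made up for this example.

```c
#include <assert.h>

typedef struct s1 { int x; } t1, *tp1;
typedef struct s2 { int x; } t2, *tp2;

/* _Generic may not list two compatible types in one selection, so the
   fact that this macro compiles is evidence that t1 and t2 are not
   compatible. TYPE_ID is a hypothetical name used only in this sketch. */
#define TYPE_ID(expr) _Generic((expr), t1: 1, t2: 2)

/* Hypothetical helpers that report which association was selected. */
int id_of_t1(void) { t1 v = { 0 }; return TYPE_ID(v); }
int id_of_t2(void) { t2 v = { 0 }; return TYPE_ID(v); }

/* Assignment within one type is fine; assignment across t1 and t2 is a
   constraint violation, so that line is left commented out. */
int copy_x(void) {
    t1 a = { 42 };
    t1 b;
    /* t2 c = a;    would not compile: incompatible types */
    b = a;
    return b.x;
}
```

Uncommenting the t2 line should make the compiler reject the program, which is the "would the compiler complain" test described above.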

Therefore, struct a * and struct b * are also not compatible types, and so struct a and struct b do not share a common initial sequence. Your union punning is instead governed by the same rule that applies to union punning in other cases (6.5.2.3 footnote 95):

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
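A minimal sketch of what that footnote describes, using a float and an integer instead of the structs from the question. It assumes float is a 32-bit IEEE-754 value, which the standard does not guarantee; union pun and float_bits are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: float occupies 32 bits (IEEE-754 binary32). */
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

union pun {
    float f;
    uint32_t u;
};

/* Store through one member and read through the other: in C the bytes
   of the float are reinterpreted as a uint32_t, as the footnote says.
   This is not guaranteed in C++. */
uint32_t float_bits(float value) {
    union pun p;
    p.f = value;
    return p.u;
}
```

On an IEEE-754 platform, float_bits(1.0f) should give 0x3F800000. Since every bit pattern is a valid uint32_t, the trap-representation caveat does not bite here, but it could for other target types.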


In C++, struct a and struct b also do not share a common initial sequence. [class.mem]/p18 (quoting N4140):

Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

[basic.types]/p9:

If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types. [ Note: Layout-compatible enumerations are described in 7.2. Layout-compatible standard-layout structs and standard-layout unions are described in 9.2. —end note ]

struct a * and struct b * are neither structs nor unions nor enumerations; therefore they are only layout-compatible if they are the same type, which they are not.

It is true that ([basic.compound]/p3)

Pointers to cv-qualified and cv-unqualified versions (3.9.3) of layout-compatible types shall have the same value representation and alignment requirements (3.11).

But that does not mean those pointer types are layout-compatible types, as that term is defined in the standard.

Vetavetch answered 15/2, 2015 at 2:34 Comment(2)
So if I have two structs like in the example, how am I supposed to access both of them safely? - Parlin
@test Type-punning via union is technically UB in C++, though many common implementations have documented support for it. And in most systems the two structs will have the same layout, and your code will work. If you want full standard compliance, then you need to remember which member you wrote into last time, and only read from that member (i.e., use a tagged union). - Vetavetch
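For what it's worth, the tagged-union idea from the comment above could look roughly like this. It is only a sketch: enum kind, struct tagged and the set_a/set_b/get_i helpers are hypothetical names, not anything from the question.

```c
#include <assert.h>
#include <stddef.h>

struct a { struct a *next; int i; };
struct b { struct b *next; int i; };

enum kind { KIND_A, KIND_B };   /* hypothetical tag, not in the question */

struct tagged {
    enum kind kind;             /* records which member was stored last */
    union {
        struct a a;
        struct b b;
    } u;
};

/* Store a struct a and remember that a is now the active member. */
void set_a(struct tagged *t, struct a *next, int i) {
    t->kind = KIND_A;
    t->u.a.next = next;
    t->u.a.i = i;
}

/* Store a struct b and remember that b is now the active member. */
void set_b(struct tagged *t, struct b *next, int i) {
    t->kind = KIND_B;
    t->u.b.next = next;
    t->u.b.i = i;
}

/* Read i only through the member that was written last. */
int get_i(const struct tagged *t) {
    return t->kind == KIND_A ? t->u.a.i : t->u.b.i;
}
```

Every read goes through the member recorded in kind, so no member is ever read after a different one was stored.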
2

What you could do (and I've been bitten by this before) is declare the initial pointer of both structs as void * and cast when you use it. Since void * is convertible to and from any object pointer type, you would only pay an ugliness tax, and you would not risk gcc reordering your operations (which I've seen happen, even with a union, as a result of compiler bugs in some versions). As @T.C. correctly points out, layout compatibility is a property defined at the language level; types that merely happen to have the same size are not necessarily layout-compatible, which can lead some aggressive compilers to make other assumptions based on that.
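A rough sketch of the void * variant, under the assumption that every next pointer actually points at a struct a. The union ab type and set_next_i are illustrative names; in C the conversion from void * needs no cast, while C++ would need an explicit one.

```c
#include <assert.h>
#include <stddef.h>

struct a { void *next; int i; };   /* next is assumed to point at a struct a */
struct b { void *next; int i; };   /* same member types as struct a */

union ab { struct a a; struct b b; };   /* hypothetical union for this sketch */

/* Both structs now begin with identical member types (void *, int), so
   they share a common initial sequence, and the access goes through the
   union type, which is visible here. */
void set_next_i(union ab *u, int value) {
    struct a *n = u->b.next;   /* implicit void * conversion in C */
    if (n != NULL) {
        n->i = value;
    }
}
```

The price is that the compiler no longer checks what next points to, which is the "ugliness tax" mentioned above.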

Joel answered 15/2, 2015 at 17:4 Comment(1)
And you could make it less ugly using macros (which is what I do). - Heal
2

I had a similar question some time ago, and I think I can answer yours.

Yes, struct a and struct b are not compatible types, and pointers to them are also incompatible.

Yes, what you are doing is illegal even from the outdated point of view of the C89 standard. However, it may be interesting to note that if you reverse the order of elements in struct a and struct b, you would be able to access int i of a struct c instance (but not access its next pointer in any way, i.e. bar->i = 9; instead of bar->next->i = 9;), but only from the C89 standard's point of view.

But even if you reverse the order of elements in the two structs, what you're doing would still be illegal from the point of view of the C99 and C11 standards (as interpreted by the committee). In C99, the part of the standard you have quoted has been changed to this:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.

The last phrase is a bit ambiguous, since you can interpret "visible" in several ways, but, according to the committee, this means that the inspection should be performed on an object of the union type in question.

So, in your case, the correct way to handle this would be something along the lines of:

struct a {
    int i;
    struct a *next;
};

struct b {
    int i;
    struct b *next;
};

union un {
    struct a a;
    struct b b;
};

struct c {
    int x, x2, x3;
    union un u;
};

/* ... */

void foo(union un *bar) {
    bar->b.next->i = 9; /* This is the "inspection" operation */
    return;
}

/* ... */

foo(&c.u);

That is all fine and interesting from the language-lawyer point of view, but in reality, unless you apply different packing settings to them, structs with the same initial sequence will lay that sequence out identically (in 99.9% of cases). Actually, they should have the same layout even in your original setup, since pointers to struct a and struct b have the same size. So, if your compiler doesn't get nasty when you break strict aliasing, you can more or less safely cast between them, or use them in a union the way you're using them now.
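If you want the compiler to confirm the layout claim for your target, something along these lines might do (a sketch assuming C11 static_assert; layouts_match is a made-up helper). Note that it only checks layout and says nothing about whether the optimizer's aliasing assumptions will tolerate the access.

```c
#include <assert.h>
#include <stddef.h>

struct a { struct a *next; int i; };
struct b { struct b *next; int i; };

/* Compile-time checks: the build fails if the two layouts ever diverge. */
static_assert(sizeof(struct a *) == sizeof(struct b *), "pointer sizes differ");
static_assert(sizeof(struct a) == sizeof(struct b), "struct sizes differ");
static_assert(offsetof(struct a, next) == offsetof(struct b, next), "next offsets differ");
static_assert(offsetof(struct a, i) == offsetof(struct b, i), "i offsets differ");

/* Runtime mirror of the same checks (hypothetical helper). */
int layouts_match(void) {
    return sizeof(struct a) == sizeof(struct b)
        && offsetof(struct a, next) == offsetof(struct b, next)
        && offsetof(struct a, i) == offsetof(struct b, i);
}
```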

EDIT: as noted by @underscore_d in the comments to this answer, the corresponding clauses in the C++ standards do not contain the phrase "anywhere that a declaration of the completed type of the union is visible", so it may be that the C++ standard takes the same stance on the subject as the C89 standard.

Oblivion answered 15/2, 2015 at 17:37 Comment(4)
It should be noted that the clause stipulating "a declaration of the completed type of the union is visible" is only present in the C standard - not in either of the C++11 or C++14 drafts I have here. (Thankfully, given how ambiguous it seems to me.) - Glochidiate
@underscore_d: if this is true, you are right and I should add this to my answer, although in my opinion, this entire approach makes little to no sense in C++. By the way (I'm asking out of pure academic interest), could you also clarify if this means that the code I provided in this related question would not break strict aliasing in C++, like it would not break it in C89? - Oblivion
It is true; check the drafts if you must :-) I'd counter that the approach makes a lot of sense in C++ for my current project... haha. I'm using a strange combination of high-level objects, pointer-based relationships, and binary buffers. As for the link, very interesting, but I'm afraid I don't know enough to comment - I'm currently wading through similar questions, and this difference in wording in C++ is just the latest stumbling block! I'd be very interested if anyone else can answer for you. - Glochidiate
Thanks for emphasising that C89 has the same position as C++ on this - perhaps very relevant. I hadn't really registered that until you said it. I'll add that to my question about this difference in C++ - https://mcmap.net/q/246365/-union-39-punning-39-structs-w-quot-common-initial-sequence-quot-why-does-c-99-but-not-c-stipulate-a-39-visible-declaration-of-the-union-type-39/2757035 - Glochidiate
-1

Yes, this is fine; the bolded part of the quote in your question covers this case.

Grease answered 14/2, 2015 at 23:36 Comment(0)
