union of structs sharing same first members

I have been looking into an unconventional way of achieving struct "polymorphism" in pre-C11 C. Let's say we have two structs:

struct s1 {
    int var1;
    char var2;
    long var3;
};

struct s2 {
    int var1;
    char var2;
    long var3;
    char var4;
    int var5;
};

On most compilers, we could safely cast between pointers to the two and then access the common first members, provided the compiler pads those members identically. However, this is not standardized behaviour.
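The layout assumption above can be checked on a given compiler. The following is a minimal sketch; the helper names `common_prefix_offsets_match` and `check_common_prefix_layout` are made up for illustration. It only confirms that the shared members sit at the same offsets; it does not by itself make access through a cast pointer well-defined.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

struct s1 {
    int var1;
    char var2;
    long var3;
};

struct s2 {
    int var1;
    char var2;
    long var3;
    char var4;
    int var5;
};

/* Hypothetical helper: returns 1 if the three shared members
   are placed at identical offsets in both structs, 0 otherwise. */
int common_prefix_offsets_match(void)
{
    return offsetof(struct s1, var1) == offsetof(struct s2, var1)
        && offsetof(struct s1, var2) == offsetof(struct s2, var2)
        && offsetof(struct s1, var3) == offsetof(struct s2, var3);
}

/* Hypothetical helper: aborts at run time if the layouts differ. */
void check_common_prefix_layout(void)
{
    assert(common_prefix_offsets_match());
}
```

A run-time check is used here because `_Static_assert` is not available before C11.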

Now, I found the following passage in the C standard, going as far back as C89:

One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them. Two structures share a common initial sequence if corresponding members have compatible types for a sequence of one or more initial members.

It also states the following:

A pointer to a union object, suitably cast, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.

Now, if I create a union of these two structs:

union s2_polymorphic {
    struct s1 base;
    struct s2 derived;
};

And use it this way:

union s2_polymorphic test_s2_polymorphic, *ptest_s2_polymorphic;
struct s2 *ptest_s2;
struct s1 *ptest_s1;

ptest_s2_polymorphic = &test_s2_polymorphic;

ptest_s2 = (struct s2*)ptest_s2_polymorphic;

ptest_s2->var1 = 1;
ptest_s2->var2 = '2';

ptest_s1 = (struct s1*)ptest_s2;

printf("ptest_s1->var1 = %d\n", ptest_s1->var1);
printf("ptest_s1->var2 = %c\n", ptest_s1->var2);

This compiles and runs fine and, on gcc (GCC) 4.8.3 20140911, gives the output:

ptest_s1->var1 = 1
ptest_s1->var2 = 2

Will the behaviour be well-defined, according to the quotes from the standard given above?

Fulgurous answered 11/1, 2015 at 16:12 Comment(8)
I may be misunderstanding what you are doing here, but in the example with the union, should not s2 (i.e. derived) now not duplicate the contents of s1? IE shouldn't it only contain the extra elements?Histrionism
"Will the behaviour be well-defined" I'd say: yes. What makes you doubt this?Weeping
@abligh: and I may be misunderstanding your question... What do you mean by "contain only the extra elements"?Fulgurous
@alk: well, mainly it's that nobody seems to use this approach for struct inheritance/polymorphism, preferring the "first-member-of-derived-struct-is-an-instance-of-base-struct" approach, which makes code far from pretty, while this approach seems so much better. I just can't believe that I've discovered something that seems so good and yet so uncommon =)Fulgurous
@Mints97: Ignore my question - insufficient tea.Histrionism
@Mints97 In fact this method is very common. Normally the first member is used to designate the actual type. This is called a "smart union"Oldfangled
@Mints97 think about memory usage, multiple inheritance, and information hiding and you'll see some downsides to this approach too, no?Cryobiology
@wildplasser: it's usually called a "discriminated union" or a "tagged union" rather than a "smart union" (try searching online for those terms).Tweed
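For readers unfamiliar with the "discriminated union" or "tagged union" mentioned in the comments above, here is a minimal sketch. All names (`enum kind`, `struct tagged_base`, `struct tagged_derived`, `union tagged`, `var5_or`) are hypothetical and not taken from the question. The tag is the first member of every struct, so it belongs to the common initial sequence and can be read through any member of the union.

```c
#include <assert.h>

/* Hypothetical tag values; the names are illustrative only. */
enum kind { KIND_BASE, KIND_DERIVED };

struct tagged_base {
    enum kind tag;
    int var1;
};

struct tagged_derived {
    enum kind tag;
    int var1;
    int var5;
};

union tagged {
    struct tagged_base base;
    struct tagged_derived derived;
};

/* Returns var5 when the union currently holds a tagged_derived,
   otherwise returns the caller-supplied fallback. */
int var5_or(const union tagged *t, int fallback)
{
    assert(t != 0);
    /* tag is part of the common initial sequence, so it may be
       inspected through base whichever member was stored. */
    if (t->base.tag == KIND_DERIVED)
        return t->derived.var5;
    return fallback;
}
```

The caller is responsible for setting the tag whenever it stores a different member; nothing in the language enforces that.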

After some research, I think I have a qualified answer for this question.

The citation given was from the C89 standard. C99 and C11 have it rephrased like this:

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible.

The last part can be interpreted, IMHO, in a variety of ways. However, the committee left the wording as it is. According to them, it means that the "common initial part" of the structures may be inspected only through an object of the union type that has been declared to contain them. That is very well demonstrated in this question.
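Under that reading, the access has to go through the union object itself rather than through a separately cast struct pointer. A minimal sketch, reusing the question's types, might look like the following; the helpers `make_derived`, `read_var1_via_base` and `read_var2_via_base` are hypothetical names introduced here.

```c
#include <assert.h>

struct s1 {
    int var1;
    char var2;
    long var3;
};

struct s2 {
    int var1;
    char var2;
    long var3;
    char var4;
    int var5;
};

union s2_polymorphic {
    struct s1 base;
    struct s2 derived;
};

/* Hypothetical helper: stores a struct s2 in the union. */
union s2_polymorphic make_derived(int v1, char v2)
{
    union s2_polymorphic u;
    u.derived.var1 = v1;
    u.derived.var2 = v2;
    u.derived.var3 = 0L;
    u.derived.var4 = 0;
    u.derived.var5 = 0;
    return u;
}

/* Both readers go through the union object, with the completed
   union type visible, and touch only the common initial sequence. */
int read_var1_via_base(const union s2_polymorphic *u)
{
    assert(u != 0);
    return u->base.var1;
}

char read_var2_via_base(const union s2_polymorphic *u)
{
    assert(u != 0);
    return u->base.var2;
}
```

The difference from the question's code is that no `struct s1 *` is ever derived by casting a `struct s2 *`; every access is written as a member access on the union.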

What about C89, which seems to allow what I was trying to do? Well, in a C89-conforming compiler, yes, this should work. However, it might not really be needed: I don't know of a single strictly-C89-conforming compiler that supports strict aliasing, so, with them, it is easier to simply cast the structs with the common initial sequence to each other's types and try not to give them different packing settings. The result should be the same.

Fulgurous answered 16/1, 2015 at 14:1 Comment(2)
Given that the completed type of the union would need to be visible in order to access anything through it, why would the authors of the Standard have said anything about the visibility of the completed type of the union if they didn't intend to allow union declarations to "bind" the associated types as a group that should allow mutual inspection of each others' common members? I know the authors of gcc don't like that rule, and there would be better ways of achieving the desired result, but membership in a visible union is the means the authors of the Standard chose to allow such semantics.Transmogrify
If the Standard were to explicitly allow a pointer to any type to be cast to a pointer to a union containing that type, provided that it is only used to access the proper union member or other members with a common initial sequence, and if it specified that operations via two union types that share any members may alias, that might be a good way to offer the proper functionality, but if one union member has a coarser alignment than another, casting a pointer to a loosely-aligned object to a union pointer would invoke UB since it wouldn't satisfy the union's alignment.Transmogrify
