Background
Discussions on the mostly un-or-implementation-defined nature of type-punning via a union typically quote the following bits, here via @ecatmur ( https://mcmap.net/q/247013/-is-it-legal-and-well-defined-behavior-to-use-a-union-for-conversion-between-two-structs-with-a-common-initial-sequence-see-example ), on an exemption for standard-layout structs having a "common initial sequence" of member types:
C11 (6.5.2.3 Structure and union members; Semantics):
[...] if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
C++03 ([class.mem]/16):
If a POD-union contains two or more POD-structs that share a common initial sequence, and if the POD-union object currently contains one of these POD-structs, it is permitted to inspect the common initial part of any of them. Two POD-structs share a common initial sequence if corresponding members have layout-compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
Other versions of the two standards have similar language; since C++11 the terminology used is standard-layout rather than POD.
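To make the quoted wording concrete, here is a minimal sketch in C of what "inspecting the common initial part" looks like. The type and function names (`Circle`, `Square`, `Shape`, `shape_kind`, `common_prefix_demo`) are hypothetical and chosen only for illustration; they do not come from either standard or from the linked answers.

```c
/* Hypothetical types: both structs begin with an `int kind` member,
   so they share a common initial sequence of one member. */
struct Circle { int kind; double radius; };
struct Square { int kind; double side; };

union Shape {
    struct Circle circle;
    struct Square square;
};

/* Reads the common initial member through `square`, regardless of which
   struct was stored last. The completed union type is visible here,
   which is the condition the C11 wording adds. */
static int shape_kind(const union Shape *shape)
{
    return shape->square.kind;
}

/* Stores a Circle, then inspects `kind` through the Square member. */
int common_prefix_demo(void)
{
    union Shape shape;
    shape.circle.kind = 1;
    shape.circle.radius = 2.5;
    return shape_kind(&shape);
}
```

In this sketch every access goes through the union object itself, so it should fall under both the C and the C++ wording quoted above; `common_prefix_demo()` returns the `kind` that was written via `circle`.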
Since no reinterpretation is required, this isn't really type-punning, just name substitution applied to union member accesses. A proposal for C++17 (the infamous P0137R1) makes this explicit using language like 'the access is as if the other struct member was nominated'.
But please note the clause "anywhere that a declaration of the completed type of the union is visible", which exists in C11 but nowhere in the C++ drafts for 2003, 2011, or 2014 (all nearly identical, except that later versions replace "POD" with the new term "standard-layout"). In any case, the 'visible declaration of union type' bit is totally absent from the corresponding section of any C++ standard.
@loop and @Mints97, here - https://mcmap.net/q/247014/-type-punning-a-struct-in-c-and-c-via-a-union - show that this line was also absent in C89, first appearing in C99 and remaining in C since then (though, again, never filtering through to C++).
Standards discussions around this
[snipped - see my answer]
Questions
From this, then, my questions were:
What does this mean? What is classed as a 'visible declaration'? Was this clause intended to narrow down - or expand up - the range of contexts in which such 'punning' has defined behaviour?
Are we to assume that this omission in C++ is very deliberate?
What is the reason for C++ differing from C? Did C++ just 'inherit' this from C89 and then either decide - or worse, forget - to update alongside C99?
If the difference is intentional, then what benefits or drawbacks are there to the 2 different treatments in C vs C++?
What, if any, interesting ramifications does it have at compile- or runtime? For example, @ecatmur, in a comment replying to my pointing this out on his original answer (link as above), speculated as follows.
I'd imagine it permits more aggressive optimization; C can assume that function arguments S* s and T* t do not alias even if they share a common initial sequence as long as no union { S; T; } is in view, while C++ can make that assumption only at link time. Might be worth asking a separate question about that difference.
Well, here I am, asking! I'm very interested in any thoughts about this, especially: other relevant parts of either Standard, quotes from committee members or other esteemed commentators, and insights from developers who might have noticed a practical difference due to this (assuming any compiler even bothers to enforce C's added clause). The aim is to generate a useful catalogue of relevant facts about this C clause and its (intentional or not) omission from C++. So, let's go!
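The aliasing scenario from @ecatmur's quoted speculation can be sketched roughly as follows, in C. All names here (`S`, `T`, `U`, `write_both`, `write_both_via_union` and the two demo functions) are hypothetical, and this is only an illustration of the shape of the problem, not a statement about what any particular compiler does.

```c
/* Two distinct struct types sharing a common initial sequence (`int x`). */
struct S { int x; };
struct T { int x; };

/* A union containing both is declared, and so visible, from here on. */
union U { struct S s; struct T t; };

/* The contested case: may the compiler assume `s` and `t` do not alias
   and therefore return 5 without reloading s->x? */
int write_both(struct S *s, struct T *t)
{
    s->x = 5;
    t->x = 42;
    return s->x;
}

/* Same stores, but every access names the union member explicitly. */
int write_both_via_union(union U *u)
{
    u->s.x = 5;
    u->t.x = 42;
    return u->s.x;
}

/* Distinct objects: no aliasing is possible, so the result is 5. */
int distinct_objects_demo(void)
{
    struct S s;
    struct T t;
    return write_both(&s, &t);
}

/* Access through the union: the last store wins, so the result is 42. */
int union_demo(void)
{
    union U u;
    return write_both_via_union(&u);
}
```

The interesting call is the one deliberately left out of the demos: `write_both(&u.s, &u.t)` on a single `union U u`. Whether that may return 5 or must return 42 is exactly what the "visible declaration" clause appears to be about, so no result is claimed for it here.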
[...] S* and T* arguments do not alias even when a union is in view. This means that a program that passes the aliasing S* and T* union subobjects will behave differently depending on optimization level. Example: coliru.stacked-crooked.com/a/b57c8dd9e2ef3a02 – Ethnology

[...] T is, of course, updated to 42 'in the background' - so the write isn't binned - but the optimiser doesn't reflect that in the return value, as it assumes, given no aliasing, the result must be 5. coliru.stacked-crooked.com/a/04921db9e5f3945a I'd need to test whether this affects me as (A) I'm generally not referring to such unions via pointers and (B) even less am I doing this via functions. There are probably numerous other ways this can bite me if this turns out to be a general behaviour relevant to such unions, though. Will post more findings tomorrow. – Garrick

[...] gcc and g++ alias when the member types are changed to char (showing 42 throughout, unlike before), but clang acts the same as when using ints. Which, if any, is more correct? Fwiw, in 99.9% of cases in which I'd be wanting to use this pattern, the structs would contain unsigned char only. I know there's an exception for char in aliasing but not how/if that's related to this observation. – Garrick

[...] union visibility in major compilers for C & C++, do you think that indicates it's not directly related to the added quote being discussed? Either way, am I 'safe' if (A) not using pointers to such members, (B) only passing 1 to any function, or (C) anywhere I need to alias, reinterpret_casting to/from char within scope? Also, if you know a good summary of all these nuances, preferably more condensed than the standard - few things I've read have pointed out crucial caveats like you have here. Sorry to keep bombarding you with questions! – Garrick