There are (or at least were, back in C90) two motivations for
making this undefined behavior. The first was that a compiler
would be allowed to generate extra code which tracked what was
in the union, and generated a signal when you accessed the wrong
member. In practice, I don't think anyone ever did (maybe
CenterLine?). The other was the optimization possibilities this
opened up, and these are used. I have used compilers which
would defer a write until the last possible moment, on the
grounds that it might not be necessary (because the variable
goes out of scope, or there is a subsequent write of a different
value). Logically, one would expect that this optimization
would be turned off when the union was visible, but it wasn't in
the earliest versions of Microsoft C.
The issues of type punning are complex. The C committee (back
in the late 1980's) more or less took the position that you
should use casts (in C++, reinterpret_cast) for this, and not
unions, although both techniques were widespread at the time.
Since then, some compilers (g++, for example) have taken the
opposite point of view, supporting the use of unions, but not
the use of casts. And in practice, neither works if it is not
immediately obvious that there is type-punning. This might be
the motivation behind g++'s point of view. If you access
a union member, it is immediately obvious that there might be
type-punning. But of course, given something like:
```cpp
int f(const int* pi, double* pd)
{
    int results = *pi;
    *pd = 3.14159;
    return results;
}
```
called with:
```cpp
union U { int i; double d; };
U u;
u.i = 1;
std::cout << f( &u.i, &u.d );
```
is perfectly legal according to the strict rules of the
standard, but fails with g++ (and probably many other
compilers): when compiling f, the compiler assumes that pi
and pd can't alias, and reorders the write to *pd and the
read from *pi. (I believe that it was never the intent that
this be guaranteed. But the current wording of the standard
does guarantee it.)
EDIT:
Since other answers have argued that the behavior is in fact
defined (largely based on quoting a non-normative note, taken
out of context):
The correct answer here is that of pablo1977: the standard makes
no attempt to define the behavior when type punning is involved.
The probable reason for this is that there is no portable
behavior that it could define. This does not prevent a specific
implementation from defining it; although I don't remember any
specific discussions of the issue, I'm pretty sure that the
intent was that implementations define something (and most, if
not all, do).
With regards to using a union for type-punning: when the
C committee was developing C90 (in the late 1980's), there was
a clear intent to allow debugging implementations which did
additional checking (such as using fat pointers for bounds
checking). From discussions at the time, it was clear that the
intent was that a debugging implementation might cache
information about which member of a union was last written,
and trap if you tried to access any other member. This is clearly
stated in §6.7.2.1/16: "The value of at most one of the members
can be stored in a union object at any time." Accessing a value
that isn't there is undefined behavior; it is comparable to
accessing an uninitialized variable. (There were some
discussions at the time as to whether accessing a different
member with the same type was legal or not. I don't know what
the final resolution was, however; after around 1990, I moved on
to C++.)
With regards to the quote from C89, saying the behavior is
implementation-defined: finding it in section 3 (Terms,
Definitions and Symbols) seems very strange. I'll have to look
it up in my copy of C90 at home; the fact that it has been
removed in later versions of the standards suggests that its
presence was considered an error by the committee.
The use of unions which the standard supports is as a means to
simulate derivation. You can define:
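```c
/* Enumerator names are illustrative; the original leaves them out. */
enum NodeType { INNER_NODE, CONSTANT_NODE /* , ... */ };

struct NodeBase
{
    enum NodeType type;
};
struct InnerNode
{
    enum NodeType type;
    struct NodeBase* left;      /* C requires the struct keyword here */
    struct NodeBase* right;
};
struct ConstantNode
{
    enum NodeType type;
    double value;
};
// ...
union Node
{
    struct NodeBase base;
    struct InnerNode inner;
    struct ConstantNode constant;
    // ...
};
```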
and legally access base.type, even though the Node was
initialized through inner, because type is part of the common
initial sequence of all the member structs. (The fact that §6.5.2.3/6 starts
with "One special guarantee is made..." and goes on to
explicitly allow this is a very strong indication that all other
cases are meant to be undefined behavior. And of course, there
is the statement that "Undefined behavior is otherwise indicated
in this International Standard by the words ‘‘undefined
behavior’’ or by the omission of any explicit definition of
behavior" in §4/2; in order to argue that the behavior is not
undefined, you have to show where it is defined in the standard.)
Finally, with regards to type-punning: all (or at least all that
I've used) implementations do support it in some way. My
impression at the time was that the intent was that pointer
casting be the way an implementation supported it; in the C++
standard, there is even (non-normative) text to suggest that the
results of a reinterpret_cast be "unsurprising" to someone
familiar with the underlying architecture. In practice,
however, most implementations support the use of unions for
type-punning, provided the access is through a union member.
Most implementations (but not g++) also support pointer casts,
provided the pointer cast is clearly visible to the compiler
(for some unspecified definition of pointer cast). And the
"standardization" of the underlying hardware means that things
like:
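```cpp
#include <cstdint>

// Assumes IEEE 754 binary64: an 11-bit exponent field in bits 52..62,
// stored with a bias of 1023, so the bias is subtracted.
int
getExponent( double d )
{
    return (int)((*(std::uint64_t*)(&d) >> 52) & 0x7FF) - 1023;
}
```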
are actually fairly portable. (It won't work on mainframes, of
course.) What doesn't work are things like my first example,
where the aliasing is invisible to the compiler. (I'm pretty
sure that this is a defect in the standard. I seem to recall
even having seen a DR concerning it.)