Is it allowed to use unions for type punning, and if not, why?

5

116

I've been searching for a while, but can't find a clear answer.

Lots of people say that using unions to type-pun is undefined and bad practice. Why is this? I can't see any reason why it would do anything undefined considering the memory you write the original information to isn't going to just change of its own accord (unless it goes out of scope on the stack, but that's not a union issue, that would be bad design).

People quote the strict aliasing rule, but that seems to me to be like saying you can't do it because you can't do it.

Also what is the point of a union if not to type pun? I saw somewhere that they are supposed to be used to use the same memory location for different information at different times, but why not just delete the info before using it again?

To summarise:

  1. Why is it bad to use unions for type punning?
  2. What is the point of them if not this?

Extra information: I'm using mainly C++, but would like to know about that and C. Specifically I'm using unions to convert between floats and the raw hex to send via CAN bus.
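For the float-to-raw-bytes use case, one commonly suggested alternative to a union is to copy the object representation with std::memcpy, which is well defined in both C and C++. The following is only a sketch: the helper names are hypothetical, the most-significant-byte-first payload order is an assumption about the bus protocol, and the comment about the bit pattern of 1.0f assumes IEEE 754 single precision.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copy the bytes of a float into an integer of the same size.
inline std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Inverse of float_to_bits.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: lay the 32 bits out most-significant byte first.
// The byte order a real CAN protocol expects is an assumption here.
inline std::array<std::uint8_t, 4> float_to_payload(float f) {
    const std::uint32_t bits = float_to_bits(f);
    return {{static_cast<std::uint8_t>(bits >> 24),
             static_cast<std::uint8_t>(bits >> 16),
             static_cast<std::uint8_t>(bits >> 8),
             static_cast<std::uint8_t>(bits)}};
}

// Rebuild the float from a payload produced by float_to_payload.
inline float payload_to_float(const std::array<std::uint8_t, 4>& p) {
    const std::uint32_t bits = (static_cast<std::uint32_t>(p[0]) << 24) |
                               (static_cast<std::uint32_t>(p[1]) << 16) |
                               (static_cast<std::uint32_t>(p[2]) << 8) |
                               static_cast<std::uint32_t>(p[3]);
    return bits_to_float(bits);
}

// Usage: a value survives the round trip through the payload form.
// With IEEE 754 floats, float_to_bits(1.0f) is 0x3F800000.
inline void round_trip_demo() {
    const float original = -2.5f;
    assert(payload_to_float(float_to_payload(original)) == original);
}
```

Building the payload with shifts rather than by copying bytes directly keeps the wire format independent of the host's endianness.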

Synonym answered 4/9, 2014 at 11:56 Comment(4)
For one very common use of unions, think of the lexical analyzer in a compiler. It can return a token-value pair to the parser, and depending on the token the value can, for example, be either an integer number, a floating point number, a character, or a pointer to a string. How would you best represent these different value types in a single structure? A union of course.Mathian
I cover the details of whether type punning via a union is legal in both C and C++ in my answer to Why does optimisation kill this function?. Basically, it is always legal in C; it is not clear whether it is legal in C++, but in practice most compilers support it in C++.Copious
I meant to add an answer to this a while ago but forgot and then I came upon this question again looking into something else, well I just added my answer.Copious
See also: Portability of using union for conversion.Scudder
D
79

To re-iterate, type-punning through unions is perfectly fine in C (but not in C++). In contrast, using pointer casts to do so violates C99 strict aliasing and is problematic because different types may have different alignment requirements and you could raise a SIGBUS if you do it wrong. With unions, this is never a problem.

The relevant quotes from the C standards are:

C89 section 3.3.2.3 §5:

if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined

C11 section 6.5.2.3 §3:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member

with the following footnote 95:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

This should be perfectly clear.


James is confused because C11 section 6.7.2.1 §16 reads

The value of at most one of the members can be stored in a union object at any time.

This seems contradictory, but it is not: In contrast to C++, in C, there is no concept of active member and it's perfectly fine to access the single stored value through an expression of an incompatible type.

See also C11 annex J.1 §1:

The values of bytes that correspond to union members other than the one last stored into [are unspecified].

In C99, this used to read

The value of a union member other than the last one stored into [is unspecified]

This was incorrect. As the annex isn't normative, it did not rate its own TC and had to wait until the next standard revision to get fixed.


GNU extensions to standard C++ (and to C90) do explicitly allow type-punning with unions. Other compilers that don't support GNU extensions may also support union type-punning, but it's not part of the base language standard.

Drennan answered 4/9, 2014 at 18:40 Comment(15)
I don't have my copy of C90 at hand to verify the context; I do remember from committee discussions that one of the intents was that the wording should allow "debugging" implementations, which trapped if the access were other than the last element written. (This was, of course, in the late 1980's; the attitude of the C committee may have evolved since then.) I seem to recall that this was achieved by undefined behavior, but implementation-defined would do the trick as well. (The main difference here is that the implementation would be required to document what it does.)Leucite
The footnote is non-normative, and in context, clearly an explication as to why the committee didn't define this. It does not define behavior.Leucite
@JamesKanze: The value is that of the named member. That is the normative part, clarified by the footnote. If all bytes that make up the object representation of that member take specified value and do not correspond to a trap representation, the member will take specified value as well. It does not matter how these bytes got there (by memcpy, through modification via char *, through a different union member, ...). You will not be able to convince me otherwise, so unless you change your mind, it's probably pointless to continue...Drennan
I do remember from committee discussions that one of the intents was that the wording should allow "debugging" implementations, which trapped if the access were other than the last element written. That might have been the case in the 80s; when C99 disallowed type-punning through pointer casts, the need for a different mechanism arose; this is it; sadly, there appears to be no mention of it in the C99 rationale, but it is plausible that this is what happenedDrennan
You will not be able to convince me otherwise... - without a very convincing new argument; I let my frustration show through there, sorry about thatDrennan
I too have the impression that the attitude of the C committee has evolved; since 1990, I've been involved in the standardization of C++, and haven't followed C that closely. The basic rule still applies, however: anything which the standard doesn't define is undefined behavior. And this clearly falls into this category. I think (but cannot prove) that the intent is that all type-punning be undefined behavior, to be defined by the implementation.Leucite
Let us continue this discussion in chat.Drennan
@Christoph, how about 6.7.2.1p14, which says that “the value of at most one of the members can be stored in a union object at any time”. How then can you access "the value of the named member" when the value is not stored? The footnote makes the apparent intention clear, but isn't normative. The normative text strongly implies that the named member has no value. This text remains in C11; I personally feel like it needs rewording, within the normative text.Foreshorten
If this is correct, how can we explain that this example's func contains a single floating-point load? Under strict aliasing, either taking the address of a union field has to be illegal, or either type punning in unions cannot be well-defined.Federation
@zneak: I would argue that because you're not accessing the object through "an aggregate or union type that includes one of the aforementioned types among its members", you're violating effective typing rules; change the function signature to func(foo* u1, foo* u2), and it 'should' work (my version of clang still miscompiles it on -O2, whereas gcc compiles it as expected even on -O3); though unrelated, also note that you never initialized the union, so the compiler is free to do whatever anyway ;)Drennan
I don’t think that any of your standard quotes include that language, and I don’t have the standard in front of me, so it’s a bit hard to evaluate. Are you effectively saying that I can’t take the address of union fields, or that there is language that guarantees behaviour which is only applicable when the compiler statically knows that the object is a union?Federation
@zneak: The quote is from section 6.5 §6/7, which lay out the effective typing rules. In particular, they forbid dereferencing pointers of 'wrong' type, making type-based alias analysis possible. Conceptually, memory locations are tagged with their effective type, which is either the declared type or (in case of dynamic allocation) the type of the object last stored. Now, this is conjecture on my part: (continued)Drennan
In case of unions, the memory can't be tagged with the type of all members, which would contradict example 3 of section 6.5.2.3 (§9). However, this would imply that you can't dereference a direct pointer to a union member, which seems nonsensical. So my conjecture is that storing into a union member sets an effective type, similar to storing into dynamic memory. The union tag of course still remains, so you can access such locations either through arbitrary union members (making the type punning mentioned in footnote 95 possible) as well as through pointers to the member last stored.Drennan
@Christoph: I would suggest that a better and simpler model would be to say that an actual or hoisted access via pointer or lvalue which is freshly derived from another is an access to the parent, which may be hoisted to the point of derivation. A pointer/lvalue should remain fresh for purposes of accessing a piece of storage until the next unrelated access (hoisted or actual) to the same storage, or until the next time execution enters a function or loop iteration wherein such access would occur. Such a rule would largely eliminate the need for the character-type exception...Mesic
...as well as the need to regard storage as having a dynamic type if none of its contents have any non-trivial constructors or destructors. Thus, T1 *p1 = &u.m1; *p1 = something1; T2 *p2 = &u.m2; *p2 = something2; would be allowed, since each pointer is used when it's fresh. T1 *p1 = &u.m1; T2 *p2 = &u.m2; *p1 = something1; *p2 = something2; would be forbidden, since the write to *p2, hoisted to the evaluation of &m2, would cause *p1 to go stale before it's used.Mesic
C
25

The original purpose of unions was to save space when you want to be able to represent different types in the same storage, what we call a variant type; see Boost.Variant as a good example of this.
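As a rough illustration of the space-saving "variant" use, here is a minimal hand-rolled tagged union. All names (TokenKind, Token, make_integer, make_real, as_double) are hypothetical; the point is only that the tag records which member was stored last, so each read goes through the member that was actually written and no type punning is involved.

```cpp
#include <cassert>

// Which member of the union currently holds a value.
enum class TokenKind { Integer, Real, Character };

// A value that is exactly one of several types at any given time.
struct Token {
    TokenKind kind;
    union {
        long   integer;
        double real;
        char   character;
    } value;
};

inline Token make_integer(long v) {
    Token t;
    t.kind = TokenKind::Integer;
    t.value.integer = v;
    return t;
}

inline Token make_real(double v) {
    Token t;
    t.kind = TokenKind::Real;
    t.value.real = v;
    return t;
}

// Only ever reads the member named by the tag.
inline double as_double(const Token& t) {
    switch (t.kind) {
        case TokenKind::Integer:   return static_cast<double>(t.value.integer);
        case TokenKind::Real:      return t.value.real;
        case TokenKind::Character: return static_cast<double>(t.value.character);
    }
    return 0.0;  // not reached for valid tags
}

// Usage: the storage is shared, but each read matches the last write.
inline void token_demo() {
    assert(as_double(make_integer(42)) == 42.0);
    assert(as_double(make_real(2.5)) == 2.5);
}
```

Library variant types such as Boost.Variant wrap this same pattern and check the tag for you.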

The other common use is type punning. The validity of this is debated, but in practice most compilers support it; we can see that gcc documents its support:

The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.

Note that it says even with -fstrict-aliasing, type-punning is allowed, which indicates there is an aliasing issue at play.

Pascal Cuoq has argued that defect report 283 clarified this was allowed in C. Defect report 283 added the following footnote as clarification:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

in C11 that would be footnote 95.

Although in the std-discussion mail group topic Type Punning via a Union the argument is made this is underspecified, which seems reasonable since DR 283 did not add new normative wording, just a footnote:

This is, in my opinion, an underspecified semantic quagmire in C. Consensus has not been reached between implementors and the C committee as to exactly which cases have defined behavior and which do not[...]

In C++ it is unclear whether it is defined behavior or not.

This discussion also covers at least one reason why allowing type punning through a union is undesirable:

[...]the C standard's rules break the type-based alias analysis optimizations which current implementations perform.

It breaks some optimizations. The second argument against this is that using memcpy should generate identical code, does not break optimizations, and is well-defined behavior. For example, this:

double d = 0.1;  // any double value to reinterpret
std::int64_t n;
std::memcpy(&n, &d, sizeof d);

instead of this:

union u1
{
  std::int64_t n;
  double d ;
} ;

u1 u ;
u.d = d ;
std::int64_t n = u.n ;  // read back through the other member

Using godbolt we can see that this does generate identical code, and the argument is made that if your compiler does not generate identical code, it should be considered a bug:

If this is true for your implementation, I suggest you file a bug on it. Breaking real optimizations (anything based on type-based alias analysis) in order to work around performance issues with some particular compiler seems like a bad idea to me.

The blog post Type Punning, Strict Aliasing, and Optimization also comes to a similar conclusion.

The undefined behavior mailing list discussion: Type punning to avoid copying covers a lot of the same ground and we can see how grey the territory can be.

Copious answered 26/6, 2015 at 19:37 Comment(8)
The claim that memcpy produces identical code ignores the fact that more efficient code could be generated if compilers would document patterns where aliasing would be recognized, in which case a compiler would only have to make a small number of mildly-pessimistic (but likely accurate) presumptions, whereas memcpy would often force a compiler to make more pessimistic presumptions. The code for memcpy itself may look fine, but its effects on the code around it not so much.Mesic
It's worth noting that with C++17 we're getting std::variant as a variant typePrefect
It might also be good to mention that std::memcpy is only valid if the types are TriviallyCopyablePrefect
@Mesic if you could provide a godbolt example that shows this effect, it would be very helpful. As I understand Richard's position this should not be the case, perhaps it is a bug then.Copious
@ShafikYaghmour: Given the code uint16_t *outptr; void store_double_halfword(uint32_t dat) { uint32_t *dp = (uint32_t*)outptr; outptr = dp+1; memcpy(dp, &dat, sizeof (uint32_t)); } void store_loop1(uint32_t *src){ for (int i=0; i<100; i++) store_next_word1(src[i]); }, there would be no way for a compiler to avoid having to reload and re-store outptr on every pass through the loop when code uses memcpy. If the compiler could be relied upon to treat the cast from uint16_t* as a sign that the function might access things of type uint16_t or uint32_t, thus allowing code...Mesic
...to safely use an assignment to *dp instead of using memcpy, such reloads and re-stores could be eliminated.Mesic
@ShafikYaghmour: Slight correction [I oopsed when trying to change an identifier name to be more meaningful]: replace store_next_word with store_double_halfword. Add the obvious #include directives and the code will paste into gcc.godbolt.com just fine.Mesic
For C++ we may get bit_cast with implementation hereCopious
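Pulling together the memcpy approach from the answer and the TriviallyCopyable caveat from the comments, a bit_cast-style helper can be sketched in plain C++11. The name bit_cast_sketch is hypothetical and this is not the proposed library facility, only an approximation of the idea; the expected bit pattern in the usage function assumes IEEE 754 doubles.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reinterpret the bytes of `from` as a value of type To by copying them.
// The static_asserts encode the preconditions mentioned in the comments:
// equal sizes and trivially copyable types.
template <typename To, typename From>
To bit_cast_sketch(const From& from) {
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable<From>::value,
                  "source type must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "destination type must be trivially copyable");
    static_assert(std::is_default_constructible<To>::value,
                  "this sketch needs a default-constructible destination");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}

// Usage: double -> 64-bit integer and back again.
inline void bit_cast_demo() {
    const std::uint64_t bits = bit_cast_sketch<std::uint64_t>(1.0);
    assert(bits == 0x3FF0000000000000ULL);  // IEEE 754 pattern for 1.0
    assert(bit_cast_sketch<double>(bits) == 1.0);
}
```

Because the size and triviality checks are static_asserts, a mismatched pair of types fails at compile time instead of silently copying too few or too many bytes.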
L
8

There are (or at least were, back in C90) two motivations for making this undefined behavior. The first was that a compiler would be allowed to generate extra code which tracked what was in the union, and generated a signal when you accessed the wrong member. In practice, I don't think anyone ever did (maybe CenterLine?). The other was the optimization possibilities this opened up, and these are used. I have used compilers which would defer a write until the last possible moment, on the grounds that it might not be necessary (because the variable goes out of scope, or there is a subsequent write of a different value). Logically, one would expect that this optimization would be turned off when the union was visible, but it wasn't in the earliest versions of Microsoft C.

The issues of type punning are complex. The C committee (back in the late 1980's) more or less took the position that you should use casts (in C++, reinterpret_cast) for this, and not unions, although both techniques were widespread at the time. Since then, some compilers (g++, for example) have taken the opposite point of view, supporting the use of unions, but not the use of casts. And in practice, neither works if it is not immediately obvious that there is type-punning. This might be the motivation behind g++'s point of view. If you access a union member, it is immediately obvious that there might be type-punning. But of course, given something like:

int f(const int* pi, double* pd)
{
    int results = *pi;
    *pd = 3.14159;
    return results;
}

called with:

union U { int i; double d; };
U u;
u.i = 1;
std::cout << f( &u.i, &u.d );

this is perfectly legal according to the strict rules of the standard, but fails with g++ (and probably many other compilers); when compiling f, the compiler assumes that pi and pd can't alias, and reorders the write to *pd and the read from *pi. (I believe that it was never the intent that this be guaranteed. But the current wording of the standard does guarantee it.)

EDIT:

Since other answers have argued that the behavior is in fact defined (largely based on quoting a non-normative note, taken out of context):

The correct answer here is that of pablo1977: the standard makes no attempt to define the behavior when type punning is involved. The probable reason for this is that there is no portable behavior that it could define. This does not prevent a specific implementation from defining it; although I don't remember any specific discussions of the issue, I'm pretty sure that the intent was that implementations define something (and most, if not all, do).

With regards to using a union for type-punning: when the C committee was developing C90 (in the late 1980's), there was a clear intent to allow debugging implementations which did additional checking (such as using fat pointers for bounds checking). From discussions at the time, it was clear that the intent was that a debugging implementation might cache information concerning the last value initialized in a union, and trap if you tried to access anything else. This is clearly stated in §6.7.2.1/16: "The value of at most one of the members can be stored in a union object at any time." Accessing a value that isn't there is undefined behavior; it can be assimilated to accessing an uninitialized variable. (There were some discussions at the time as to whether accessing a different member with the same type was legal or not. I don't know what the final resolution was, however; after around 1990, I moved on to C++.)

With regards to the quote from C89, saying the behavior is implementation-defined: finding it in section 3 (Terms, Definitions and Symbols) seems very strange. I'll have to look it up in my copy of C90 at home; the fact that it has been removed in later versions of the standards suggests that its presence was considered an error by the committee.

The use of unions which the standard supports is as a means to simulate derivation. You can define:

struct NodeBase
{
    enum NodeType type;
};

struct InnerNode
{
    enum NodeType type;
    NodeBase* left;
    NodeBase* right;
};

struct ConstantNode
{
    enum NodeType type;
    double value;
};
//  ...

union Node
{
    struct NodeBase base;
    struct InnerNode inner;
    struct ConstantNode constant;
    //  ...
};

and legally access base.type, even though the Node was initialized through inner. (The fact that §6.5.2.3/6 starts with "One special guarantee is made..." and goes on to explicitly allow this is a very strong indication that all other cases are meant to be undefined behavior. And of course, there is the statement that "Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior" in §4/2; in order to argue that the behavior is not undefined, you have to show where it is defined in the standard.)
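A compact, compilable version of the same idea, written as C++ (where the common-initial-sequence guarantee also applies to standard-layout structs in a union). The enumerator names and the helper functions make_constant, make_inner and node_type are hypothetical additions for illustration.

```cpp
#include <cassert>

enum NodeType { kInner, kConstant };

union Node;  // forward declaration so the structs can hold Node pointers

struct NodeBase     { NodeType type; };
struct InnerNode    { NodeType type; Node* left; Node* right; };
struct ConstantNode { NodeType type; double value; };

union Node {
    NodeBase     base;
    InnerNode    inner;
    ConstantNode constant;
};

inline Node make_constant(double v) {
    Node n;
    n.constant = ConstantNode{kConstant, v};  // stores the `constant` member
    return n;
}

inline Node make_inner(Node* left, Node* right) {
    Node n;
    n.inner = InnerNode{kInner, left, right};  // stores the `inner` member
    return n;
}

// `type` is the common initial sequence of all three structs, so it may be
// read through `base` whichever member was stored last.
inline NodeType node_type(const Node& n) {
    return n.base.type;
}

// Usage: dispatch on the shared tag, then read the matching member.
inline void node_demo() {
    Node c = make_constant(2.5);
    assert(node_type(c) == kConstant);
    assert(c.constant.value == 2.5);
    assert(node_type(make_inner(nullptr, nullptr)) == kInner);
}
```

Only the shared leading member is covered by the guarantee; reading constant.value from a Node that was stored through inner would be ordinary type punning again.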

Finally, with regards to type-punning: all (or at least all that I've used) implementations do support it in some way. My impression at the time was that the intent was that pointer casting be the way an implementation supported it; in the C++ standard, there is even (non-normative) text to suggest that the results of a reinterpret_cast be "unsurprising" to someone familiar with the underlying architecture. In practice, however, most implementations support the use of union for type-punning, provided the access is through a union member. Most implementations (but not g++) also support pointer casts, provided the pointer cast is clearly visible to the compiler (for some unspecified definition of pointer cast). And the "standardization" of the underlying hardware means that things like:

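int
getExponent( double d )
{
    //  Bits 52..62 hold the exponent, biased by 1023 (IEEE 754 binary64),
    //  so the bias is subtracted to obtain the actual exponent.
    return (int)((*(uint64_t*)(&d) >> 52) & 0x7FF) - 1023;
}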

are actually fairly portable. (It won't work on mainframes, of course.) What doesn't work are things like my first example, where the aliasing is invisible to the compiler. (I'm pretty sure that this is a defect in the standard. I seem to recall even having seen a DR concerning it.)
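For comparison, the same exponent extraction can be written without a pointer cast by copying the bytes with std::memcpy, which does not depend on how a compiler treats aliasing. This is a sketch that assumes an IEEE 754 binary64 double (checked with a static_assert); the name get_exponent is hypothetical. It removes the bias of 1023 by subtracting it, so 1.0 yields 0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE 754 binary64 doubles");

// Return the unbiased binary exponent of d.
// Zero and subnormal inputs yield -1023, infinities and NaNs yield 1024.
inline int get_exponent(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // copy the object representation
    // Bits 52..62 hold the exponent field, stored with a bias of 1023.
    return static_cast<int>((bits >> 52) & 0x7FF) - 1023;
}

// Usage: powers of two make the result easy to check by hand.
inline void exponent_demo() {
    assert(get_exponent(1.0) == 0);
    assert(get_exponent(8.0) == 3);
    assert(get_exponent(0.5) == -1);
}
```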

Leucite answered 4/9, 2014 at 13:15 Comment(9)
it was implementation-defined, not undefined in C90 - making this illegal is a C++-ismDrennan
in fact, the C committee made it illegal to use pointer casts for type punning by introducing effective typing, so using unions is the C way to do itDrennan
@Drennan It's still undefined behavior in C11, at least in the copy I have. §6.7.2.1/16 is quite clear about this. C++ is even clearer, because it has the concept of an object lifetime separate from the storage duration, but even in C, accessing an uninitialized object (other than as a sequence of bytes) is undefined behavior, and assigning to one element of a union makes all others "uninitialized".Leucite
I'm sorry, but you are mistaken as far as C is concerned; I wrote an answer especially for you, listing the relevant quotesDrennan
@Drennan The problem is that your argument largely depends on a non-normative note, taken out of context. The significant text is in §6.7.2.1/16. And C does have the concept of an invalid object, which results in undefined behavior when it is accessed.Leucite
@Christoph: The notion of PODS having a lifetime separate from their storage duration is far more of a "billion dollar mistake" than Hoare's null pointers. Hard to describe, and hard for both compilers and programmers to deal with. What the language should do is require that if a byte of a PODS is changed during a particular execution of a loop or function, all accesses made during that execution must either be related in one of several locally-discernible ways [not just type], or there must be local evidence of something weird going on (e.g. a volatile-qualified access).Mesic
@Christoph: A key point that many people seem to miss is that aliasing requires certain patterns of pointer/reference creation and use, and that many of the patterns programmers squawk about should not be viewed as involving aliasing. Two pointers or references should not be described as "aliasing" during a particular execution of a function or loop if one of them goes completely unused during that execution. Further, a pointer or lvalue derived from another should not be said to alias the parent unless the parent is used to access or address a byte between the time...Mesic
...the time the child is created and the last time the child or anything derived from it will access that byte. If all pointers that are ever going to be used to access some storage are going to derive from some particular pointer p, and the storage isn't "owned" by any restrict-qualified pointers that weren't used in the derivation of p, no action done with p or its derivatives can possibly alias anything that existed previously, so the types of any pointers that were used previously should be irrelevant. Making the rules only apply to PODS in cases that actually involve aliasing...Mesic
...would make things massively easier for programmers and compiler writers alike.Mesic
N
6

It's legal in C99:

From the standard: 6.5.2.3 Structure and union members

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Nellynelms answered 4/9, 2014 at 12:1 Comment(18)
@JamesKanze Could you expand on how “the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation” is a fancy way of saying it's undefined behavior? It seems to me it says that what's read is a reinterpretation in the new type, and that this is a fancy way to say that it is implementation-defined behavior, if anything.Pet
@PascalCuoq Anything that may result in a trap is undefined behavior.Leucite
@JamesKanze I take “This might be a trap representation” to mean that if the new type has trap representations then, under implementation-defined conditions, the result of the type-punning may be one of them.Pet
@JamesKanze: type-punning via unions is well-defined as long as it does not result in a trap representation (and the source-type is not of smaller size than the target type); this is a case-by-case decision depending on the types and values involved; there's a footnote in C99 that makes it very clear that type-punning is legal; the (non-normative!) annex listed it incorrectly as unspecified (not undefined) behaviour; the annex has been fixed with C11Drennan
@PascalCuoq it could have been implementation-defined; the original C committee decided otherwise. (Implementation defined means that the behavior must be documented.)Leucite
@Drennan That's C, and only since C99. The original C90 clearly and intentionally made it undefined behavior, for whatever reasons, and C++ still does (unless there were significant changes in this in C++11).Leucite
@JamesKanze: yes, this only applies to C; however, using unions this way has never been undefined behaviour; see C89 draft, section 3.3.2.3: if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-definedDrennan
Finally: the quoted text is part of a non-normative note; it should be interpreted as one possible justification. The relevant text is in §6.7.2.1/16, which clearly states that at most one element of a union can be valid at a time. So this answer is simply wrong.Leucite
IMO although a footnote cannot be considered as normative, in this case it seems to be a clarification about some previous norm. It's only "explaining" something, so it counts as (complementary) normative. I interpret that "type punning" is perfectly legal in C99/C11, in the sense that access to a member of type B, after a member of type A was modified, is allowed, and if the compiler represents a value of type A in a way that has sense as a value of type B, then the last value is obtained without any problem. Thus, I also think that type punning is implementation defined only.Drucilla
@JamesKanze As I read "The value of at most one of the members can be stored in a union object at any time." (§6.7.2.1/16 of C11), I understand that if you have union { T1 a; T2 b; } x;, write x.a, write x.b, then the second write will overwrite the old x.a value (completely). But this doesn't forbid the use of x.a to reinterpret x.b with the T1 representation, as said in Note 95 of §6.5.2.3/3, which validates my interpretation without introducing a contradiction.Sosthina
@Sosthina The effect on x.a when writing x.b is not specified. The union no longer holds x.a; that object has ceased to exist. Accessing it is undefined behavior. (In any given implementation, it may be defined, but as far as the standard is concerned, it's undefined.)Leucite
@JamesKanze One obviously can't access an object that no longer exists. But evaluating x.a has the effect to read the (latest) object stored at the corresponding memory location. It is undefined behavior only if the object representation does not represent a value of the type of x.a.Sosthina
@Sosthina Where in the standard do you get that? The standard is clear; the object x.a no longer exists.Leucite
@JamesKanze Standard identifier (or lvalue) evaluation. Just like evaluating an identifier after a memset/memcpy/memmove to the location of the corresponding object. And I agree, the old x.a object no longer exists, but this doesn't matter here since one is interested in the new data.Sosthina
@Sosthina x.a doesn't exist, period. And accessing an object which doesn't exist is undefined behavior.Leucite
@JamesKanze: If an inactive union field doesn't exist, by what means can one write to it? Also, the common-subsequence rule would imply that if a union contains two structures with the same initial elements, storing either to the union would cause those elements to exist in both.Mesic
@Mesic reading a non-active member of a union is in C11 unspecified and in C++ undefined. Writing to a member sets it as the active member and subsequent reads are then well defined for that member. While we might LIKE to pretend that reading a non-active member is fine the standard says otherwise.Tundra
@ABaumstumpf: The C11 draft N1570 never uses the term "active" in relation to unions. Footnote 95 does mention the possibility of writing one union member and reading another, using the term "type punning". Further, if someUnion has member int arr[10];, an assignment like someUnion.arr[1] = 2; is defined as taking the address of someUnion.arr as an int*, adding 1 to it to yield another int*, and then storing 2 to the address identified by the latter int*. There is no described attachment between the latter int* and the union used in its formation.Mesic
D
4

BRIEF ANSWER: Type punning can be safe in a few circumstances. On the other hand, although it seems to be a very well-known practice, the standard does not seem very interested in making it official.

I will talk only about C (not C++).

1. TYPE PUNNING and THE STANDARDS

As folks have already pointed out, type punning is allowed in the standard C99 and also C11, in subsection 6.5.2.3. However, I will restate the facts with my own perception of the issue:

  • Section 6.5 of the C99 and C11 standard documents covers expressions.
  • Subsection 6.5.2 deals with postfix expressions.
  • Subsubsection 6.5.2.3 talks about structs and unions.
  • Paragraph 6.5.2.3(3) explains the dot operator applied to a struct or union object, and which value will be obtained.
    Right there, footnote 95 appears. This footnote says:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

The fact that type punning barely appears, and only as a footnote, suggests that it is not considered a central issue in C programming.
Actually, the main purpose of unions is saving space (in memory). Since several members share the same address, if one knows that each member will be used in different parts of the program, never at the same time, then a union can be used instead of a struct to save memory.

  • The subsection 6.2.6 is mentioned.
  • The subsection 6.2.6 talks about how objects are represented (in memory, say).

2. REPRESENTATION OF TYPES and ITS TROUBLE

If you pay attention to the different aspects of the standard, you can be sure of almost nothing:

  • The representation of pointers is not clearly specified.
  • Worse, pointers to different types may have different representations (as objects in memory).
  • union members all start at the same address in memory, which is also the address of the union object itself.
  • struct members have increasing relative addresses, starting at exactly the address of the struct object itself. However, padding bytes can be added after any member. How many? The standard does not say; it is up to the implementation. Padding bytes are used mainly for memory alignment purposes.
  • Arithmetic types (integers, real and complex floating-point numbers) can be represented in a number of ways. It depends on the implementation.
  • In particular, integer types may have padding bits. I believe this does not happen on desktop computers, but the standard leaves the door open to the possibility. Padding bits are used for special purposes (parity, signals, who knows), not for holding mathematical values.
  • signed types can be represented in 3 ways: 1's complement, 2's complement, or sign and magnitude.
  • The char types occupy just 1 byte, but 1 byte can have a number of bits other than 8 (though never fewer than 8).
  • However, we can be sure about some details:

    a. The char types have no padding bits.
    b. The unsigned integer types are represented in pure binary form.
    c. unsigned char occupies exactly 1 byte, without padding bits, and has no trap representation because all the bits are used. Moreover, it represents a value without any ambiguity, following the binary format for integer numbers.
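
Several of the points above can be checked mechanically on a given implementation. The following is only a sketch: the types sample and sample_u and the two helper functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* A byte has at least 8 bits, and sizeof counts in units of char. */
static_assert(CHAR_BIT >= 8, "CHAR_BIT is at least 8");
static_assert(sizeof(unsigned char) == 1, "char types occupy exactly one byte");

/* unsigned char has no padding bits: even its topmost bit is a value bit. */
static_assert((UCHAR_MAX >> (CHAR_BIT - 1)) == 1, "unsigned char uses every bit");

/* Hypothetical example types, used only to look at layout. */
struct sample  { char c; double d; };
union sample_u { char c; double d; };

/* The first struct member sits at offset 0; later members follow it. */
static_assert(offsetof(struct sample, c) == 0, "no padding before the first member");
static_assert(offsetof(struct sample, d) > offsetof(struct sample, c), "offsets increase");

/* Number of padding bytes this implementation placed between c and d. */
static size_t padding_after_c(void)
{
    return offsetof(struct sample, d) - sizeof(char);
}

/* Every union member starts at the address of the union object itself. */
static int union_members_share_address(void)
{
    union sample_u u;
    return (void *)&u == (void *)&u.c && (void *)&u == (void *)&u.d;
}
```

The value returned by padding_after_c() is deliberately not asserted: it is exactly the kind of detail that differs between implementations.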

3. TYPE PUNNING vs TYPE REPRESENTATION

All these observations reveal that, if we try to do type punning with union members whose types are not unsigned char, we can run into a lot of ambiguity. Such code is not portable and, in particular, our program could behave unpredictably.
However, the standard allows this kind of access.

Even if we know exactly how every type is represented in our implementation, we could end up with a sequence of bits that means nothing at all in the other type (a trap representation). We cannot do anything in this case.

4. THE SAFE CASE: unsigned char

The only safe way of using type punning is with unsigned char or unsigned char arrays (because we know that the elements of an array are strictly contiguous, with no padding bytes between them when the size is computed with sizeof()).

  union {
     TYPE data;
     unsigned char type_punning[sizeof(TYPE)];
  } xx;  

Since we know that unsigned char is represented in strict binary form, without padding bits, type punning can be used here to take a look at the binary representation of the member data.
This tool can be used to analyze how values of a given type are represented in a particular implementation.
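
A minimal sketch of such an inspection tool follows. Here TYPE is instantiated as uint32_t purely as an example, and the names byte_view and object_bytes are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instantiation of the TYPE placeholder used above. */
typedef uint32_t TYPE;

union byte_view {
    TYPE          data;
    unsigned char type_punning[sizeof(TYPE)];
};

/* Copy the object representation of `value` into out[0..sizeof(TYPE)-1],
   in memory order (lowest address first). */
static void object_bytes(TYPE value, unsigned char out[sizeof(TYPE)])
{
    union byte_view xx;
    size_t i;

    xx.data = value;                      /* store through one member ...   */
    for (i = 0; i < sizeof(TYPE); ++i)
        out[i] = xx.type_punning[i];      /* ... read back byte by byte     */
}
```

Calling object_bytes(0x01020304, buf) shows the byte order of the implementation: a little-endian machine with 8-bit bytes stores 04 03 02 01, a big-endian one stores 01 02 03 04.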

I am not able to see another safe and useful application of type punning under the standard specifications.

5. A COMMENT ABOUT CASTS...

If one wants to play with types, it's better to define one's own transformation functions, or simply use casts. Recall this simple example:

  union {
     unsigned char x;  
     double t;
  } uu;

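  bool result;   // needs <stdbool.h> in C

  uu.x = 7;                 // stores only the first byte of uu
  result = (uu.t == 7.0);   // reinterprets the bytes of uu as a double
  // You can bet that result == false

  uu.t = (double)(uu.x);    // the cast converts the value 7 to 7.0
  result = (uu.t == 7.0);
  // result == true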
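
As a sketch of what such a transformation function might look like for serialization, the pair below turns a float into four bytes, most significant first, and back. Note that it uses memcpy rather than a union to obtain the bits; this is a swapped-in technique, not something taken from the text above. The function names are hypothetical, and the sketch assumes a 32-bit float whose byte order matches that of uint32_t on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is exactly as wide as uint32_t on this implementation. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical transformation function: write the bits of `f` to out[0..3],
   most significant byte first, regardless of the host's byte order. */
static void float_to_be_bytes(float f, unsigned char out[4])
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);            /* copy the object representation */
    out[0] = (unsigned char)(bits >> 24);
    out[1] = (unsigned char)((bits >> 16) & 0xFFu);
    out[2] = (unsigned char)((bits >> 8) & 0xFFu);
    out[3] = (unsigned char)(bits & 0xFFu);
}

/* Inverse: rebuild the float from four bytes, most significant first. */
static float float_from_be_bytes(const unsigned char in[4])
{
    uint32_t bits = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
                  | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    float f;

    memcpy(&f, &bits, sizeof f);               /* copy the bits back */
    return f;
}
```

Because the bytes are produced with shifts on the integer value, the output order is fixed by the code and not by the machine, which matters when the bytes leave the program.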
Drucilla answered 4/9, 2014 at 14:37 Comment(6)
I have not seen any citation from the Standard that it makes an exception for type punning via char, and so I am highly sceptical. Do you have one? Note that this might well be defined differently than strict aliasing, which does make an exception for char types. We'd do well not to conflate the two.Racy
@underscore_d: There isn't such an explicit reference to character types in type-punning. I derived it myself by gathering facts: I can read in Standard C11 that (1) type-punning is a valid operation in C via union members, (2) although issues can arise from trap representations, (3) but character types have no trap representations, (4) every character type occupies exactly 1 byte. So, arrays of a character type can be used to "read" the bytes of any other object in a union member. However, there is undefined behaviour when accessing members of atomic unions (or structs, also).Drucilla
Y'know, I think I just overlooked the bit where you said you'd talk about C only. Sorry. Apparently when I'm on a mission to research C++, that's all I can see, even when it's not the subject! I like your reasoning for C but will have to assume that in C++, which doesn't allow punning, it's UB to pun via char (but not to alias via pointer). I feel like these should be directly related, but I can't find a C++ source that says 'yep, do whatever you want with char in a union'. But I'll stop the OT on your answer now :)Racy
"The fact that type punning barely appears, and as a footnote, it gives a clue that it's not a relevant issue in C programming." As anyone who has ever worked with embedded systems can tell, this is clearly wrong. Type punning is very useful both when declaring register maps and when performing serialization of data. It can also be used as protection against "strict aliasing" bugs. It's a minor editorial failure of the C standard to include this text only as a footnote, but that's about all. Type punning is actually about the only valid use of unions...Provoke
"Actually, the main purpose for using unions is for saving space (in memory)" This, however, has been debated plenty, and the general consensus is that using unions to create "variants" or to store unrelated types is a pretty clear indication of muddy program design and should therefore be avoided. Safe coding standards such as MISRA C ban the use of unions for this purpose.Provoke
As for type punning with character types, there is nothing in the standard pointing that out as a special case, except maybe the strict aliasing exceptions. Far more relevant is the part in 6.3.2.3: "When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object." Although this isn't really union type punning, it is guaranteed to work.Provoke
