I was under the impression that accessing a union member other than the last one set is UB, but I can't seem to find a solid reference (other than answers claiming it's UB but without any support from the standard).
So, is it undefined behavior?
The confusion is that C explicitly permits type-punning through a union, whereas C++ (as of C++11) has no such permission.
6.5.2.3 Structure and union members
95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
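As a rough illustration of what that footnote permits in C, here is a minimal sketch. The names float_pun and float_bits are made up for this example, and it assumes that float and uint32_t are both 32 bits wide and that float uses the IEEE 754 binary32 format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical union for this example: both members share the same storage. */
union float_pun {
    float    f;
    uint32_t u;
};

/* Stores a float, then reads the same bytes back as an unsigned integer.
   In C this read is the "type punning" described in the footnote. */
uint32_t float_bits(float value)
{
    /* Assumption: float and uint32_t have the same size on this platform. */
    assert(sizeof(float) == sizeof(uint32_t));

    union float_pun pun;
    pun.f = value;   /* f is the member last stored */
    return pun.u;    /* u reinterprets the object representation */
}
```

Under those assumptions float_bits(1.0f) yields 0x3F800000. The same read compiled as C++ is exactly the access this question is about.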
The situation with C++:
9.5 Unions [class.union]
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.
C++ later has language permitting the use of unions containing structs with common initial sequences; this does not, however, permit type-punning.
To determine whether union type-punning is allowed in C++, we have to search further. Recall that C99 is a normative reference for C++11 (and C99 has similar language to C11 permitting union type-punning):
3.9 Types [basic.types]
4 - The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values. 42
42) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.
It gets particularly interesting when we read
3.8 Object lifetime [basic.life]
The lifetime of an object of type T begins when: — storage with the proper alignment and size for type T is obtained, and — if the object has non-trivial initialization, its initialization is complete.
So for a primitive type (which ipso facto has trivial initialization) contained in a union, the lifetime of the object encompasses at least the lifetime of the union itself. This allows us to invoke
3.9.2 Compound types [basic.compound]
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.
Assuming that the operation we are interested in is type-punning i.e. taking the value of a non-active union member, and given per the above that we have a valid reference to the object referred to by that member, that operation is lvalue-to-rvalue conversion:
4.1 Lvalue-to-rvalue conversion [conv.lval]
A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
The question then is whether an object that is a non-active union member is initialized by storage to the active union member. As far as I can tell, this is not the case. So although, if the union is copied into char array storage and back (3.9:2), or [...], the access to a union by a non-active member is defined and is defined to follow the object and value representation, access without one of those interpositions is undefined behaviour. This has implications for the optimisations allowed to be performed on such a program, as the implementation may of course assume that undefined behaviour does not occur.
That is, although we can legitimately form an lvalue to a non-active union member (which is why assigning to a non-active member without construction is ok) it is considered to be uninitialized.
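For comparison, a byte-wise copy through unsigned char storage avoids reading an inactive member altogether. This is only a sketch: double_bits is a hypothetical helper name, and it assumes double is 64 bits wide and uses the IEEE 754 binary64 format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of a double as a
   64-bit integer by copying bytes, with no union involved. */
uint64_t double_bits(double value)
{
    /* Assumption: double is 64 bits wide on this platform. */
    assert(sizeof(double) == sizeof(uint64_t));

    unsigned char bytes[sizeof value];
    uint64_t bits;

    memcpy(bytes, &value, sizeof value);  /* object -> char array storage */
    memcpy(&bits, bytes, sizeof bits);    /* char array storage -> integer */
    return bits;
}
```

Under those assumptions double_bits(1.0) yields 0x3FF0000000000000. The intermediate array is there only to mirror the "char array storage" wording; a single memcpy from value to bits copies the same bytes.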
[...] T is obtained). – Surfboarding
[...] memcpy implementations (accessing objects using unsigned char lvalues), it disallowed accesses to *p after int *p = 0; const int *const *pp = &p; (even though the implicit conversion from int** to const int*const* is valid), it disallowed even accessing c after struct S s; const S &c = s;. CWG issue 616. Does the new wording allow it? There's also [basic.lval]. – Kaplan
[...] memcpy issue (it's now written in terms of narrow character types). – Surfboarding
[...] memcpy isn't about indeterminate values: using memcpy to copy an already initialised value doesn't read any indeterminate values. – Kaplan
[...] memcpy to copy already initialised values can cause reads of indeterminate values. – Kaplan
[...] & operator means when applied to a union member. I would think the resulting pointer should be usable to access the member at least until the next time the next direct or indirect use of any other member lvalue, but in gcc the pointer isn't usable even that long, which raises a question of what the & operator is supposed to mean. – Terminus

The C++11 standard says it this way
9.5 Unions
In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.
If only one value is stored, how can you read another? It just isn't there.
The gcc documentation lists this under Implementation defined behavior
- A member of a union object is accessed using a member of a different type (C90 6.3.2.3).
The relevant bytes of the representation of the object are treated as an object of the type used for the access. See Type-punning. This may be a trap representation.
indicating that this is not required by the C standard.
2016-01-05: Through the comments I was linked to C99 Defect Report #283 which adds a similar text as a footnote to the C standard document:
78a) If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
Not sure if it clarifies much though, considering that a footnote is not normative for the standard.
[...] char hello[4]; int *p1 = (int *)hello; *p1 = 10; is it now undefined behavior to access hello? – Wilmot
[...] int* is not compatible with char*. There is an exception that allows char* to alias with anything, but not the other way around. – Jetton
[...] g++ makes the same guarantee for C++. – Tearoom
[...] g++ inherits the same rule: gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Implementation.html . So, yes, it certainly is complicated! It's lucky implementations have de facto standards, especially if you're me and coding C++ where - if my understanding of the Standard is correct - I'm depending on g++ applying the C rules. I hate having to rely on implementation-defined behaviour, but at least it's not undefined and won't delete my code... – Tearoom
[...] -fstrict-aliasing that compares union (safe) vs. pointer-case (unsafe). – Lassie

I think the closest the standard comes to saying it's undefined behavior is where it defines the behavior for a union containing a common initial sequence (C99, §6.5.2.3/5):
One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.
C++11 gives similar requirements/permission at §9.2/19:
If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
Though neither states it directly, these both carry a strong implication that "inspecting" (reading) a member is "permitted" only if 1) it is (part of) the member most recently written, or 2) is part of a common initial sequence.
That's not a direct statement that doing otherwise is undefined behavior, but it's the closest of which I'm aware.
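A small sketch of the case both quotes do permit, written in C. The types circle, rect and shape and the helpers make_rect and shape_kind are hypothetical names chosen for this example, not anything from either standard.

```c
#include <assert.h>

enum { KIND_CIRCLE = 1, KIND_RECT = 2 };

/* Both structs start with an int, so they share a common initial sequence. */
struct circle { int kind; double radius; };
struct rect   { int kind; double width; double height; };

union shape {
    struct circle c;
    struct rect   r;
};

/* Stores a rect, so r is the member last written. */
union shape make_rect(double width, double height)
{
    assert(width >= 0.0 && height >= 0.0);

    union shape s;
    s.r.kind = KIND_RECT;
    s.r.width = width;
    s.r.height = height;
    return s;
}

/* Inspects the common initial part through c, whichever struct is stored.
   The complete union type is visible here, as the C99 wording requires. */
int shape_kind(const union shape *s)
{
    return s->c.kind;
}
```

Calling shape_kind on the result of make_rect(3.0, 4.0) returns KIND_RECT even though the value was stored through r. Reading s->c.radius in that state would fall outside the common initial sequence and back into the situation discussed above.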
[...] unions being undefined, since I'd been given the impression by a particular blog that this was OK, and built several large structures and projects around it. Now I think I might be OK after all, since my unions do contain classes having the same types at the front – Tearoom
[...] union contains e.g. a uint8_t and a class Something { uint8_t myByte; [...] }; - I would assume this proviso would also apply here, but it's worded very deliberately to only allow for structs. Luckily I'm already using those instead of raw primitives :O – Tearoom

Something that is not yet mentioned by the available answers is footnote 37 to paragraph 21 of section 6.2.5:
Note that aggregate type does not include union type because an object with union type can only contain one member at a time.
This requirement seems to clearly imply that you must not write to one member and read from another. In this case it might be undefined behavior by lack of specification.
I will explain this with an example.
Assume we have the following union:
union A{
int x;
short y[2];
};
I will assume that sizeof(int) gives 4, and that sizeof(short) gives 2.
When you write union A a = {10}, that will create a new variable of type A and put the value 10 in it.
Your memory should look like this (remember that all of the union members share the same location; the diagram assumes a big-endian byte order):
     |                   x                   |
     |       y[0]        |       y[1]        |
     -----------------------------------------
a->  |0000 0000|0000 0000|0000 0000|0000 1010|
     -----------------------------------------
As you can see, the value of a.x is 10, the value of a.y[1] is 10, and the value of a.y[0] is 0.
Now, what will happen if I do this?
a.y[0] = 37;
Our memory will look like this:
     |                   x                   |
     |       y[0]        |       y[1]        |
     -----------------------------------------
a->  |0000 0000|0010 0101|0000 0000|0000 1010|
     -----------------------------------------
This will turn the value of a.x into 2424842 (in decimal).
Now, if your union has a float or a double, your memory map will be more of a mess, because of the way floating-point numbers are stored. More info you could get in here.
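The int/short walk-through can be tried out with a sketch like the following, written in C, where reading a.x after storing to a.y[0] is the type punning the C footnote describes. The helper names is_little_endian and x_after_y0_store are invented for this example, and it assumes sizeof(int) is 4 and sizeof(short) is 2. The result depends on byte order: a diagram with y[0] over the high-order bytes corresponds to a big-endian machine, while on a little-endian machine y[0] overlays the low-order half of x.

```c
#include <assert.h>

union A {
    int   x;
    short y[2];
};

/* Returns 1 on a little-endian machine, 0 otherwise, by looking at the
   first byte of an int that holds the value 1. */
int is_little_endian(void)
{
    int probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Stores `initial` in x, overwrites y[0], then reads x back. */
int x_after_y0_store(int initial, short y0)
{
    /* Assumption from the text: 4-byte int, 2-byte short. */
    assert(sizeof(int) == 4 && sizeof(short) == 2);

    union A a = { initial };  /* initialises the first member, x */
    a.y[0] = y0;              /* overwrites the first two bytes of x */
    return a.x;
}
```

With those sizes, x_after_y0_store(10, 37) gives 2424842 on a big-endian machine and 37 on a little-endian one such as x86.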