Are there any guarantees for unions that contain a wrapped type and the type itself?

3

11

Can I put a T and a wrapped T in a union and inspect them as I like?

union Example {
    T value;
    struct Wrapped { 
       T wrapped;
    } wrapper;
};
// for simplicity T = int

Example ex;
ex.value = 12;
cout << ex.wrapper.wrapped; // ?

The C++11 standard only guarantees safe inspection of the common initial sequence, but value isn't a struct. I guess the answer is no, since wrapped types aren't even guaranteed to be memory-compatible with their unwrapped counterpart, and accessing inactive members is only well-defined on common initial sequences.

Demount answered 2/1, 2018 at 9:37 Comment(0)
O
4

I believe this is undefined behavior.

[class.mem] gives us:

The common initial sequence of two standard-layout struct types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types and either neither entity is a bit-field or both are bit-fields with the same width. [...]

In a standard-layout union with an active member of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2; the behavior is as if the corresponding member of T1 were nominated.

If T isn't a standard-layout struct type, this is clearly undefined behavior. (Note that int is not a standard-layout struct type, as it's not a class type at all.)

But even for standard-layout struct types, what constitutes a "common initial sequence" is based strictly on non-static data members. That is, T and struct { T val; } do not have a common initial sequence: their members are compared one by one, and the first member of T is never T itself, so there are no data members in common at all!

Hence, here:

template <typename T>
union Example {
    T value;
    struct Wrapped { 
       T wrapped;
    } wrapper;
};


Example<int> ex;
ex.value = 12;
cout << ex.wrapper.wrapped; // (*)

you're accessing an inactive member of the union. That's undefined.

Orola answered 2/1, 2018 at 13:36 Comment(0)
U
-1

Union behavior is undefined when accessing a member that wasn't the last one written to. So no, you can't depend on this behavior.

It's identical in principle to the idea of using a union to extract specific bytes from an integer, but with the additional risk that you're now depending on the compiler not adding any padding to your struct. See Accessing inactive union member and undefined behavior? for more details.

Unset answered 2/1, 2018 at 10:26 Comment(4)
Your first paragraph doesn't hold for common initial sequences. Also, I've linked the Q&A already in my question. – Demount
I think the answer from @Steeve and forcing the alignment with union alignas(sizeof(T)) Example { T value; struct Wrapped { T wrapped; } wrapper; }; should work. – Rolfston
@Demount I'm pretty sure that it does hold. Yes, you may have linked the question, but re-read it (note that the answer there covers both C11 and C++11). You'll see that it states that the value of at most one of the non-static data members can be stored in a union at any time, and that the concept of a trap representation has been removed. – Unset
@Unset No idea :/. Every answer on this question has at least one downvote (including the deleted one). – Demount
A
-1

It should work because both Example and Wrapped are standard-layout classes, and the C++14 standard has enough requirements to guarantee that, in that case, value and wrapper.wrapped are located at the same address. Draft n4296 says in 9.2 Class members [class.mem] §20:

If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member.

A note even says:

[ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. —end note ]

That means that you at least respect the strict aliasing rule from 3.10 Lvalues and rvalues [basic.lval] §10:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined
— the dynamic type of the object,
...
— an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

So this is perfectly defined:

cout << *(&ex.wrapper.wrapped) << endl;

because &ex.wrapper.wrapped is required to be the same as &ex.value and the pointed-to object has the correct type. But the standard is explicit only about the common initial sequence, so my understanding is that cout << ex.wrapper.wrapped << endl invokes undefined behaviour, because a note in 1.3.24 [defns.undefined] about undefined behavior says (emphasis mine):

Undefined behavior may be expected when this International Standard omits any explicit definition of behavior...

TL;DR: I would bet a coin that most if not all common implementations will accept it, but because of the note from 1.3.24 [defns.undefined], I would never use that in production code and would use *(&ex.wrapper.wrapped) instead.


In the more recent draft n4659 for C++17, the relevant notion is inter-convertibility ([basic.compound] §4).

Anhanhalt answered 2/1, 2018 at 11:20 Comment(12)
Uh, I wasn't aware of [defns.undefined]. Thanks for bringing that up. The wording in C++11's 9.2§20 is very similar, but explicitly mentions pointers: "A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa". So you can cite n3337 if you want to include C++11 in your answer. – Demount
Dereferencing a pointer that has the right address and type does not make the object it refers to accessible. – Violetvioleta
@SergeBallesta Literally: reinterpret_cast<union_A*>(&union_A.member_a)->member_b, supposing that member_b is the active member; member_a does not need to be active. – Violetvioleta
@Oliv: That's the reason why I use a pointer. As value has just been assigned, it is the active member of the union. Because value and wrapper.wrapped have the same address and the same type, &ex.wrapper.wrapped is in fact a pointer to ex.value, which is the valid member. That's the reason why it can be dereferenced. – Anhanhalt
@SergeBallesta, the standard never says that a pointer with the right value and type is a pointer to the object [basic.compound], and it gives many examples where a pointer with the right address and type is an invalid pointer. For example, int arr[10]{}; *reinterpret_cast<int*>(&arr)=1 is UB even if the address of the array and the address of its first element are the same. – Violetvioleta
@Violetvioleta [basic.compound] §3 explicitly contains: If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. – Anhanhalt
@SergeBallesta Which version of the standard? I do not find it in the latest one. – Violetvioleta
@SergeBallesta Found it, they have just removed this exact sentence in the C++17 standard! So until C++17 your answer is right for sure. If you could make a small edit (and maybe mention it) I could upvote again. – Violetvioleta
@Violetvioleta n4296 for C++14. Same sentence in n3337 for C++11. In n4659 for C++17 the notion is the inter-convertibility of pointers in [basic.compound] §4. A pointer to an array and a pointer to its first element are not inter-convertible because they have different types. – Anhanhalt
I admit the example is not perfect. Here is a closer one: unsigned char buffer[2*sizeof(int)]; auto p1=new(buffer) int{}; auto p2 = new(p1+1) int{}; *(p1+1)=10; // UB: here p1+1 does not point to *p2 even if p1+1 has the right type and the right address. – Violetvioleta
@SergeBallesta I have asked a question about that, I hope to get a clear answer! #48062846 – Violetvioleta
You can't get around the lifetime rules that easily. There's a decent argument that simply writing ex.wrapper.wrapped is UB by omission because [expr.ref]/4.2 defines the behavior of the class member access expression only when the first expression (here, ex.wrapper) designates an object, but there's no living Wrapped object. Moreover, if Wrapped has a nontrivial constructor, then [class.cdtor]/1 makes that same expression explicitly UB. – Inotropic
