memcpy/memmove to a union member, does this set the 'active' member?
P

4

27

Important clarification: some commenters seem to think that I am copying from a union. Look carefully at the memcpy: it copies from the address of a plain old uint32_t, which is not contained within a union. Also, I am copying (via memcpy) to a specific member of a union (u.a16 or &u.x_in_a_union), not directly to the entire union itself (&u).

C++ is quite strict about unions - you should read from a member only if that was the last member that was written to:

9.5 Unions [class.union] [[c++11]] In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

(Of course, the compiler doesn't track which member is active. It's up to the developer to ensure they track this themselves)


Update: The following block of code is the main question, directly reflecting the text in the question title. If this code is OK, I have a follow-up regarding other types, but I now realize that this first block of code is interesting in itself.

#include <cstdint>
#include <cstring>  // std::memcpy
#include <iostream> // std::cout
uint32_t x = 0x12345678;
union {
    double whatever;
    uint32_t x_in_a_union; // same type as x
} u;
u.whatever = 3.14;
u.x_in_a_union = x; // surely this is OK, despite involving the inactive member?
std::cout << u.x_in_a_union;
u.whatever = 3.14; // make the double 'active' again
std::memcpy(&u.x_in_a_union, &x, sizeof(x)); // same types, so should be OK?
std::cout << u.x_in_a_union; // OK here? What's the active member?

The block of code immediately above this is probably the main issue in the comments and answers. In hindsight, I didn't need to mix types in this question! Basically, is u.a = b the same as memcpy(&u.a,&b, sizeof(b)), assuming the types are identical?


First, a relatively simple memcpy allowing us to read a uint32_t as an array of uint16_t:

#include <cstdint> // to ensure we have standard versions of these two types
#include <cstring> // std::memcpy
uint32_t x = 0x12345678;
uint16_t a16[2];
static_assert(sizeof(x) == sizeof(a16), "");
std::memcpy(a16, &x, sizeof(x));

The precise behaviour depends on the endianness of your platform, and you must beware of trap representations and so on. But it is generally agreed here (I think? Feedback appreciated!) that, with care to avoid problematic values, the above code can be perfectly standards-compliant in the right context on the right platform.

(If you have a problem with the above code, please comment or edit the question accordingly. I want to be sure we have a non-controversial version of the above before proceeding to the "interesting" code below.)


If, and only if, both blocks of code above are not-UB, then I would like to combine them as follows:

uint32_t x = 0x12345678;
union {
    double whatever;
    uint16_t a16[2];
} u;
u.whatever = 3.14; // sets the 'active' member
static_assert(sizeof(u.a16) == sizeof(x), ""); // any other checks I should do?
std::memcpy(u.a16, &x, sizeof(x));

// what is the 'active member' of u now, after the memcpy?
std::cout << u.a16[0] << ' ' << u.a16[1] << std::endl; // i.e. is this OK?

Which member of the union, u.whatever or u.a16, is the 'active member'?


Finally, my own guess is that the reason why we care about this, in practice, is that an optimizing compiler might fail to notice that the memcpy happened and therefore make false assumptions (but allowable assumptions, by the standard) about which member is active and which data types are 'active', therefore leading to mistakes around aliasing. The compiler might reorder the memcpy in strange ways. Is this an appropriate summary of why we care about this?

Phosphatase answered 29/9, 2016 at 7:2 Comment(51)
It's up to you to track what the "active" member is. The compiler doesn't do it for you.Lustig
Your code breaks the language-lawyer rules and is not portable anyway; it will, for example, produce different output on a big-endian or little-endian machine.Tremayne
The behaviour is defined by endianness of the machine in fact. I doubt that "undefined behavior" is a proper term here.Tomb
@JonathanPotter, I'll edit the question to emphasize that. I'm not assuming the compiler actively knows anythingPhosphatase
What are the "language-lawyer" "rules", @LoreheadPhosphatase
@Lorehead, I've edited the question to address endianness and related issuesPhosphatase
Lots of naive comments and answers so far. The naive answer, of course is "Of course the union is unchanged: it's the same, innit". This good question is surprisingly deep. I'm not entirely convinced you're even allowed to memcpy a union due to potentially reading uninitialised memory. See #33394069, although that's on the C tag.Gramme
Don't use memcpy in C++, use std::copy instead.Nightlong
@Gramme sample code calls memcpy to alter union member only and uses initialised memoryTomb
@Bathsheba, I think the point Anton is making is that I am using memcpy here to read from initialized memory, so it should be OK. I'll edit the question now to make clear that u32 is still present, and initialized. Thanks for all these comments, it's helping me to clean up the question!Phosphatase
I could try to argue that with your "memcpy( u.a16, ... )" you already access the inactive union member and thereby trigger UBKalisz
@MarianSpanik, on second thoughts, std::copy doesn't seem to help here, because the types of the args don't match. That's why a 'raw' memcpy (or memmove) is required. Does this make sense?Phosphatase
@Andre, that's perhaps the nub of my question. The memcpy does indeed "access the inactive union member". But then again, simply doing u.a16[0] = 0; also "accesses" the inactive union member ; but it's OK. Surely assignment-into (and maybe therefore, memcpy-into?) is an acceptable time to use an inactive union memberPhosphatase
I've trashed my answer as it's at best incomplete and at worst wrong. Something based on is_trivially_copyable might have legs?Gramme
It really feels like this "active member" is a red herring. The "active member" simply means that all others aren't guaranteed to be anything. I don't believe anything in this question has anything to do with the "active member"Digitate
All we are talking about here is the memory layout of the union and what guarantees are present.Digitate
@xaxxon: Agree. And that's what I tried to explain in my post. In comments and posts it's suggested to not use memcpy. But I don't find anything wrong in the example since memcpy is done on valid memory (by valid I mean properly allocated and same sized). The person doing the memcpy should be well aware of the values inside the source from which memory is copied.Greenhorn
The memory layout of a union has essentially no relevance, I think. Look carefully at exactly the address I copy to, and the address I copy from. I copy to u.a16, not &u, and then I want to read directly from u.a16 - hence I don't care where u.a16 is within the unionPhosphatase
@AaronMcDaid: That's what I commented, memcpy is not a problem here. But memory layout does have relevance, otherwise modifying one member of a union wouldn't make other members meaningless. Predicting or knowing exactly which member is active is not possible unless some bookkeeping is done.Greenhorn
@Greenhorn Modifying one element of a union isn't intended to "make other members meaningless" per se. The point of the notion of the active member is that that's the only member whose data is defined to be stored (there is a complication for structures that share the same initial members, but we don't need to worry about that here).Delgado
Clang 3.9 address and undefined behavior sanitizers have no problem with the code. Also, not entirely unsurprisingly, whatever is still 3.14.Digitate
I think because the standard is so vague about this behavior there is no "correct" answer. Every compiler probably allows it and the results may be the same but it's not "standard-compliant".Hegarty
Union semantics aren't quite well-defined until P0137R1. With that, it's pretty clear that changing the active member requires placement new or = (in certain cases). memcpy doesn't cut it. On the other hand, it arguably reuses the storage and end the lifetime of the double, in which case you'd have a union without an active member.Sweetandsour
@Sweetandsour you say " it arguably reuses the storage and end the lifetime of the double" - isn't it guaranteed that it will overwrite memory associated with the double? One of the two members (or both) is the largest and the union is guaranteed to not be larger than its largest member, so you're either writing to the whole thing (if double <=), or writing to a subset of the double (if the double is >).Digitate
@Sweetandsour I don't see how P0137R1 would ban using memcpy for changing the active union member here. It doesn't say you can't use memcpy for that purpose, and Core Issue 1116 proposed resolution 1 appears to support that ("if T is trivially copyable…"). If you think P0137R1 does ban it, maybe that's something that needs to be addressed by fixing P0137R1?Delgado
@alastair: In case of union the member whose memory requirement is largest compared to other members is the member whose data is defined to be stored. All other members use subset of memory allocated to union object. So when any member's (say X) value is modified then the values of members whose memory layout overlaps with that of X become meaningless.Greenhorn
@alastair I'm going to take a shot: the storage which the object occupies is released, or is reused by an object that is not nested within o ([intro.object]) or released. - the memcpy clearly does this. All fields in a 2-element union must overlap.Digitate
@alastair "a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended". memcpy doesn't create objects ([intro.object]/1 exhaustively enumerates how objects can be created, and memcpy is not one of them), therefore it cannot begin the lifetime of objects.Sweetandsour
@Sweetandsour so since we've established that it ends the lifetime of whatever and doesn't create a new object, what are the implications of that? Is that something the optimizer could start doing weird things to?Digitate
It's also important to remember when you write c++ that you're not telling the computer what to do. You're requesting a set of guarantees and then the compiler/linker then do their best to make those guarantees happen as quickly as possible. That's why undefined behavior matters -- because as soon as a guarantee goes out the window, it opens up a whole new world for the optimizer to mess with things.Digitate
@Digitate I said "arguably" only because memcpy is a pretty messy area. We need something along the lines of N3751, but getting it right is hard. As to the implications under the current standard, per [basic.life], accessing an object outside its lifetime results in undefined behavior.Sweetandsour
@Sweetandsour I doubt that's intentional — on that reading it would also ban initialisation via a function that takes a pointer or reference of the member variable's type.Delgado
I appreciate that you rewrote your question to address my answer, but annoyingly, it got downvoted into oblivion because it answered what I originally thought you were asking. So: if memcpy() from a uint32_t to an array of uint16_t is valid, your final example is valid, whatever is garbage, and a16 is the active member. Since this was tagged language_lawyer, I pointed out that this is not portable code and went into some reasons it might fail. My apologies that several people thought that was unhelpful.Tremayne
Oh, minor editorial suggestion: copying an object to an array of char, the right size or using memcpy_s(), does not involve any UB, if anyone but me cares about that, and seems to illustrate your point equally well?Tremayne
Aha. So even forgetting entirely about unions, using memcpy between uint32_t and uint16_t is UB, plain and simple? On all platforms? I guess the standard says nothing like: "if the platform satisfies properties X,Y, and Z, then memcpy to an int32_t from an array int16_t[2] is defined". And therefore the first code example in the question is undefinedPhosphatase
@AaronMcDaid It’s early in the morning here, but looking it up again, uint32_t to uint16_t appears safe for your purposes: arrays must be laid out contiguously in memory, and both those types must have exact widths with no padding. They are also trivially copyable. Caveats you already acknowledged but said were not relevant to your question: endianness, implementations where those types might not exist. I still strongly recommend you always check for buffer overruns! The declarations might change due to bit rot.Tremayne
@Lorehead, perhaps there are further static asserts that I can use to confirm the various restrictions you mention? I can do a static_assert to confirm the sizes match up. But is there a trait_type to check that a given type has exact width with no padding? I'd like pseudo-portable code that either fails to compile, or compiles and runs as expected. Perhaps std:: is_scalar? Or maybe std::is_trivial?Phosphatase
One other special case: if the previously-active member of the union had a non-trivial destructor, I do not believe it would get called.Tremayne
I believe what you are looking for is std::has_unique_object_representations. This is true if std::is_trivially_copyable is true and if every equivalent object has a unique representation, i.e., no padding. For the source, you might want std::is_pod (Plain Old Data).Tremayne
Okay, wrote a new answer that I think is more what you were looking for.Tremayne
Maybe related.Leptospirosis
After reading the many well-researched comments on my answer, I think the tl;dr answer is: technically no, but there’s a one-line fix. Make the destination member active first with an initialization, assignment or placement new, and the standard says you can copy over it with memcpy(). Then, whether memcpy() activates it or not, it will be active and hold the correct value.Tremayne
I was also curious about what would happen if I memmove from another member of the same union. However, if I did a placement-new first it would overwrite the data I want to copy from. Anyway, thanks @Lorehead and everyone else. I'll keep checking this for a few days, I'm learning a lot about many things!Phosphatase
@JonathanPotter: The way the rules are written, a compiler isn't required to track the active member of a union, but may do so for the purposes of optimization. Consequently, even if a programmer knows that the binary representation written to a union object would have a useful value if read as a particular type, a programmer must also worry about what the compiler will think is the active type.Bluestone
@Sweetandsour "Union semantics aren't quite well-defined until P0137R1" Do you mean "union semantics were unclear" or "any use of a union was 100% UB before P0137R1"?Sociolinguistics
@Tremayne Throw an asm(""); and voila, all potential objects appear at all places.Sociolinguistics
@Sociolinguistics Throw in an asm and your code is inherently not meant to be portable.Tremayne
@Tremayne Which C or C++ implementation does not support asm("");?Sociolinguistics
@Sociolinguistics Every compiler’s asm extension is different. As soon as you use an asm extension, you’re inherently targeting a single compiler (or implementations that try to be perfectly compatible with it). A compiler that goes out of its way to be compatible with gcc’s inline assembly will also be compatible with other gcc extensions.Tremayne
@Tremayne So, which compiler is incompatible with asm("");?Sociolinguistics
I think your point is flying over my head here. If you mean the literal sequence of tokens asm(""); with no actual asm statement, that’s completely undefined by the Standard and compilers do a lot of different things. If you’re not making a wholly-theoretical point and actually thinking of using an asm statement, anything that works in one compiler for one target will not work in others. Either way, it’s pointless to worry about making code that contains asm portable.Tremayne
D
7

My reading of the standard is that std::memcpy is safe whenever the type is trivially copyable.

From 9 Classes, we can see that unions are class types and so trivially copyable applies to them.

A union is a class defined with the class-key union; it holds only one data member at a time (9.5).

A trivially copyable class is a class that:

  • has no non-trivial copy constructors (12.8),
  • has no non-trivial move constructors (12.8),
  • has no non-trivial copy assignment operators (13.5.3, 12.8),
  • has no non-trivial move assignment operators (13.5.3, 12.8), and
  • has a trivial destructor (12.4).

The exact meaning of trivially copyable is given in 3.9 Types:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1.

The standard also gives an explicit example of both.
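As a rough illustration of the two guarantees quoted above (not the standard's own example), the following sketch round-trips an object through a byte buffer and copies one object over another. `Sample`, `round_trip_through_bytes` and `copy_between_objects` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type, used only for illustration.
struct Sample {
    std::uint32_t id;
    double value;
};
static_assert(std::is_trivially_copyable<Sample>::value,
              "the guarantees only apply to trivially copyable types");

// Case 1: copy the bytes out to an unsigned char array and back again.
inline Sample round_trip_through_bytes(Sample obj) {
    unsigned char buf[sizeof(Sample)];
    std::memcpy(buf, &obj, sizeof obj);   // object -> bytes
    obj.id = 0;                           // clobber the object in between
    obj.value = 0.0;
    std::memcpy(&obj, buf, sizeof obj);   // bytes -> object restores the value
    return obj;
}

// Case 2: copy the bytes of one object into another, distinct object.
inline Sample copy_between_objects(const Sample& src) {
    Sample dst{};
    std::memcpy(&dst, &src, sizeof dst);  // dst now holds the same value as src
    return dst;
}
```

Both functions should hand back an object equal, member by member, to the one passed in.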

So, if you were copying the entire union, the answer would be unequivocally yes, the active member will be "copied" along with the data. (This is relevant because it indicates that std::memcpy must be regarded as a valid means of changing the active element of a union, since using it is explicitly allowed for whole union copying.)
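A minimal sketch of the whole-union case, assuming a union shaped like the one in the question (`U` and `copy_whole_union` are names made up here). On the reading given in this answer, the destination ends up with the same value as the source, including which member is active.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical union mirroring the one in the question.
union U {
    double whatever;
    std::uint32_t x_in_a_union;
};
static_assert(std::is_trivially_copyable<U>::value,
              "U must be trivially copyable for memcpy to be valid");

// Copies the entire union object, not just one member.
inline std::uint32_t copy_whole_union(std::uint32_t value) {
    U src;
    src.x_in_a_union = value;             // x_in_a_union is active in src
    U dst;
    dst.whatever = 3.14;                  // a different member is active in dst
    std::memcpy(&dst, &src, sizeof dst);  // dst takes on the value of src
    return dst.x_in_a_union;              // read the member that was active in src
}
```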

Now, you are instead copying into a member of the union. The standard doesn't appear to require any particular method of assigning to a union member (and hence making it active). All it does is specify (9.5) that

[ Note: In general, one must use explicit destructor calls and placement new operators to change the active member of a union. — end note]

which it says, of course, because C++11 allows objects of non-trivial type in unions. Note the "in general" on the front, which quite clearly indicates that other methods of changing the active member are permissible in specific cases; we already know this to be the case because assignment is clearly permitted. Certainly there is no prohibition on using std::memcpy, where its use would otherwise be valid.

So my answer is yes, this is safe, and yes, it changes the active member.

Delgado answered 29/9, 2016 at 8:3 Comment(35)
I'm not copying to a union, just to one member of a union. And the place I am copying from is a plain old uint32_t which is unconnected in any way to any union. Hence, I don't quite see the relevance of this part of your answer: "the active member on the resulting object will be the same as the active member on the object you copied from"Phosphatase
@AaronMcDaid Sorry, I slightly misread your question. I've updated my answer (I still think the answer is yes, it's safe and yes, it changes the active member).Delgado
And sorry if I was rude! I just find I'm having to respond to a number of misunderstandings, probably caused by a complex, badly-worded, question! I think I'll rewrite various aspects of this question in a few hours. For example, renaming u32 as x in order that the reader doesn't think that it is inside a union - I think the leading u is confusing people. But I should get back to my day job for a few hours nowPhosphatase
@AaronMcDaid :-) I didn't read it that way. I can see what you were asking and why you found some of the other answers somewhat infuriating. Hopefully my (revised) answer is helpful.Delgado
Would an implementation of union, when fed types of different sizes, that uses the extra space beyond the end for a magic token whenever the smaller size is active, then proceeds to check it at run time causing a fault if it fails, be a legal implementation of union? (Note that all assignment and placement new would also do this task...)Antihelix
A concern I have is that here memcpy(u.a16 the OP takes a pointer to an inactive union member and passes it to memcpy. That could be UB.Antihelix
@Yakk Taking a pointer to an inactive union member is fine, as long as you don't read through it while it's inactive. Writing through it will, of course, change the active union member.Delgado
@Yakk Consider union { unsigned a; double d; } u; u.d = 1.234; read_unsigned(&u.a);. There's no way for the compiler, in general, to tell that the assignment within read_unsigned, which might in general be in a different module, is to a union (or what type the union is). So your "magic token" idea is obviously unworkable (though I don't know if the standard explicitly bans it).Delgado
"Writing through it will, of course, change the active union member". This is also being discussed in the comments on the question itself. Before asking this question, I assumed the above was obviously true. But I'm not so sure now. The worry is that memcpy counts as 'access', which is problematic. Whereas simple = is initialization (not 'access') and therefore OKPhosphatase
@AaronMcDaid The standard actually uses the word “access” for assignment to a union member in [class.union].5. It’s not that important, but the commenter who said otherwise is mistaken. [basic.life] also says, explicitly, that using the address of an object whose lifetime has not begun as a void* is well-defined.Tremayne
@alastair That’s illegal code in C++, though (but legal in C). The real problem is unsigned write_unsigned(unsigned* p) { return (*p = static_variable); }—by the standard, write_unsigned(&u.a); should update a and make it active, but there is still no way to tell that p is a pointer to a union member. So, if this works, memcpy() must also work.Tremayne
Although I guess the runtime could do a bunch of crazy stuff to pass that information around; stick magic bits in a pointer or an unsigned that mean, “This is a union member,” and have every assignment operator check for them. Keep a data structure of all union addresses.Tremayne
@Lorehead 1) the member access expression by itself does not involve an "access" in the sense of [defns.access]. 2) [class.union]/5 only applies "when the left operand of an assignment operator involves a member access expression ([expr.ref]) that nominates a union member". *p is not a member access expression.Sweetandsour
@bogdan I’d like to follow up on what people who say malloc()/memcpy() cannot “create an object” mean in practice. Let’s say a future project has to run on a compiler that aggressively checks program correctness at runtime, because security and correctness trump performance here. It does this by keeping track of every object in the program. I ask, “If I call malloc(), or copy to a buffer of char[], how does it know about that?” The vendor says, “Those aren’t objects.” I ask, “Does it still work? Can I link to system libraries?” I’m told, “Those live in another area of memory.”Tremayne
If I have legacy code that uses malloc()/memcpy() to create stuff-that’s-pointed-to-but-isn’t-an-object, and I want to compile and link that to the project, what if anything breaks and why?Tremayne
@Lorehead I don't know about others, but here's the way I see these things: In practice, you need to know both what the standard says and what your implementation allows. Knowing that a construct is non-standard allows you to make an educated decision about whether to use it or not, and prevents you from making dangerous inferences like "if A works, then surely B must work as well"; if A is non-standard, the implementation may choose to allow it, but at the same time follow the standard strictly regarding B. More to the point, it's unlikely you'll find an implementation that doesn't allow...Brigandage
... you to do T* p = (T*)malloc(...);, where T is a trivial type, and then treat *p as an lvalue designating an object of type T; there's just too much code around doing that. However, memcpying into a non-active union member, or passing a pointer to such a member to a function and expect that assignment to that *p will switch the active member... might yield surprises under some strict settings, maybe now, maybe in the future.Brigandage
@Brigandage A lot of code type-puns with unions, too, doesn't it?Tremayne
@Brigandage And the standard guarantees that passing a pointer to an inactive union member to a function that assigns through it will make that member active, right?Tremayne
@Lorehead To your last question: No, it doesn't, that's undefined behaviour, as explained previously during our discussion. Read the first sentence of [class.union]/5.Brigandage
@Brigandage Ah, p->a16 is a class member access expression and *p is not. You are correct; I missed that.Tremayne
@Lorehead About type punning with unions: Yes, a lot of code does, and you can get away with it if you're careful, but you can just as well get into trouble. Here's an example. Try it with and without optimizations.Brigandage
@Lorehead No, it isn't illegal code. You've misinterpreted my read_unsigned as a function that reads the pointer. I meant it to be a function to write the pointer (that's why I said "the assignment within read_unsigned"). Perhaps I should have been clearer and written something like std::cin >> u.a to make the problem explicit.Delgado
@Brigandage The C++ standards document you are looking at is a draft, and incorporates text from P0137R1. The current standard does not make this UB. The changes in P0137R1 would seem to make std::cin >> u.a UB, which is patently ridiculous and I hope the committee will notice that and do something to remedy it.Delgado
@T.C.: The definition of "access" does not recognize constructs like &foo.bar and &pfoo->bar, but I would suggest that should be considered a defect in both the C and C++ Standards. I would further suggest that the Standards need to recognize the possibility of storage acquiring multiple Effective (C) or Dynamic (C++) Types, any of which may be used to access it. Recognizing those constructs would fix a lot of problem cases.Bluestone
@alastair The currently published standard is recognized to be defective in this area, so I don't see the point in trying to derive conclusions from it. P0137 provides fixes for those defects and has been approved by the standardization committee; ignoring it is akin to burying one's head in the sand. Sure, it's not in the official ISO document, but it's unlikely for major changes to be introduced between now and publication next year. Also, being resolutions for defect reports, the changes introduced by P0137 are applied "retroactively" in compilers to C++11 and 14 modes where possible.Brigandage
@alastair Allowing std::cin >> u.a to set the active member would open a whole can of worms. There are very good reasons for the current restrictions.Brigandage
@Brigandage The point is that you can't derive any conclusions from a draft that changes from day to day, especially if trying to give authoritative answers to a question. It's also clear that there is some controversy in this area, for instance the fact that you think it's OK for std::cin >> u.a to be labelled UB, whereas I think it's crazy — C++ practitioners will certainly not expect that to be UB and any compiler that took advantage of its UB status would rightly be criticised.Delgado
@alastair changes from day to day - that is definitely a wrong impression of how these changes are applied to the working draft; we're not talking about purely editorial changes here. Other than that, I don't think I have anything new to add here; I stand by what I said above. You and I clearly have different views on what certainly and rightly mean. Anyway, if you want a demonstration of what it would mean to allow your >> example, you can create a chat room and link to it from here - I'll be happy to provide some arguments for why that doesn't work.Brigandage
@Brigandage I understand that, but the fact is that anyone with check-in privileges can, subject to the procedures for amending things, make changes that will make an answer here not match what the Github repo currently says. Stack Overflow answers should, IMO, refer to the actual, published standard. Nothing wrong with talking about the future, but it should be clearly marked as such.Delgado
As for the chat, sure, I'll take you up on that. Here's the chat room.Delgado
@bogdan: What problem would there be with saying that forming a reference or pointer to a union-of-PODS member that will later be used, without laundering, to write a byte will render any pre-existing pointers or references to other members of that union unusable for accessing that byte unless/until they are laundered, and forming a reference or pointer to a UOPM that will be used without laundering to read a byte will render any pre-existing pointers or references to other members unusable for writing that byte unless/until they are laundered?Bluestone
@bogdan: If code takes the addresses of two union members and passes them into a function, and no byte that is written using either is accessed by the other, code which would work when the objects are distinct should work just as well if they are used to access disjoint parts of the same union and/or parts of the union that aren't modified. Meanwhile, the calling code should work just as well whether the bytes of the union are written using the first address or the second. Only cases where a byte is modified through one pointer and accessed via the other should pose any problem.Bluestone
@bogdan: The Standard's concept of "active member" makes the legitimacy of code to handle an object dependent upon the last type as which an object was written (in some arbitrarily-distant context) before the function was called, and keeps track of any changes made to an object's type until the next time it's written (which may also be in an arbitrarily-distant context). My concept, by contrast, wouldn't require a compiler to care about what happens before a function is called or after it returns.Bluestone
@Bluestone I think it's better to move this to chat, I sense an interesting discussion coming up :-). Here's the room.Brigandage
T
2

At most one member of a union can be active, and it is active during its lifetime.

In the C++14 standard (§ 9.3, or 9.5 in the draft), all non-static union members are allocated as if they were the sole member of a struct, and share the same address. This does not begin the lifetime, but a non-trivial default constructor (which only one union member can have) does. There is a special rule that assigning to a union member activates it, even though you could not normally do this to an object whose lifetime has not begun. If the union is trivial, it and its members have no non-trivial destructors to worry about. Otherwise, you need to worry about when the lifetime of the active member ends. From the standard (§ 3.8.5):

A program may end the lifetime of any object by reusing the storage which the object occupies or by explicitly calling the destructor for an object of a class type with a non-trivial destructor. [... I]f there is no explicit call to the destructor or if a delete-expression is not used to release the storage, the destructor shall not be implicitly called and any program that depends on the side effects produced by the destructor has undefined behavior.

It is safer in general to explicitly call the destructor of the currently-active member, and make another member active with placement new. The standard gives the example:

u.m.~M();
new (&u.n) N;
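A fuller sketch of that pattern, under stated assumptions: `M`, `N`, `U`, `g_destroyed` and `switch_active_member` are hypothetical names, and `M` is given a destructor with a visible side effect purely so the explicit call can be observed.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

static int g_destroyed = 0;               // counts M destructor calls (illustration only)

struct M { ~M() { ++g_destroyed; } };     // non-trivial destructor
struct N { int value = 7; };              // initialised by placement new

union U {
    M m;
    N n;
    U() : m() {}                          // m is the initially active member
    ~U() {}                               // the union cannot know which member to destroy
};

inline int switch_active_member() {
    U u;                                  // u.m is active
    const int before = g_destroyed;
    if (!std::is_trivially_destructible<M>::value) {
        u.m.~M();                         // explicitly end the lifetime of m
    }
    assert(g_destroyed == before + 1);    // the destructor ran exactly once
    new (&u.n) N;                         // begin the lifetime of n; n is now active
    return u.n.value;
}
```

Because `N` is trivially destructible, nothing further has to be destroyed when `u` goes out of scope.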

You can check at compile time whether the first line is necessary with std::is_trivially_destructible. By a strict reading of the standard, you can only begin the lifetime of a union member by initializing the union, assigning to it, or placement new, but once you have, you can safely copy a trivially-copyable object over another using memcpy(). (§ 3.9.3, 3.8.8)
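Applied to the question's union, that strict reading suggests activating the destination member first and only then copying over it. The sketch below is one way to do that, assuming that assigning to the array elements through `u.a16[i]` is accepted as activating `a16`; `U` and `activate_then_copy` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical union matching the final example in the question.
union U {
    double whatever;
    std::uint16_t a16[2];
};

inline std::uint32_t activate_then_copy(std::uint32_t x) {
    static_assert(sizeof(U::a16) == sizeof(std::uint32_t), "sizes must match");
    U u;
    u.whatever = 3.14;                    // 'whatever' is the active member
    u.a16[0] = 0;                         // assignment makes a16 the active member
    u.a16[1] = 0;
    std::memcpy(u.a16, &x, sizeof x);     // overwrite the already-active member
    assert(std::memcmp(u.a16, &x, sizeof x) == 0);  // same bytes as the source

    // The individual a16 values depend on endianness, so this sketch
    // copies the bytes back out to show that nothing was lost.
    std::uint32_t round_trip = 0;
    std::memcpy(&round_trip, u.a16, sizeof round_trip);
    return round_trip;
}
```

The extra assignments cost little, and they sidestep the question of whether memcpy alone can change the active member.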

For trivially-copyable types, the value representation is a set of bits in the object representation that determines the value, and the object representation of T is a sequence of sizeof(T) unsigned char objects. The memcpy() function copies this object representation. All non-static union members have the same address, and you can use that address as a void* to storage after it has been allocated and before the object’s lifetime begins (§ 3.8.6), so you can pass it to memcpy() when the member is inactive. If the union is a standard-layout union, the address of the union itself is the same as the address of its first non-static member, and therefore all of them. (If not, it is interconvertible with static_cast.)

If a type has_unique_object_representations, it is trivially-copyable, and no two distinct values share the same object representation; that is, no bits are padding.

If a type is_pod (Plain Old Data), then it is trivially-copyable and has standard layout, so its address is also the same as the address of its first non-static member.

In C, we are guaranteed that we can read an inactive union member whose type is compatible with that of the member last written. In C++, we are not. There are a few special cases where it works, such as pointers containing addresses of objects of the same type, signed and unsigned integral types of the same width, and layout-compatible structures. However, the types you used in your example have some extra guarantees: if they exist at all, uint16_t and uint32_t have exact widths and no padding, each object representation is a unique value, and all array elements are contiguous in memory, so any object representation of a uint32_t is also a valid object representation of some uint16_t[2] even though this object representation is technically undefined. What you get depends on endianness. (If you actually want to split up 32 bits safely, you can use bit shifts and bitmasks.)
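A minimal sketch of that shift-and-mask approach, which gives the same answer regardless of endianness. Halves, split and join are names I made up for illustration.

```cpp
#include <cstdint>

// Hypothetical value type holding the two 16-bit halves.
struct Halves {
    std::uint16_t low;
    std::uint16_t high;
};

// Split a 32-bit value into its low and high 16 bits using arithmetic only.
constexpr Halves split(std::uint32_t v) {
    return Halves{ static_cast<std::uint16_t>(v & 0xFFFFu),
                   static_cast<std::uint16_t>(v >> 16) };
}

// Reassemble the original 32-bit value from the two halves.
constexpr std::uint32_t join(Halves h) {
    return (static_cast<std::uint32_t>(h.high) << 16) | h.low;
}

// Checked at compile time: splitting and joining is lossless.
static_assert(join(split(0x12345678u)) == 0x12345678u, "round trip");
```

Because nothing here looks at the bytes in memory, no union, memcpy() or lifetime question arises at all.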

To generalize, if the source object is_pod, then it can be copied strictly by its object representation and laid over another layout-compatible object at the new address, and if the destination object is the same size and has_unique_object_representations, it is trivially-copyable as well and will not throw away any of the bits—however, there might be a trap representation. If your union is not trivial, you need to destroy the active member by calling its destructor (only one member of a non-trivial union can have a non-trivial default constructor, and it will be active by default) and use placement new to make the target member active.
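Those conditions can be sketched as a compile-time check. This assumes C++17 (for std::has_unique_object_representations) and uses is_trivially_copyable in place of is_pod, which is deprecated in C++20. The helper name can_overlay_bytes is hypothetical. It only answers whether the bytes fit without loss, not whether reading the result through a different type is defined.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: true when every byte of a From can be laid over a To
// of the same size without padding bits discarding any information.
template <class To, class From>
constexpr bool can_overlay_bytes() {
    return sizeof(To) == sizeof(From)
        && std::is_trivially_copyable<From>::value
        && std::is_trivially_copyable<To>::value
        && std::has_unique_object_representations<To>::value;
}

static_assert(can_overlay_bytes<std::uint16_t[2], std::uint32_t>(),
              "uint32_t bytes fit exactly into uint16_t[2]");
static_assert(!can_overlay_bytes<std::uint16_t, std::uint32_t>(),
              "sizes differ, so the copy would overflow or truncate");
```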

Whenever you copy arrays in C or C++, you always want to check for buffer overflow. In this case, you took my suggestion and used static_assert(). This has no run-time overhead. Where the optional C11 Annex K functions are available, you can also use memcpy_s(): memcpy_s( &u, sizeof(u), &u32, sizeof(u32) ); will work if the source and destination are POD (trivially-copyable with standard layout) and if the union has standard layout. It will never overflow a union: if the source is larger than the destination, it reports a runtime-constraint violation and fills the destination with zeroes instead of copying, which can make a lot of the bugs you’re worried about visible and reproducible.
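Here is a sketch of the static_assert() guard combined with the cautious "activate first, then memcpy()" order. The union Words and the function copy_in are hypothetical stand-ins for the code in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical union mirroring the one in the question.
union Words {
    std::uint32_t a32;
    std::uint16_t a16[2];
};

// Compile-time guard: the copy below can never run past the member.
static_assert(sizeof(std::uint32_t) == sizeof(Words::a32),
              "source and destination must have the same size");

inline Words copy_in(std::uint32_t x) {
    Words u;
    u.a32 = 0;                          // plain assignment makes a32 the active member
    std::memcpy(&u.a32, &x, sizeof x);  // overwrite the bytes of the active member
    assert(u.a32 == x);                 // same value that u.a32 = x would have stored
    return u;
}
```

The extra assignment costs nothing in practice, and it sidesteps the question of whether memcpy() alone can make a member active.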

Tremayne answered 29/9, 2016 at 13:39 Comment(62)
At memcpy(u.a16 the OP takes the address of an inactive union member. Is there undefined behavior when memcpy writes to it?Antihelix
@Yakk Yes, according to [basic.life]/7.1, because it is accessing the value of an object outside its lifetime.Brigandage
Consider a simple assignment to a union member u.u32_in_a_union = 3;. This is obviously well-defined. This also appears to be an 'access' in my mind. So, I guess we need a better definition of 'access'. If memcpy treats its first operand in the same way as = treats its left-hand-side, then we're good.Phosphatase
@Yakk No, and that example is, as I wrote, taken from the standard itself: eel.is/c++draft/class.union#6 See eel.is/c++draft/basic.life#6Tremayne
@bogdan, in that definition of 'access', is there a distinction between two different "kinds" of 'modify'? Consider += which updates a value based on its current state, and = which (for simple types) just blindly writes to the bits without considering whatever is currently presentPhosphatase
@AaronMcDaid The standard makes a special exception for assigning to an inactive union member: eel.is/c++draft/class.union#5 Otherwise, its storage is considered allocated, but its lifetime has not begun: eel.is/c++draft/basic.life#6 It’s as if you got a block of memory from malloc(), except that if the active member has a non-trivial destructor, you must call that first.Tremayne
@lorehead I see no use of memcpy in the links you provided me, just placement new. The argument that if placement-new is valid so is memcpy is supported where?Antihelix
@Yakk The relevant part of eel.is/c++draft/basic.life#6 “Before the lifetime of an object has started but after the storage which the object will occupy has been allocated [...] any pointer that represents the address of the storage location where the object will be or was located may be used, but only in limited ways. [... U]sing the pointer as if the pointer were of type void*, is well-defined.” The first argument of memcpy() has type void*.Tremayne
@lorehead and memcpy's first argument is a void* which must be to allocated storage, and it begins the lifetime of an object there under certain conditions (including, but not limited to, it copying plain old data from the source to the destination, with the storage being large enough and aligned enough).Antihelix
@Yakk Sorry, I thought you were referring to my placement new example at first, but you were referring to the code fragment in the OP.Tremayne
@Yakk I agree, but I’m not sure whether you are disagreeing with me. The same section says, “such a pointer refers to allocated storage,” unless it is currently under construction or destruction.Tremayne
"In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended" then gives you something? Really there seems to be a modest contradiction, because there isn't anything about placement new/memcpy ending the lifetime/activity of alternative objects, and an axiom of unions is that only one can be active. Either activating was UB, or the other object had its lifetime ended... You can manually end the lifetime of the other object (.~T()), you can assign and thus end it ([class.union/5]) clearly.Antihelix
@Yakk Yes there is: eel.is/c++draft/basic.life#5 I’ll add a reference to explain why what I said about trivial and non-trivial unions is true.Tremayne
Expanded for clarity.Tremayne
@Yakk "it begins the lifetime of an object there under certain conditions" source?Digitate
@AaronMcDaid u.u32_in_a_union = 3; is well-defined because the LHS uses a construct that [class.union]/5 specifies as starting the lifetime of the union member (if it's not active already). Only a very specific set of expressions do that; the ones used by memcpy are not in that set - see my comments on Yakk's answer.Brigandage
@AaronMcDaid Regarding access: it means read or modify the stored value of an object, that's all the definition says. += involves two accesses to the LHS: a read and a (subsequent) modify. It doesn't matter if it writes "blindly" or not; an access is an access, and if it happens outside of the lifetime of an object it is undefined behaviour.Brigandage
@Digitate eel.is/c++draft/basic.types#3 But [basic.life] does seem to say that even trivially-constructible union members do not begin their lifetime when initialized. If you want to be absolutely safe, you can explicitly make the member you want to copy to active first with either of the methods in [class.union], then memcpy() over the active member.Tremayne
@Lorehead if two pointers to T point to distinct T objects obj1 and obj2 - that's a precondition in that paragraph. obj2 needs to exist before you can memcpy into it; if its lifetime hasn't started, there's no object. The paragraph talks about changing an existing object's value, not about starting an object's lifetime.Brigandage
@Brigandage So assign a value to u.a16 or initialize it with placement new first. That explicitly starts its lifetime. I’ve edited a correction. Thanks for pointing this out.Tremayne
@Brigandage And that section does indeed say that memcpy() starts an object’s lifetime “under certain conditions,” just not under these conditions (which Yakk did not claim).Tremayne
@Lorehead Under what conditions exactly does it say that memcpy starts an object's lifetime?Brigandage
@Brigandage eel.is/c++draft/basic.life#8 (Note that “A program may end the lifetime of any object by reusing the storage which the object occupies [....] For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released[.]”)Tremayne
@Lorehead I don't see the connection to memcpy in that paragraph, sorry, please clarify. Storage is reused by another object being created in that storage. [intro.object]/1 specifies the ways in which objects are created. I don't see how memcpy can do any of that. In order to talk about an object's lifetime, you need to have an object to talk about in the first place.Brigandage
@Brigandage Would you agree that, under [basic.life].1, if you allocate storage with malloc() and initialize a trivially-copyable type with non-vacuous initialization using memcpy(), its lifetime begins then?Tremayne
@Lorehead In C++, no; no lifetime of any object will begin by doing that. malloc doesn't create an object, so passing the pointer returned by malloc directly as the destination argument to memcpy is undefined behaviour, because the pointer doesn't point to any object, and memcpy requires an object to be there - see my comments to Yakk's answer for details. You'd need to use placement new on the pointer returned by malloc to create an object in that storage, and then you can write that object's value through memcpy, by giving memcpy the pointer returned by placement new.Brigandage
@Brigandage Please re-read what it says about classes with vacuous initialization? Anyway, I’m bowing out, so thank you for helping to improve my understanding.Tremayne
Just to take a step back to earlier comments about 'access'. @bogdan, you're saying that neither placement-new nor = involve 'access', which is precisely why they can be used on inactive members?Phosphatase
@AaronMcDaid I can’t speak for @bogdan, but that’s not correct. Not all access to objects whose lifetime has not begun is UB. Inactive union members can be assigned to, making them active, because there’s a special rule saying that works, but they cannot necessarily be read from (this is UB in C++, although most compilers allow it, whereas it is allowed in standard C). Taking the address and using it in limited ways, such as passing it as a void*, is also explicitly legal.Tremayne
Here’s one way to think about it. Let’s say a compiler writer decided, “I’ll add an extension that allows multiple non-trivial members in a union, tracks which member is active, and makes it work automagically.” Does that extension work on conformant code? I think so. A member can only become active in the specific ways from [class.union], so the magic bits get set when you initialize it, assign to it, or use placement new. But it wouldn’t know when you change the bits out from under it. So that’s a reason a compiler might make you activate first.Tremayne
@AaronMcDaid new is one of the ways to create objects specified in the standard; it can be used on a non-active member because it creates a new object in the storage associated with that member and starts its lifetime; accessing that object's value is subsequently valid (until its lifetime ends, of course). The same goes for assignment under the very specific conditions listed in [class.union]/5; a new object will be implicitly created before any access occurs, which avoids undefined behaviour.Brigandage
@AaronMcDaid Which means I have to object to something that Lorehead said above: Not all access to objects whose lifetime has not begun is UB. Actually, as far as I can tell, any access to an object outside its lifetime is UB. Note that taking an object's address is not an access. Funnily enough, neither is a class member access expression by itself, even though it contains the word "access" in its name (yes, I know; don't shoot the messenger :-) ).Brigandage
@Brigandage Would you agree with this summary of our positions? I think that overwriting a trivially-copyable object with memcpy() is “reusing its storage,” which ends the lifetime of what used to be there, initializes a new object in the same storage, and starts the lifetime of the new object ([basic.life]), whereas you think those are, as lawyers put it, magic words. That is, you think the standard has to use the exact same words each time or it’s something else, including “object” and “reusing,” and synonyms are not good enough, so every real-world use of malloc() is, in your view, UB.Tremayne
@Brigandage I don’t see why you’re taking the position that any “access” to an object outside its lifetime is UB, when this requires you to interpret “class member access” in [class.union].5 as not an access. In any case, if taking the address is not access, then the issue is irrelevant to taking the address and passing it to memcpy(). I am glad you’ve come around.Tremayne
Let's say I replace my union member uint32_t elem1; with a single-element wrapper struct Single<uint32_t> elem1; and then implement a constructor for Single<T> which actually just uses memcpy to do the construction. That then means I could then call placement-new as a thin wrapper around my call memcpy.Phosphatase
@AaronMcDaid Or use placement new with the default copy constructor?Tremayne
@Lorehead The correct summary of "my position" is in my first three comments to Yakk's answer. On your side, the standard tells you that there are exactly four ways to create an object, but you insist there's a fifth one (memcpy). [basic.life]/7 tells you that The program has undefined behavior if the glvalue is used to access the object, but you insist there are exceptions to that rule, even though the text doesn't list any. And now you think that a class member access expression has to access an object, even though the respective definitions are pretty clear in the standard text.Brigandage
@Lorehead Regarding passing the pointer to memcpy, I'm afraid you have nothing to be glad about. Again, my comments to Yakk's answer explain why.Brigandage
@Brigandage So, let me see if I understand this. If I have a trivially-copyable object T x;, I declare T* const p = static_cast<T*>(malloc(sizeof(x))); I check for error, and then I call memcpy( p, &x, sizeof(x) ); your view is that this is UB. In your opinion, none of these steps ever create an “object,” but the copying “accesses” the “object,” which is not an “object?”Tremayne
@Lorehead Yes, that's undefined behaviour according to the current standard wording; the fact that implementations generally let you get away with it doesn't make it less so (we're talking about the standard here). And it's not just "my view"; you can find several interesting (hopefully eye-opening) threads on related issues. For example: groups.google.com/a/isocpp.org/d/msg/std-discussion/p4BXNhTHY7U/… (keep in mind that, at the time, P0137 was not integrated into the working draft yet).Brigandage
@Brigandage Logically, that interpretation requires “object” to be a magic word in the one line that talks about their creation, but not in any of the others that do call storage an “object” and say that allocation plus initialization starts the lifetime of an “object.” Then, it has to be not-magic again in the line about “accessing” an “object,” when you believe there is no “object,” and where does the standard say memcpy() “accesses” anything? You don’t even accept that the literal word “access” qualifies! Pragmatically, no implementation will ever break malloc() that way.Tremayne
Let us continue this discussion in chat.Brigandage
@Lorehead, "... default copy constructor ..." In the general case I'd like to construct these objects from other types, via memcpy/memmove, hence the copy constructor isn't sufficient. Hence my idea of a constructor (allowing us to placement-new), but with memcpy actually used to implement the constructorPhosphatase
@AaronMcDaid Yes, I’m sure that works under any interpretation of the standard.Tremayne
"There is a special rule that assigning to a union member activates it" Unions are not supported in strict std C++. The union member is created too late, the lvalue that refers to it can't refer to it before it's created.Sociolinguistics
@Tremayne Any consistent interpretation of the std needs to dismiss the definition of lifetime or lvalue.Sociolinguistics
@Sociolinguistics Looking back at this a couple of years later: wow, I got snippy. If we need to choose which sections of the document we’re going to selectively ignore, I don’t think I’d use the word “consistent” to describe it.Tremayne
@Tremayne Of course different people are going to end up with different semantics for C++. But each group (or church?) can choose a subset, make up additional rules and interpretation that form a consistent semantics. At least we have to admit that and agree to disagree on what C++ semantics should be.Sociolinguistics
@Sociolinguistics I think some of the opinions expressed in the comments were not particularly relevant to writing portable C++ code in the real world. But language-lawyering is fun, and I engage in it too.Tremayne
@Tremayne In the real world, you need to ask compiler writers what they believe the C++ means. "We believe what is plainly written" is NOT a valid answer.Sociolinguistics
@Sociolinguistics I’ve posted C++ code here that type-puns between uint32_t and float through an inactive union member, in code that inherently caused undefined behavior anyway. It was controversial! However, all mainstream compilers support it, and it’s difficult for me to imagine a mainstream compiler ever silently breaking such a common idiom. Historically, whenever the Standard has been in conflict with actual practice (void main, #include <iostream.h>, and so on), it’s the Standard that’s yielded.Tremayne
@Sociolinguistics The argument against was correct, though: if you want to avoid undefined behavior in a C++14 compiler, you would use memcpy() instead.Tremayne
@Davislor: If the Standard says something is Undefined or Unspecified, and an implementation chooses to specify it, the Standard isn't "yielding". When the Standard says "An implementation isn't required to specify what it does in some case, but may do so if it likes", that is in no way contradicted by an implementation that does, in fact, specify the behavior.Bluestone
@Bluestone I don’t have a lot of time to reopen this discussion today. However: whenever common practice has violated the Standard in the past, and it could be allowed without breaking anything, the Standard has always been rewritten to allow it. A notorious example: #include <iostream.h>.Tremayne
@Davislor: Saying "the Standard has always been written to allow it" is simply wrong. C89 defined the behavior of -1<<1 unambiguously on systems whose integer types have no padding bits. C99 made it UB. Compilers had usefully supported cross-type structure access using members of a Common Initial Sequence since 1974, but the Standard didn't require that compilers do so and consequently gcc and clang don't. There are zero practical C99 implementations that don't use two's-complement integers, but the Standard still requires "portable" code to work with other types.Bluestone
@Bluestone Access to members of layout-compatible common initial sequences of structures is in fact guaranteed by the C standard. Every implementation of network sockets relies on it. I don’t have C89 in front of me, but -1<<1 could never possibly have been compatible between a two’s-complement, one’s-complement and sign-and-magnitude machine, all of which existed in ’89. But, fine, I concede that not literally every common shorthand has made it into the Standard.Tremayne
@Davislor: Neither gcc nor clang acknowledges the CIS guarantee in cases where members are accessed using pointers to their respective types, even in cases where each pointer is formed after the last conflicting access using a different type, which would be the most common scenario where the CIS rule is useful. The C89-mandated behavior of -1<<1 was different on ones'-complement machines and two's-complement machines, but the only room for ambiguity would be on platforms where signed and unsigned types had different padding bits. C99 makes it UB on all machines.Bluestone
@Bluestone I’m not sure what you’re referring to, but that sounds like a violation of the strict aliasing rules? In any case, the socket interface explicitly passes pointers to layout-compatible structures, casting them to sockaddr*, and depends on this working.Tremayne
@Davislor: Both gcc and clang interpret the rules in a way that requires CIS accesses to be performed "directly" through lvalues of union type, but that was never the normal usage pattern for situations involving the CIS rule (I've never seen network socket code do that, for example). Nothing I can see in the Standard distinguishes StructMemberType v = someUnion.unionMember.structMember; from StructType *p = &someUnion.unionMember; StructMemberType v = p->structMember;. Support for either is a Quality of Implementation issue, but IMHO quality compilers should support both.Bluestone
@Bluestone This is getting off-topic, and I’m not clear what you’re talking about.Tremayne
@Davislor: My point is that gcc and clang reliably support the CIS guarantees in only a small fraction of the situations where code (including socket code) would rely upon them. The fact that C17 didn't add a requirement to support such code suggests that, at minimum, a large enough minority of Committee members regard the behavior as conforming to prevent passing any rule to make clear that it isn't. The lack of a new rule, however, will almost certainly be taken by the authors of gcc and clang as an endorsement of their behavior, and a condemnation of any code that is broken thereby.Bluestone
Let us continue this discussion in chat.Tremayne
A
2

[class.union]/5:

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended ([basic.life]). At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

At most one member of a union can be active at any one time.

An active member is one whose lifetime has begun and not ended.

Thus, if you end the lifetime of a member of your union, it is no longer active.

If you have no active members, causing the lifetime of another member of the union to begin is well-defined under the standard, and causes it to become active.

The union has allocated storage sufficient for all of its members. They are all allocated as if they were alone, and they are pointer-interconvertible. [class.union]/2.

[basic.life]/6

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a pointer refers to allocated storage ([basic.stc.dynamic.deallocation]), and using the pointer as if the pointer were of type void*, is well-defined.

So you can take a pointer to a union member and treat it as a pointer to allocated storage. Such a pointer may be used to construct an object there, if such a construction is legal.

Placement new is a valid way to construct an object there. memcpy of trivially copyable types (including POD types) is a valid way to construct an object there.

But, constructing an object there is only valid if it does not violate the rule of there being one active member of the union.

If you assign to a member of a union under certain conditions [class.union]/6 it first ends the lifetime of the currently active member, then starts the lifetime of the assigned-to member. So your u.u32_in_a_union = 0xaaaabbbb; is legal even if there is another member active in the union (and it makes u32_in_a_union active).
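In code, using the union from the first block of the question. This is only a sketch of the rule as I read it (the wording that spells it out is from the C++17 draft), and the function name is made up.

```cpp
#include <cstdint>

// The union from the question's first code block.
union U {
    double whatever;
    std::uint32_t x_in_a_union;
};

inline std::uint32_t switch_by_assignment(std::uint32_t x) {
    U u;
    u.whatever = 3.14;      // 'whatever' becomes the active member
    u.x_in_a_union = x;     // built-in assignment through a member access:
                            // ends 'whatever' and activates 'x_in_a_union'
    return u.x_in_a_union;  // reads the member that is now active
}
```

The special rule is tied to the form of the left-hand side (a member access on the union), which is why it is not obvious that a write through the void* handed to memcpy gets the same treatment.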

This isn't the case with placement new or memcpy: there is no explicit "the lifetime of the active member ends" in the union specification. We must look elsewhere:

[basic.life]/5

A program may end the lifetime of any object by reusing the storage which the object occupies or by explicitly calling the destructor for an object of a class type with a non-trivial destructor.

The question is, is starting the lifetime of a different member of the union "reusing the storage", thus ending the other union member's lifetime? In practice, obviously (they are pointer-interconvertible, they share the same address, etc). [class.union]/2.

So I would argue yes.

So creating another object through a void* pointer (placement new, or memcpy if legal for the type) ends the lifetime of the alternative members of the union (if any) (not calling their destructor, but that is usually ok), and makes the pointed-to object active and alive, at once.

It is legal to begin the lifetime of a double or an array of int16_t or similar via memcpy over storage.

The legality of copying an array of two uint16_t over an uint32_t or vice versa I will leave to others to argue. Apparently it is legal in C++17. But this object being a union has nothing to do with that legality.


This answer is based on discussion with @Lorehead below their answer. I felt I should provide an answer that aims directly at what I think is the core of the problem.

Antihelix answered 29/9, 2016 at 21:20 Comment(18)
memcpy of trivially copyable types (including POD types) is a valid way to construct an object there - I disagree. memcpy takes void*, yes, but those pointers need to point to objects - see 7.24.2.1 in N1570 (C11 draft): The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. 7.24.1/3 also specifies that it accesses those objects through lvalues of type unsigned char. It changes the value of the destination object, but it doesn't start the lifetime of any object.Brigandage
I can't find any paragraph in the C++ standard that says that memcpy starts the lifetime of any object; only that you can use it to access the value of existing objects within their lifetime (otherwise it would go against [basic.life]/7.1). [intro.object]/1 gives a clear, exhaustive list of ways to start the lifetime of an object; memcpy is not among them.Brigandage
[class.union]/5 defines the constructs that can be used to start the lifetime of a union member through an assignment; assigning through an lvalue obtained from indirection through a pointer to unsigned char, which is what memcpy is specified to do, is not among them.Brigandage
What @Brigandage said. memcpy doesn't start the lifetime of anything, because it cannot create an object.Sweetandsour
Forgetting about unions entirely, what about int i =3; int j; memcpy(&j, &i, sizeof(i))? Assuming this is legal (is it?), when does the lifetime of j start? Is there an "object" at j? If so, when? What are the various stages in the lifetime of an int such as j? (allocation ... birth ... initialization ... assignment)Phosphatase
@aaron the object lifetime there starts at int j;: its state is unspecified. More interesting is std::aligned_storage_t< sizeof(int), alignof(int )> b; memcpy(&b,&i,sizeof(i)); -- is there an int in *(int*)&b? Looks like the answer may be "no", as the only ways to begin lifetime appear to be declaring a non-union variable of that type, new (placement or not), or certain operations (like assignment) on a pod-like union member field. Oh and arguably memcpy of an entire union may set the living object in the destination. memcpy on a member of a union is none of these.Antihelix
Thanks, @Yakk, that makes a lot of sense to me. And it helps to clarify a lot of the vague (incorrect) assumptions that I had taken for granted before asking this question.Phosphatase
@aaron the remaining possibility involves layout-compatibility of int with union{int x};, legality of beginning the lifetime of a union field with memcpy from one union to another, and casting an int* to a pointer to a layout compatible type then doing the memcpy. I am uncertain if that can be made to work.Antihelix
The question about setting the active member by memcpying an entire union object into another object of the same type is a very good one. I'm leaning towards "no" (don't hate me :-) ). "Yes" would mean that memcpy would be able to create new objects, as changing the active member involves creating a new (sub)object - as far as I can tell, there's currently no way to start the lifetime of an object other than creating one. I'd say the result is similar to memcpying an int into a float: the float may end up containing the int's object representation, but this won't create a...Brigandage
... new int object in there or somehow change the float object's type. Anyway, I'd say [basic.types]/2 or 3 would benefit from some text about unions and active members. cc @Sweetandsour for his opinion.Brigandage
@Brigandage the consequences are less important than the standard text. Does memcpy let you move everything about a plain old data object's "state" around or not? I think so. Is the active member part of that state? I think it is insane if not. Can you reinterpret an int* to a union{int x;}* and access the x legally? I do not know, but I suspect so.Antihelix
If you're going to quote from draft versions of the standard, please can you make that fact explicit, and also say which draft you're using? In particular, relying on eel.is is risky because it gets updated automatically so the text may change from day to day.Delgado
@Yakk Everything about an object's state? I'm not sure about that everything. The object representation is an attribute of the object, not the other way around (two objects of different trivially copyable types may have the same object representation and yet be very different objects). Storing an object's state somewhere doesn't mean another object suddenly appears there, and that applies to its subobjects as well. Since this clearly needs clarification, I've started a thread on std-discussion.Brigandage
@Yakk I don't think the int to union thing is safe. You're accessing the stored value through an lvalue of type int in both cases, so [basic.lval]/8 is satisfied. However, an object cannot be a complete object and a subobject at the same time ([intro.object]/5), so, if the compiler can trace the int* back to the original complete int object, it can conclude that the original and the union member cannot be alive at the same time, so up->x cannot alias the original int, and optimize based on that.Brigandage
"that memcpy would be able to create new objects" Any semantics of C++ where it doesn't, and where the memory is not covered with uninitialized objects, would be a breaking change from C/C++ tradition that it would represent a betrayal of the charter of the C++ committee so bad that it can be summarily dismissed. Also, any such interpretation is a direct violation of the axiom "an lvalue must refer to an object" which is part of the std.Sociolinguistics
Two members of the same union cannot be pointer-interconvertible because only objects can be so, but they cannot both exist as objects at the same time. They are, during their lifetime, pointer-interconvertible with the union object, nothing more.Crowder
@JMC: That problem could be fixed by saying that every region of storage that does not contain any non-PODS objects simultaneously contains PODS of every type that will fit, whose lifetime matches that of the storage in question, but that such objects are not always accessible. One could then say that taking the address of a union will make that object accessible, and render all other union objects inaccessible for writing. Writing to a union member via any means would render all other objects inaccessible. Using memcpy to copy an object would cause all objects in the new copy...Bluestone
...to be initially accessible for reading or writing, but reading any object would make all others inaccessible for writing, and writing any member would make all others inaccessible.Bluestone
S
0

The elephant in the room: unions are not supported at all in complete strict C++, the "language" that you get when you try to apply all the standard clauses of the failed attempt at formalizing the intuition of C++ called the standard.

This is because:

  • an lvalue refers to an object,
  • a member access (x.m) is a normal lvalue for any class or union,
  • all members of a live class or union can be designated at any time by a member access,
  • according to the strict lifetime rules, only one member object can be alive in a union,
  • the notion of an lvalue referring to a soon to be created object is not defined in the standard.

So a simple use of a union like:

union {
  char c;
  int i;
} u;

u.i = 1;

has no defined behavior because the result of the evaluation of u.i can't refer to any int object, as there is no such object at the time of evaluation.
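For comparison, here is a minimal sketch of two ways of storing the int. The named union type U and both function names are hypothetical, introduced only so the snippet can be compiled and tested. The first function is the plain assignment discussed above. The second creates the int explicitly in the union's storage with placement new, so that u.i is never evaluated before an int object exists there. Whether that extra step is actually required is exactly the point in dispute, so treat this as an illustration of the argument and not as a ruling on it.

```cpp
#include <cassert>
#include <new>

// Hypothetical named version of the anonymous union shown above.
union U {
    char c;
    int i;
};

// Route 1: plain assignment through the member-access lvalue u.i.
inline int write_by_assignment() {
    U u;
    u.c = 'a';  // c is the active member
    u.i = 1;    // the statement the answer argues about
    return u.i;
}

// Route 2: create the int explicitly in the union's storage first,
// without forming the lvalue u.i before an int object exists there.
inline int write_by_placement_new() {
    U u;
    u.c = 'a';
    int* p = ::new (static_cast<void*>(&u)) int(1);
    return *p;  // read through the pointer returned by placement new
}
```

On the implementations I know of, both functions return 1; the sketch only makes visible where the member-access lvalue is formed in each case.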

The C++ committee failed at its mission.

In fact nobody uses complete strict C++ for any purpose: people need to dismiss whole parts of the standard, or make up whole imaginary clauses inspired by the written text, or go back from the text to the intent they imagine and then re-formalize that intent, to make sense of it.

Different people dismiss different parts and end up with completely different formalisms.

My proposal is to dismiss the lifetime rules and have an object at any address that can possibly hold such an object. That solves the whole issue, and nobody has ever presented a valid objection to the approach (a vague assertion that "this breaks all invariants" isn't a valid objection). Having an object at any valid address just creates an infinite number of potential objects (notably all pointer types, int*, int**, int***...), but these are not usable for reading as no valid value has been written.

Note that without that relaxation of either the lifetime rule or the definition of lvalues, you can't even have a non-trivial "strict aliasing rule", as that rule would never apply to a program that is otherwise well defined. As currently interpreted, the "strict aliasing rule" is useless. (Also, it's so badly written that nobody knows what it means anyway.)

Or maybe someone will tell me that to make sense of the strict aliasing rule, an lvalue of int refers to an object, just of a different type. That would be so surprising and silly that even if you make a consistent interpretation of the standard that way, I would still say it's broken.

Sociolinguistics answered 16/7, 2018 at 14:6 Comment(1)
I think it would be more fair to say that the authors of the Standard didn't think it necessary to forbid every silly thing a low-quality implementation could do to break code that should work predictably. I think the concept of a single "active member" is broken, and what's needed is instead something closer to a reader/writer lock. Referencing the union as a whole (including as a member-access lvalue) is the only action which releases locks. Reading or writing members via any means requires acquisition of "reader" or "writer" locks. Conflicting lock acquisition invokes UB. – Bluestone
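The reader/writer-lock idea in the comment above can be sketched as a toy bookkeeping model. This is not anything a compiler or the standard does: the class name UnionLockModel, its methods, and the exact conflict rule are all assumptions. The assumed rule is that a writer lock on one member conflicts with any access to a different member, and a reader lock on one member conflicts with a write to a different member.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical) of the proposed rule for a union with n members.
class UnionLockModel {
public:
    explicit UnionLockModel(std::size_t n) : state_(n, Lock::None) {}

    // Referencing the union as a whole releases every lock.
    void reference_whole_union() { state_.assign(state_.size(), Lock::None); }

    // Returns false where the proposed rule would call the access UB.
    bool read(std::size_t m) {
        for (std::size_t k = 0; k < state_.size(); ++k)
            if (k != m && state_[k] == Lock::Writer) return false;
        if (state_[m] == Lock::None) state_[m] = Lock::Reader;
        return true;
    }

    bool write(std::size_t m) {
        for (std::size_t k = 0; k < state_.size(); ++k)
            if (k != m && state_[k] != Lock::None) return false;
        state_[m] = Lock::Writer;
        return true;
    }

private:
    enum class Lock { None, Reader, Writer };
    std::vector<Lock> state_;  // one lock state per union member
};
```

Under this model, writing member 0 and then reading member 1 is a conflict, while reading both members after the union has been referenced as a whole is fine until one of them is written.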
