Important clarification: some commenters seem to think that I am copying from a union. Look carefully at the memcpy: it copies from the address of a plain old uint32_t, which is not contained within a union. Also, I am copying (via memcpy) to a specific member of a union (u.a16 or &u.x_in_a_union), not directly to the entire union itself (&u).
C++ is quite strict about unions - you should read from a member only if that was the last member that was written to:
9.5 Unions [class.union] [[c++11]] In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.
(Of course, the compiler doesn't track which member is active. It's up to the developer to track this themselves.)
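To illustrate the kind of bookkeeping meant here, the following is a minimal sketch of a wrapper that records which member was written last. The names `TrackedValue`, `Active`, `set_x` and so on are hypothetical and not part of the question. It relies on plain assignment to switch the active member, which is the case the question itself assumes is fine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper: remembers which union member was written last,
// because the language itself does not track this for us.
struct TrackedValue {
    enum class Active { Whatever, XInAUnion };

    TrackedValue() { u.whatever = 0.0; }  // start with the double active

    void set_whatever(double d) { u.whatever = d; active = Active::Whatever; }
    void set_x(std::uint32_t v) { u.x_in_a_union = v; active = Active::XInAUnion; }

    // Each getter checks the tag before reading the corresponding member.
    double get_whatever() const {
        assert(active == Active::Whatever);
        return u.whatever;
    }
    std::uint32_t get_x() const {
        assert(active == Active::XInAUnion);
        return u.x_in_a_union;
    }

    Active which() const { return active; }

private:
    union {
        double whatever;
        std::uint32_t x_in_a_union;
    } u;
    Active active = Active::Whatever;
};
```

The asserts only catch mistakes in debug builds; the point is simply that the tag lives next to the union and is updated on every write.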
Update: The following block of code is the main question, directly reflecting the text in the question title. If this code is OK, I have a follow-up regarding other types, but I now realize that this first block of code is interesting in itself.
#include <cstdint>
#include <cstring>  // std::memcpy
#include <iostream> // std::cout
uint32_t x = 0x12345678;
union {
    double whatever;
    uint32_t x_in_a_union; // same type as x
} u;
u.whatever = 3.14;
u.x_in_a_union = x; // surely this is OK, despite involving the inactive member?
std::cout << u.x_in_a_union;
u.whatever = 3.14; // make the double 'active' again
std::memcpy(&u.x_in_a_union, &x, sizeof(x)); // same types, so should be OK?
std::cout << u.x_in_a_union; // OK here? What's the active member?
The block of code immediately above this is probably the main issue in the comments and answers. In hindsight, I didn't need to mix types in this question! Basically, is u.a = b the same as memcpy(&u.a, &b, sizeof(b)), assuming the types are identical?
First, a relatively simple memcpy allowing us to read a uint32_t as an array of uint16_t:
#include <cstdint> // to ensure we have standard versions of these two types
#include <cstring> // std::memcpy
uint32_t x = 0x12345678;
uint16_t a16[2];
static_assert(sizeof(x) == sizeof(a16), "");
std::memcpy(a16, &x, sizeof(x));
The precise behaviour depends on the endianness of your platform, and you must beware of trap representations and so on. But it is generally agreed here (I think? Feedback appreciated!) that, with care to avoid problematic values, the above code can be perfectly standards-compliant in the right context on the right platform.
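As a rough illustration of the endianness point, here is a sketch that wraps the same memcpy in small helpers and checks which half lands in a16[0]. The names `Halves`, `split_u32`, `join_u32`, `low_half_first` and `check_round_trip` are made up for this example. Copying the bytes back always recovers the original value, whatever the byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical holder so the array can be returned by value.
struct Halves {
    std::uint16_t a16[2];
};

// Copy the bytes of a 32-bit value into two 16-bit elements.
Halves split_u32(std::uint32_t x) {
    Halves h;
    static_assert(sizeof(h.a16) == sizeof(x), "sizes must match");
    std::memcpy(h.a16, &x, sizeof(x));
    return h;
}

// Copy the bytes back; this undoes split_u32 regardless of byte order.
std::uint32_t join_u32(const Halves& h) {
    std::uint32_t x;
    std::memcpy(&x, h.a16, sizeof(x));
    return x;
}

// True when the low-order half is stored first (as on little-endian targets).
bool low_half_first() {
    return split_u32(0x12345678u).a16[0] == 0x5678u;
}

// Sanity check: a split followed by a join gives back the input.
void check_round_trip(std::uint32_t x) {
    assert(join_u32(split_u32(x)) == x);
}
```

On a little-endian machine a16[0] holds 0x5678 and a16[1] holds 0x1234; on a big-endian machine the two are swapped.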
(If you have a problem with the above code, please comment or edit the question accordingly. I want to be sure we have a non-controversial version of the above before proceeding to the "interesting" code below.)
If, and only if, both blocks of code above are not-UB, then I would like to combine them as follows:
uint32_t x = 0x12345678;
union {
    double whatever;
    uint16_t a16[2];
} u;
u.whatever = 3.14; // sets the 'active' member
static_assert(sizeof(u.a16) == sizeof(x), ""); // any other checks I should do?
std::memcpy(u.a16, &x, sizeof(x));
// what is the 'active member' of u now, after the memcpy?
std::cout << u.a16[0] << ' ' << u.a16[1] << std::endl; // i.e. is this OK?
Which member of the union, u.whatever or u.a16, is the 'active member'?
Finally, my own guess is that the reason why we care about this, in practice, is that an optimizing compiler might fail to notice that the memcpy happened and therefore make false assumptions (but allowable assumptions, by the standard) about which member is active and which data types are 'active', therefore leading to mistakes around aliasing. The compiler might reorder the memcpy in strange ways. Is this an appropriate summary of why we care about this?
memcpy a union due to potentially reading uninitialised memory. See #33394069, although that's on the C tag. – Gramme
std::copy instead. – Nightlong
memcpy here to read from initialized memory, so it should be OK. I'll edit the question now to make clear that u32 is still present, and initialized. Thanks for all these comments, it's helping me to clean up the question! – Phosphatase
std::copy doesn't seem to help here, because the types of the args don't match. That's why a 'raw' memcpy (or memmove) is required. Does this make sense? – Phosphatase
u.a16[0] = 0; also "accesses" the inactive union member; but it's OK. Surely assignment-into (and maybe therefore, memcpy-into?) is an acceptable time to use an inactive union member – Phosphatase
memcpy. But I don't find anything wrong in the example since memcpy is done on valid memory (by valid I mean properly allocated and same sized). Person doing memcpy should be well aware of the values inside the source from which memory is copied. – Greenhorn
u.a16, not &u, and then I want to read directly from u.a16 - hence I don't care where u.a16 is within the union – Phosphatase
memcpy is not a problem here. But Memory Layout does have relevance otherwise modifying one member of union won't make other members meaningless. Predicting/or knowing exactly which is the active member is not possible unless some bookkeeping is done. – Greenhorn
= (in certain cases). memcpy doesn't cut it. On the other hand, it arguably reuses the storage and ends the lifetime of the double, in which case you'd have a union without an active member. – Sweetandsour
memcpy for changing the active union member here. It doesn't say you can't use memcpy for that purpose, and Core Issue 1116 proposed resolution 1 appears to support that ("if T is trivially copyable…"). If you think P0137R1 does ban it, maybe that's something that needs to be addressed by fixing P0137R1? – Delgado
memcpy doesn't create objects ([intro.object]/1 exhaustively enumerates how objects can be created, and memcpy is not one of them), therefore it cannot begin the lifetime of objects. – Sweetandsour
whatever and doesn't create a new object, what are the implications of that? Is that something the optimizer could start doing weird things to? – Digitate
memcpy is a pretty messy area. We need something along the lines of N3751, but getting it right is hard. As to the implications under the current standard, per [basic.life], accessing an object outside its lifetime results in undefined behavior. – Sweetandsour
memcpy() from a uint32_t to an array of uint16_t is valid, your final example is valid, whatever is garbage, and a16 is the active member. Since this was tagged language_lawyer, I pointed out that this is not portable code and went into some reasons it might fail. My apologies that several people thought that was unhelpful. – Tremayne
char, the right size or using memcpy_s(), does not involve any UB, if anyone but me cares about that, and seems to illustrate your point equally well? – Tremayne
unions, using memcpy between uint32_t and uint16_t is UB, plain and simple? On all platforms? I guess the standard says nothing like: "if the platform satisfies properties X, Y, and Z, then memcpy to an int32_t from an array int16_t[2] is defined". And therefore the first code example in the question is undefined – Phosphatase
uint32_t to uint16_t appears safe for your purposes: arrays must be laid out contiguously in memory, and both those types must have exact widths with no padding. They are also trivially copyable. Caveats you already acknowledged but said were not relevant to your question: endianness, implementations where those types might not exist. I still strongly recommend you always check for buffer overruns! The declarations might change due to bit rot. – Tremayne
std::is_scalar? Or maybe std::is_trivial? – Phosphatase
std::has_unique_object_representations. This is true if std::is_trivially_copyable is true and if every equivalent object has a unique representation, i.e., no padding. For the source, you might want std::is_pod (Plain Old Data). – Tremayne
new, and the standard says you can copy over it with memcpy(). Then, whether memcpy() activates it or not, it will be active and hold the correct value. – Tremayne
memmove from another member of the same union. However, if I did a placement-new first it would overwrite the data I want to copy from. Anyway, thanks @Lorehead and everyone else. I'll keep checking this for a few days, I'm learning a lot about many things! – Phosphatase
asm(""); and voila, all potential objects appear at all places. – Sociolinguistics
asm and your code is inherently not meant to be portable. – Tremayne
asm("");? – Sociolinguistics
asm extension is different. As soon as you use an asm extension, you’re inherently targeting a single compiler (or implementations that try to be perfectly compatible with it). A compiler that goes out of its way to be compatible with gcc’s inline assembly will also be compatible with other gcc extensions. – Tremayne
asm("");? – Sociolinguistics
asm(""); with no actual asm statement, that’s completely undefined by the Standard and compilers do a lot of different things. If you’re not making a wholly-theoretical point and actually thinking of using an asm statement, anything that works in one compiler for one target will not work in others. Either way, it’s pointless to worry about making code that contains asm portable. – Tremayne
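For what it's worth, the placement-new suggestion from the comments could be sketched roughly as follows for the same-type case. This assumes the source value lives outside the union, as in the question; it would not help when copying from another member of the same union, for the reason given above. The names `U`, `store_via_memcpy` and `example_usage` are hypothetical, and this is not a ruling on what the standard requires: it only sidesteps the question of whether memcpy alone changes the active member.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

union U {
    double whatever;
    std::uint32_t x_in_a_union;
};

static_assert(std::is_trivially_copyable<std::uint32_t>::value,
              "memcpy is only meant for trivially copyable types");

// Hypothetical helper: explicitly start the lifetime of the uint32_t member,
// then copy the bytes of x over it and read the member back.
std::uint32_t store_via_memcpy(U& u, std::uint32_t x) {
    // Placement-new creates a uint32_t object in the member's storage,
    // leaving its value unset until the memcpy below.
    ::new (static_cast<void*>(&u.x_in_a_union)) std::uint32_t;
    std::memcpy(&u.x_in_a_union, &x, sizeof(x));
    return u.x_in_a_union;
}

void example_usage() {
    U u;
    u.whatever = 3.14;                     // the double is the active member
    const std::uint32_t x = 0x12345678u;   // plain object outside the union
    assert(store_via_memcpy(u, x) == x);   // read back through the uint32_t member
}
```

Whether the placement-new is strictly necessary is exactly what the thread disagrees on; including it costs nothing for a trivial type.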