Is it legal to use address of one field of a union to access another field?

Consider the following code:

union U
{
    int a;
    float b;
};

int main()
{
    U u;
    int *p = &u.a;
    *(float *)p = 1.0f; // <-- this line
}

We all know that the addresses of union fields are usually the same, but I'm not sure whether doing something like this is well-defined behavior.

So the question is: is it legal and well-defined behavior to cast and dereference a pointer to a union field, as in the code above?


P.S. I know that it's more C than C++, but I'm trying to understand if it's legal in C++, not C.

Marven answered 10/10, 2015 at 16:44 Comment(9)
Why would you? But I'm pretty sure it would be legal, since all union members start at the same address. (Otherwise, it wouldn't be a union anymore.) - Balduin
It is legal but not recommended. - Hooch
As others have said, legal or otherwise, it is bad design! Semantically, a union should contain exactly one of its members. What you're trying to do sounds "clever", and you have to be twice as clever to fix a bug as you were when you created it. Don't be clever if there is another way. - Opaque
How is it more C than C++? Unions exist in both languages and so do pointers. - Typhogenic
@Balduin @Opaque Ohh, it's a long story. I'm trying to implement GLSL-style vectors. For them, I need behavior like this: vec3 a(1,2,3); vec4 b = a.zxyy; (giving 3,1,2,2). To implement that behavior (.zxyy), I need the vector class to be a union. One of its fields is a structure with x, y, z and w members. The other fields are letter combinations like zxyy. For each such field I need a separate (empty) type and a separate set of (macro-generated) overloaded operators. - Marven
These overloaded operators must somehow access the x, y, z and w fields. The only way I see is to cast the address of such an empty class to a pointer to the entire vector and then use ->x, ->y, ->z, ->w on it. - Marven
@Hooch Yes, I think it is, but I can't be sure without a reference from the standard... - Marven
@ThomasMatthews I mean, it's more C-style than C++-style. - Marven
Let's amend that: the first field, to access another first field. If the types mismatch, e.g. in a union of structs, the third field of the first struct may sit at an entirely different offset than the third field of the second struct. - Damon

All members of a union must reside at the same address; that is guaranteed by the standard. What you are doing is indeed well-defined behavior, but note that you cannot read from an inactive member of a union using the same approach.

Note: Do not use C-style casts; prefer reinterpret_cast in this case.


As long as all you do is write to the other data member of the union, the behavior is well-defined. As stated, though, the write changes which member is considered the active member of the union, meaning that afterwards you can only read from the member you just wrote to.

union U {
    int a;
    float b;
};

int main () {
    U u;
    int *p = &u.a;
    *reinterpret_cast<float*> (p) = 1.0f; // ok, well-defined
}
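
If the goal is to look at the bytes of a value as another type, rather than to switch the active member, copying the bytes sidesteps the inactive-member restriction altogether. The following is only a sketch: float_bits is a hypothetical helper that is not part of the question, and it assumes that float is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the question): returns the object
// representation of a float as a 32-bit unsigned integer.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    // Copies the bytes of f into bits; no pointer of the "wrong" type
    // is ever dereferenced.
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The returned value depends on the platform's floating-point format, so treat it as an opaque bit pattern unless you know that format.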

Note: There is an exception to the above rule when it comes to layout-compatible types.
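
A minimal sketch of that exception, as I read the "common initial sequence" rule: when two standard-layout structs in a union begin with members of the same types, those leading members may be read through either struct by going through the union member access. The names IntMsg, FloatMsg, Msg and read_tag_via_inactive_member are made up for this illustration.

```cpp
#include <cassert>

// Hypothetical types: both are standard-layout structs whose first
// member has the same type, so they share a common initial sequence.
struct IntMsg   { int tag; int   value; };
struct FloatMsg { int tag; float value; };

union Msg {
    IntMsg   i;
    FloatMsg f;
};

int read_tag_via_inactive_member() {
    Msg m{IntMsg{1, 42}};  // initializes the first member, so i is active
    // f is not the active member, but tag belongs to the common initial
    // sequence of IntMsg and FloatMsg, so inspecting it is permitted.
    return m.f.tag;
}
```

Reading m.f.value here would not be covered, because value lies outside the common initial sequence.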


The question can be rephrased as the following snippet, which is semantically equivalent to a boiled-down version of the "problem".

#include <type_traits>
#include <algorithm>
#include <cassert>

int main () {
  using union_storage_t = std::aligned_storage<
    std::max ( sizeof(int),   sizeof(float)),
    std::max (alignof(int),  alignof(float))
  >::type;

  union_storage_t u;

  int   * p1 = reinterpret_cast<  int*> (&u);
  float * p2 = reinterpret_cast<float*> (p1);
  float * p3 = reinterpret_cast<float*> (&u);

  assert (p2 == p3); // will never fire
}

What does the Standard (n3797) say?

9.5/1    Unions    [class.union]

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.

Note: The wording in C++11 (n3337) was underspecified, even though the intent has always been that of C++14.
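
The last sentence of the quoted paragraph can be checked directly. This is a small sketch using the union from the question; members_share_address is a name invented for the example. Taking the address of a member does not read its value, so the check does not depend on which member is active.

```cpp
#include <cassert>

union U {
    int   a;
    float b;
};

// Returns true if both members live at the same address, which is also
// the address of the union object itself.
bool members_share_address() {
    U u;
    void* pa = static_cast<void*>(&u.a);
    void* pb = static_cast<void*>(&u.b);
    void* pu = static_cast<void*>(&u);
    return pa == pb && pa == pu;
}
```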

Dynameter answered 10/10, 2015 at 17:3 Comment(7)
I guess it's underspecified, see CWG 1116. - Susa
@KerrekSB I will change the post to reference C++14, since the wording is clearer (but the intent has always been that wording, even in C++11). - Albuquerque
Can you elaborate further on how writing to the reinterpret_cast'ed pointer doesn't violate the strict aliasing rules, regardless of what you can do with the union afterwards? - Theriot
@MarkB Since the data members are to be placed at the same address, the reinterpret_cast is fine (since we can interpret this address as the start of an object of type T2, even though it currently holds an object of type T1). - Albuquerque
@MarkB See the added example. - Albuquerque
"Note: There is an exception to the above rule when it comes to layout-compatible types." Where? - Gaddy
If you meant the "standard-layout structs with common initial sequence" proviso, some people claim it only guarantees behaviour when the union is passed to the user (making its declaration visible) and punning is done by reading the struct initial members via union member accessors ('punning' u.structA.a and u.structB.b, not just structA and structB). I'm not convinced, but the wording is badly ambiguous. If there is a section that sets defined behaviour for reads/writes from/to different layout-compatible union members, treated as individual objects, please let me know. - Gaddy

Yes, it is legal. Using explicit casts, you can do almost anything.

As other comments have stated, all members in a union start at the same address / location so casting a pointer to a different member is pointless.

The assembly language will be the same. You want to make the code easy to read so I don't recommend the practice. It is confusing and there is no benefit.

Also, I recommend a "type" field so that you know when the data is in float format versus int format.
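
A rough sketch of what such a "type" field could look like. All names here (Kind, Tagged, make_int, make_float, as_double) are hypothetical; the point is only that the tag is checked before any member is read, so the inactive member is never touched.

```cpp
#include <cassert>

// Hypothetical names for illustration.
enum class Kind { Int, Float };

struct Tagged {
    Kind kind;      // records which union member is currently active
    union {         // anonymous union: members are accessed as t.a / t.b
        int   a;
        float b;
    };
};

Tagged make_int(int v)     { Tagged t; t.kind = Kind::Int;   t.a = v; return t; }
Tagged make_float(float v) { Tagged t; t.kind = Kind::Float; t.b = v; return t; }

// Converts the stored value to double, consulting the tag first so that
// only the active member is read.
double as_double(const Tagged& t) {
    return t.kind == Kind::Int ? static_cast<double>(t.a)
                               : static_cast<double>(t.b);
}
```

For example, as_double(make_float(1.5f)) reads b, while as_double(make_int(7)) reads a.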

Typhogenic answered 10/10, 2015 at 17:0 Comment(0)
