Can a C++ union member be activated by assignment with pointer-to-member syntax?

In C++, a union can have at most one active member at any given time, and the C++ standard provides only a few ways to make a member active. One such way is direct assignment in a statement like u.x = 3;. Respecting these rules is required when working with unions in a constexpr or consteval context, because the compiler must reject code that violates object lifetimes during constant evaluation.

But what if we want to activate a union member through pointer-to-member syntax, with a statement like u.*member = 3;? I created this sample C++20 program to test with MSVC 19, Clang 16, and GCC 13:

union U
{
    int x;
    int y;
};

template<auto member = &U::x>
constexpr void activate(U& u) noexcept
{
    u.*member = 3;
}
constexpr int get(U& u) noexcept
{
    return u.x;
}
constexpr int test() noexcept
{
    U u;
    activate(u);
    return get(u);
}
static constexpr int const result{test()};
int main()
{
    return result;
}

On Compiler Explorer, MSVC 19 and GCC 13 both accept the program and generate the expected assembly for it. However, Clang 16 rejects it with this confusing error:

<source>:22:28: error: constexpr variable 'result' must be initialized by a constant expression
static constexpr int const result{test()};
                           ^     ~~~~~~~~
<source>:10:12: note: assignment to member 'x' of union with no active member is not allowed in a constant expression
        u.*member = 3;
                  ^
<source>:19:2: note: in call to 'activate(u)'
        activate(u);
        ^
<source>:22:35: note: in call to 'test()'
static constexpr int const result{test()};
                                  ^
1 error generated.

Interestingly, if we change &U::x to &U::y in the template parameter, MSVC 19 diagnoses an error because get then reads x while y is the active member, but GCC 13 still accepts the program and generates the same assembly output. If I change the type of y from int to char, GCC 13 does diagnose the issue like MSVC 19 does. In every case, Clang 16 rejects the pointer-to-member assignment. All three compilers accept all variations of the code outside a constexpr or consteval context; the compilers only start to differ during constant evaluation.

My understanding is that MSVC and GCC are behaving correctly and that Clang is mishandling the pointer-to-member syntax. But is this accurate? Or does the C++ standard simply have no provision for activating a union member through pointer-to-member syntax?

To put things simply, which compiler is correct here, if any?

Yawning answered 27/4, 2023 at 20:12 Comment(2)
A pointer to a union member looks like a rare kind of perversion. Why would one ever create such a thing? - Masonry
@n.m. It's useful for templates to work with unions that have many different data members. For example, a union that can hold every kind of pointer to member function. We can't yet use byte arrays for storage in constexpr so we have to use unions to reuse space instead, as far as I am aware. - Yawning

Activating a member of a union object is essentially equivalent to starting the lifetime of the member subobject. The lifetime of an object can be started explicitly with a placement new-expression, or by certain operations that are specified to begin the lifetime of objects implicitly under certain conditions (e.g. implicitly-defined copy/move constructors and assignment operators of a union, or functions like memcpy).

Additionally, there is one special provision for unions that lets a simple assignment expression start the lifetime of a subobject under certain conditions; an assignment normally has no such effect. This is specified in [class.union.general]/6. As you can see in the linked text of the passage from the post-C++20 standard draft N4868, only assignment expressions whose left operand is formed purely from the built-in array subscripting operator and . member access are considered.

That means access via a pointer-to-member (or any kind of pointer or reference for that matter) cannot cause an assignment expression to start the lifetime of a subobject implicitly and hence cannot change the active member of the union object.

Clang is correct to reject it: the assignment is core language undefined behavior, and undefined behavior is not permitted in a manifestly constant-evaluated expression. Clang tends to be much stricter about UB in constant expressions, and my impression is that GCC in particular rather often accepts UB in constant expressions when it is UB more or less only because of restrictive wording in the standard.

Iodize answered 27/4, 2023 at 20:34 Comment(6)
And yet this compiles (and runs correctly). - Bindman
@PaulSanders Yes, the expression activate(u) in main is not manifestly constant-evaluated, i.e. it doesn't appear in a context that requires a constant expression. Therefore it doesn't matter whether or not it is one. In this case it isn't a constant expression, for the reasons given in my answer. However, the program technically has undefined behavior, although I don't expect any compiler to mess with it in unexpected ways (hence my comment on GCC at the end of my answer). If you substitute constexpr with consteval you'll get the error. - Iodize
Right, got you. And yes, I tried consteval. - Bindman
Placement new is not allowed in constant expressions though, right? It seems difficult to avoid code duplication with unions that have many members in constexpr contexts... - Yawning
@Yawning In constant expressions you can still use std::construct_at. However, there is some issue with deciding which int member would actually be activated, since they have the same type. See e.g. CWG 2677. There normally isn't really any point in having two union members with the same type. There usually is also not much point in using raw unions. std::variant is type-safe and in most cases just as good, except if you want to manage the type index somewhere further away or implicitly. - Iodize
@Iodize Good point, I need to revisit the code generation of std::variant to see if it's any better. Last time I checked, it generated way too much code and exception handling that couldn't be optimized out in my case. That's why I was using raw unions. - Yawning
