Type punning between integer and array using `union`?

Is it legal to type-pun between an integer and an array of integers through a union?

Specific code:

#include <nmmintrin.h>
#include <stdint.h>

union   Uint128 {
    __uint128_t uu128;
    uint64_t    uu64[2];
};

static inline   uint_fast8_t    popcnt_u128 (__uint128_t n)
{
    const union Uint128 n_u     = {.uu128 = n};
    const uint_fast8_t  cnt_a   = _mm_popcnt_u64(n_u.uu64[0]);
    const uint_fast8_t  cnt_b   = _mm_popcnt_u64(n_u.uu64[1]);
    const uint_fast8_t  cnt     = cnt_a + cnt_b;

    return  cnt;
}
Disqualification answered 5/3, 2019 at 20:11 Comment(2)
Yes, but be aware of endianness. (Fifield)
I thought about that, but as I'm only counting bits, I don't care. (Disqualification)
R
0

The type-access rules in section 6.5 paragraph 7 of C11 draft N1570 make no provision for an object of struct or union type to have its storage accessed by anything other than an lvalue of that union type, an lvalue of another type that contains such a union, or an lvalue of character type.

A quality compiler that can see that a pointer or lvalue of one type is being used to derive a pointer or lvalue of another which is then accessed should be able to recognize that an access to the latter, made in a context where the derivation is visible, is an access to the former. I think the authors of the Standard thought that sufficiently obvious that it could go without saying, especially since even something like someUnion.intMember = 3; would invoke UB otherwise. The left-hand operand of the assignment is an lvalue of type int, and there is no provision that would allow an lvalue of type int to be used to access an object of union type. The range of situations where a compiler would recognize that an access via derived pointer or lvalue is an access to the parent is a Quality of Implementation issue; the Standard offers no guidance as to what should be expected from a "good" implementation.

As for what clang and gcc allow, they seem to recognize that an access to someUnion.someArray[i] is an access to the union, but they do not recognize *(someUnion.someArray+i) likewise, even though the Standard defines the two constructs as equivalent. Since the Standard doesn't require that implementations recognize either (nor even the obvious someUnion.intMember), the disparity does not make clang and gcc non-conforming. Nonetheless, it should be noted that they are astonishingly blind when it comes to recognizing lvalues based on unions.

Retardment answered 5/3, 2019 at 22:30 Comment(0)
B
5

Yes, type punning between all the data types through unions is explicitly foreseen by the C standard. There are no special provisions for arrays that would forbid that.
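
A minimal sketch of the idea, assuming a C99-or-later compiler: a value is stored through one union member and read back through an array member. The union `U64Halves` and the function `fold_u64` are hypothetical names made up for this illustration; they are not from the question.

```c
#include <stdint.h>

/* Hypothetical union for illustration: a 64-bit integer overlaid
 * with an array of two 32-bit halves. */
union U64Halves {
    uint64_t whole;
    uint32_t half[2];
};

/* XOR of the two 32-bit halves. The result does not depend on
 * endianness, because XOR is symmetric in its operands. */
static inline uint32_t fold_u64(uint64_t x)
{
    const union U64Halves u = {.whole = x}; /* store through one member */
    return u.half[0] ^ u.half[1];           /* read through the array member */
}
```

Which half ends up in `half[0]` depends on the byte order of the target, so a computation that is not symmetric in the two halves would need to account for that.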

Brooking answered 5/3, 2019 at 20:15 Comment(1)
According to the Standard, someUnion.arr[i] is equivalent to *(someUnion.arr+i). Given that clang doesn't treat the latter as defined, is there any reason to trust it to process the former? (Retardment)
P
3

Yes, union type-punning is legal in ISO C99 and later. See the questions "Unions and type-punning" and "Is type-punning through a union unspecified in C99, and has it become specified in C11?" (In C89 it was implementation-defined, not undefined.)

As a GNU extension, it's well-defined in gnu89 and GNU C++. https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type%2Dpunning

It's also legal in MSVC++, which for example uses unions to define __m128i for access to vector elements. (And also allows pointer-casting for type punning, unlike other compilers that enforce strict aliasing.)


Beware that it's not legal in ISO C++ to read a union member other than the one that was last written (undefined behaviour). It is a common extension that I think all x86 compilers support (and you're using Intel intrinsics), but not all compilers everywhere should be assumed to.

You can always use memcpy for strictly-portable C++ to copy between the object-representations of two different types.
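
As a rough sketch of the memcpy approach (it works the same way in C), the function below copies the object representation of the 128-bit value into two 64-bit words. The name `popcnt_u128_memcpy` is made up for this example, and it assumes a GCC- or Clang-compatible compiler on a 64-bit target: `unsigned __int128` and `__builtin_popcountll` are compiler extensions, used here in place of `_mm_popcnt_u64` so the sketch does not depend on x86 intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the 128-bit type has exactly the size of two uint64_t. */
_Static_assert(sizeof(unsigned __int128) == sizeof(uint64_t[2]),
               "unexpected size for unsigned __int128");

/* Hypothetical helper: population count of a 128-bit value via memcpy. */
static inline unsigned popcnt_u128_memcpy(unsigned __int128 n)
{
    uint64_t halves[2];

    /* Copy the bytes of n; no union and no pointer cast involved. */
    memcpy(halves, &n, sizeof halves);

    return (unsigned)__builtin_popcountll(halves[0])
         + (unsigned)__builtin_popcountll(halves[1]);
}
```

Optimizing compilers typically turn a fixed-size memcpy like this into plain register moves, so it should not cost anything compared with the union.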


For your case, any decent optimizing compiler should compile this the same as (uint64_t)n and n>>64, unless you disable optimization.
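
For comparison, a sketch of the shift-and-cast version mentioned above. The name `popcnt_u128_shift` is hypothetical, and `unsigned __int128` and `__builtin_popcountll` are GCC/Clang extensions assumed here instead of `_mm_popcnt_u64`. This form involves no type punning at all, so it sidesteps the aliasing question and does not depend on endianness.

```c
#include <stdint.h>

/* Hypothetical helper: population count of a 128-bit value using
 * only integer conversions and a shift. */
static inline unsigned popcnt_u128_shift(unsigned __int128 n)
{
    const uint64_t lo = (uint64_t)n;         /* low 64 bits  */
    const uint64_t hi = (uint64_t)(n >> 64); /* high 64 bits */

    return (unsigned)__builtin_popcountll(lo)
         + (unsigned)__builtin_popcountll(hi);
}
```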

Pneumoconiosis answered 5/3, 2019 at 22:13 Comment(1)
"Beware that it's not legal in ISO C++ to read a union member other than the one that was last written (undefined behaviour)." This is not entirely true. It is legal to read through a union member other than the one last written, even if the members are of different types, as long as both members are standard-layout structs and the part being read belongs to their common initial sequence. (Chelsae)
