Do array elements count as a common initial sequence?

Asked 17/3, 2016 at 3:20 Answered 26/8, 2022 at 18:44

Sort of related to my previous question:

Do elements of arrays count as a common initial sequence?

struct arr4 { int arr[4]; };
struct arr2 { int arr[2]; };

union U
{
    arr4 _arr4;
    arr2 _arr2;
};

int main()
{
    U u;
    u._arr4.arr[0] = 0;        // write to the active member
    int x = u._arr2.arr[0];    // read from the inactive member
    (void)x;
}

According to this cppreference page:

In a standard-layout union with an active member of non-union class type T1, it is permitted to read a non-static data member m of another union member of non-union class type T2 provided m is part of the common initial sequence of T1 and T2....

Would this be legal, or would it also be illegal type punning?

Rolanda asked 17/3, 2016 at 3:20

Without having an argument to back it up, I believe it is legal. – Birthwort 18/3, 2016 at 9:39

C++11 says (9.2):

If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

As to whether arrays of different sizes form a valid common initial sequence, 3.9 says:

If two types T1 and T2 are the same type, then T1 and T2 are layout-compatible types.

These arrays are not the same type, so this doesn't apply. There is no special further exception for arrays, so int[2] and int[4] are not layout-compatible types, and arr4 and arr2 therefore have no common initial sequence.

In practice, though, I know of a compiler (GCC) which:

ignores the "common initial sequence" rule, and
allows type punning anyway, but only when accesses are "via the union type" (as in your example), in which case the "common initial sequence" rule is obeyed indirectly (because a "common initial sequence" implies a common initial layout on the architectures the compiler supports).

I suspect many other compilers take a similar approach. In your example, where you type-pun via the union object, such compilers will give you the expected result: reading from the inactive member should give you the value written via the active member.

Thorough answered 18/3, 2016 at 9:32

But the arrays are part of a struct and are therefore members. – Birthwort 18/3, 2016 at 9:39

@Thorough Read the question: U contains one arr4 and one arr2. While these names may be confusing, they are defined as structs just above. – Natalia 18/3, 2016 at 9:52

@Natalia yes, misread it. However, the final answer ("no") is the same. I've edited my answer above to explain why. – Thorough 18/3, 2016 at 9:55

I think this is correct (according to the wording of the standard, that is); there doesn't seem to be any text saying that the common-initial-sequence rule applies recursively if the first member of each struct is itself an aggregate. – Hesler 18/3, 2016 at 10:1

@Birthwort I believe that for the arrays to be layout compatible, there would need to be some part of the standard text which says that they are (and there is not). Technically this means, as you suggest, that it cannot be determined whether they are layout compatible. The safest assumption, then, is that they are not. (ok, you removed your comment...) – Thorough 18/3, 2016 at 10:14

Yeah, maybe you are right. Basically it boils down to layout compatibility between int arr[2] and int arr[4]. And padding is not allowed before the first data member. So what does "initial subsequence" mean here? – Birthwort 18/3, 2016 at 10:18

If T is a standard-layout class (or a fundamental type such as int; is int a standard-layout class?), and given struct T2 { T t; } and struct T3 : T {}, are T, T2 and T3 layout-compatible classes? Note that "Standard-layout classes are useful for communicating with code written in other programming languages", so their layout is well known at the binary level. – Henceforth 25/12, 2018 at 23:34

The C Standard would allow an implementation to vary the placement of an array object within a structure based upon the number of elements. Among other things, there may be some circumstances where it may be useful to word-align a byte array which would occupy exactly one word, but not to word-align arrays of other sizes. For example, on a system with 8-bit char and 32-bit words, processing a structure such as:

struct foo {
  char header;
  char dat[4];
};

in a manner that word-aligns dat may allow an access to dat[i] to be processed by loading a word and shifting it right by 0, 8, 16, or 24 bits, but such advantages might not apply had the structure instead been:

struct foo {
  char header;
  char dat[5];
};

The Standard was clearly not intended to forbid implementations from laying out structures in such ways, on platforms where doing so would be useful. On the other hand, when the Standard was written, compilers which would place arrays within a structure at offsets that were unaffected by the arrays' sizes would unanimously behave as though array elements that were present in two structures were part of the same Common Initial Sequence, and nothing in the published Rationale for the Standard suggests any intention to discourage such implementations from continuing to behave in such fashion. Code which relied upon such treatment would have been "non-portable", but correct on all implementations which followed common struct layout practices.

Translocate answered 26/8, 2022 at 18:44
