Do scalar members in a union count towards the common initial sequence?

In the union U below, if a or b is the active member, is it defined behavior to access c?

struct A{
    int a;
};
struct B{
    int a;
    double b;
};
union U{
    A a;
    B b;
    int c;
};

In [class.union], the standard defines some rules to make using a union easier (emphasis mine):

[ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem].  — end note ]

I'm hung up on the word "struct" here. Does a standard-layout scalar such as int count, even though it is not a struct?

  • My union U above is indeed a standard-layout union: per [class], that is a standard-layout class defined with the union keyword, and since everything in it is ultimately built from scalars (which are standard-layout types), it qualifies.

  • The structs obviously share a common initial sequence consisting of the first int member, but it is unclear whether a fundamental type can take part in a common initial sequence.

    • [class.union] also says that "Each non-static data member is allocated as if it were the sole member of a struct", which I think is evidence that the access is defined.
  • Finally, standard-layout structs may not have padding at the beginning ([class.mem]), and the members of a union are pointer-interconvertible, so the standard guarantees that the int members of the standard-layout structs and the non-static int c in the union start at the same address.
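
The layout claims in the bullets above can be checked mechanically. The following is only a sketch: the helper names members_share_address and demo_layout are made up for illustration, and passing these checks says nothing about whether reading c is defined, only that the types are standard-layout and that the members start at the same address.

```cpp
#include <cassert>
#include <cstddef>      // offsetof
#include <type_traits>  // std::is_standard_layout

struct A { int a; };
struct B { int a; double b; };
union U { A a; B b; int c; };

// All three types are standard-layout.
static_assert(std::is_standard_layout<A>::value, "A is standard-layout");
static_assert(std::is_standard_layout<B>::value, "B is standard-layout");
static_assert(std::is_standard_layout<U>::value, "U is standard-layout");

// A standard-layout struct has no padding before its first member.
static_assert(offsetof(A, a) == 0, "no leading padding in A");
static_assert(offsetof(B, a) == 0, "no leading padding in B");

// Hypothetical helper: true if all three union members start at the same address.
// Only addresses are taken here; no inactive member is read.
bool members_share_address(const U& u) {
    const void* pa = &u.a;
    const void* pb = &u.b;
    const void* pc = &u.c;
    return pa == pb && pb == pc;
}

// Hypothetical usage example.
void demo_layout() {
    U u;
    u.c = 0;  // make c the active member
    assert(members_share_address(u));
}
```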
Romanov answered 11/1, 2018 at 14:18 Comment(11)
A common initial sequence is only defined as a property of class types. I'm not going into the discussion of whether the access is valid, but if it is, it's definitely not because of the common initial sequence guarantee. - Redevelop
@StoryTeller: I just updated the question with another excerpt from the standard about how non-static data members are allocated as if they were the sole member of a struct, which kind of implies that an int in the context of a union can be considered a standard-layout struct. - Romanov
I'll tell you more. Those members are pointer-interconvertible. But I'm not sure whether accessing the "inactive" member is okay on account of that. I'm going to wager a guess that it is not. - Redevelop
@StoryTeller: It's unclear to me whether you mean only int c when you say the "inactive member" or whether you are including B b (assuming A a is the active member). - Romanov
I mean any member of the union that wasn't last written to, colloquially the opposite of what [class.union]/1 defines. So yeah, c and b (though in this particular case, c is of interest). - Redevelop
@StoryTeller: I think it's at least defined to access the first member of b according to the first standard excerpt in my question. - Romanov
Yes, that is true. You are always okay in accessing the first member of b and a interchangeably. c doesn't affect that. I think I understand your question better now, as well. And Eric seems to have mostly answered it. - Redevelop
The defined term is "standard-layout struct". Not "struct". - Chalcedony
@Chalcedony Yes, but an int is also considered to be a standard-layout type, which is why I bolded only "struct". - Romanov
See also this question. - Stoned
@Barry: Thanks for that. This is frustrating, though! I feel like it should be an exception to the rule. - Romanov

struct A and struct B:

  • are contained in the standard-layout union U,
  • are standard-layout structs, and
  • share a common initial sequence.

So, they satisfy the description in the sentence “If a standard-layout union contains several standard-layout structs that share a common initial sequence…”.

The int c that is also in the union is not such a struct nor is it in such a struct. So this sentence is not telling you that you can write to c and inspect a.a or b.a, nor that you can write to a.a or b.a and inspect c.

This means that c is not part of the common initial sequence you can inspect. But neither does it spoil the common initial sequence of struct A and struct B.
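
As a minimal sketch of what the guarantee does and does not cover (the function names read_b_after_writing_a and demo_common_initial_sequence are hypothetical, chosen only for this example):

```cpp
#include <cassert>

struct A { int a; };
struct B { int a; double b; };
union U { A a; B b; int c; };

// Hypothetical helper: writes through a, then reads the same int through b.
int read_b_after_writing_a(int value) {
    U u;
    u.a.a = value;  // a becomes the active member
    return u.b.a;   // permitted: A and B share the initial int member
    // return u.c;  // NOT covered by the guarantee: c is not a standard-layout struct
}

// Hypothetical usage example.
void demo_common_initial_sequence() {
    assert(read_b_after_writing_a(42) == 42);
}
```

The commented-out line is the access the question asks about; the note in [class.union] simply does not speak to it.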

Regarding the text “Each non-static data member is allocated as if it were the sole member of a struct,” the standard is being a bit sloppy with language here. Allocation usually refers to acquiring or reserving storage, but here it seems to refer to laying out the bytes of an object within given storage. I do not see a formal definition of this usage in the C++ standard (though I did not look too hard), but I did find a similar use. So I take it to mean that each non-static data member is laid out as if it were the sole member of a struct.

What this says is that a pointer to any one of these union members points to the same place as a pointer to any of the other union members. This may imply that pointers to one can be converted to pointers to the other. However, it does not give you license to violate the strict-aliasing rules. Even if x is a pointer to c and y is a pointer to a or a.a, you cannot use *x to access c while a is the last-written member or use *y while c is the last-written member.
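
If the goal is just to obtain the int value, two commonly used approaches avoid reading through an inactive member. This is only a sketch under the assumption that a is the active member on entry; first_int_bytes, read_c_after_switching, and demo_defined_access are hypothetical names, not anything from the standard.

```cpp
#include <cassert>
#include <cstring>  // std::memcpy

struct A { int a; };
struct B { int a; double b; };
union U { A a; B b; int c; };

// Hypothetical helper: copies the bytes of the leading int out of u.a
// instead of reading them through c. Assumes a is the active member.
int first_int_bytes(const U& u) {
    int out;
    std::memcpy(&out, &u.a, sizeof out);  // copies the object representation
    return out;
}

// Hypothetical helper: makes c the active member before reading it.
int read_c_after_switching(U& u, int value) {
    u.c = value;  // the assignment makes c the active member
    return u.c;   // fine: c is the last-written member
}

// Hypothetical usage example.
void demo_defined_access() {
    U u;
    u.a.a = 7;  // a is the active member
    assert(first_int_bytes(u) == 7);
    assert(read_c_after_switching(u, 9) == 9);
}
```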

Estivation answered 11/1, 2018 at 14:28 Comment(1)
I think the comment regarding aliasing makes all the difference. Even though the memory is guaranteed to align, accessing the int member when one of the structs is active will violate aliasing. - Romanov
