Can we access a member of a non-existing union?

Asked 5/11, 2018 at 8:51 Answered 6/11, 2018 at 19:15

c++ language-lawyer unions strict-aliasing class-members

In the C++ standard, [basic.lval]/11.6 says:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:[...]

an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),[...]

This sentence is part of the strict-aliasing rule.

Can it allow us to access the inactive member of a non-existing union? As in:

struct A{
  int id :1;
  int value :32;
  };
struct Id{
  int id :1;
  };

union X{
  A a;
  Id id_;
  };

void test(){
  A a;
  auto id = reinterpret_cast<X&>(a).id_; //UB or not?
  }
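For contrast, here is a minimal sketch of the read that [class.mem]/25 does define. It reuses the question's A, Id and X, but this time a union object of type X actually exists. Like the question's "int value :32", it assumes a 32-bit int. The helper name read_id_through_real_union is made up for illustration.

```cpp
#include <cassert>

// A, Id and X are copied from the question; like the question's
// "int value :32", this assumes int is 32 bits wide.
struct A {
  int id : 1;
  int value : 32;
};
struct Id {
  int id : 1;
};
union X {
  A a;
  Id id_;
};

// Hypothetical helper. Unlike test(), a union object of type X really
// exists here, and its active member is a. x.id_.id is part of the
// common initial sequence of A and Id, so [class.mem]/25 makes this
// read behave as a read of x.a.id.
int read_id_through_real_union() {
  X x{A{0, 42}};    // aggregate initialization: the first member, a, is active
  return x.id_.id;  // same result as reading x.a.id
}
```

Here read_id_through_real_union() returns 0, the value stored in x.a.id.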

Note: Below is an explanation of what I do not grasp in the standard, and of why the example above could be useful.

I wonder what [basic.lval]/11.6 could be useful for.

[class.mfct.non-static]/2 forbids us from calling a member function of the "cast-to" union or aggregate:

If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

Considering that static data members and static member functions can be accessed directly through a qualified name (a_class::a_static_member), the only useful use case of [basic.lval]/11.6 may be to access a member of the "cast-to" union. I thought about using this rule to implement an "optimized variant". This variant could hold either a class A object or a class B object, both starting with a bit-field of size 1 that denotes the type:

class A{
  unsigned type_id_ :1;
  int value :31;
  public:
  A():type_id_{0}{}
  void bar(){}
  void baz(){}
  };

class B{
  unsigned type_id_ :1;
  int value_ :31; // must not share its name with the value() member functions
  public:
  B():type_id_{1}{}
  int value() const;
  void value(int);
  void bar(){}
  void baz(){}
  };

struct type_id_t{
  unsigned type_id_ :1;
  };

struct AB_variant{
  union {
    A a;
    B b;
    type_id_t id;
    };
  //[...]
  static void foo(AB_variant& x){
    if (x.id.type_id_==0){
      reinterpret_cast<A&>(x).bar();
      reinterpret_cast<A&>(x).baz();
      }
    else if (x.id.type_id_==1){
      reinterpret_cast<B&>(x).bar();
      reinterpret_cast<B&>(x).baz();
      }
    }
  };
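A minimal compilable sketch of the defined case, in which foo receives an object that really is an AB_variant. Several details are assumptions, not taken from the question: constructors that choose the active member, bar() returning an int so the dispatch can be observed, and member access through x.a and x.b instead of reinterpret_cast. Because of pointer-interconvertibility, both forms reach the same active member here.

```cpp
#include <cassert>

// Sketch with hypothetical additions: both alternatives start with the same
// 1-bit tag, so the tag belongs to their common initial sequence.
class A {
  unsigned type_id_ : 1;
  int value_ : 31;
 public:
  A() : type_id_{0}, value_{0} {}
  int bar() const { return 1; }  // stand-in for A's real work
};

class B {
  unsigned type_id_ : 1;
  int value_ : 31;
 public:
  B() : type_id_{1}, value_{0} {}
  int bar() const { return 2; }  // stand-in for B's real work
};

struct type_id_t {
  unsigned type_id_ : 1;
};

struct AB_variant {
  union {
    A a;
    B b;
    type_id_t id;
  };
  // Hypothetical constructors: each one makes exactly one member active.
  explicit AB_variant(const A& v) : a(v) {}
  explicit AB_variant(const B& v) : b(v) {}

  static int foo(const AB_variant& x) {
    // id.type_id_ is in the common initial sequence of the active member,
    // so this read is covered by [class.mem]/25.
    return x.id.type_id_ == 0 ? x.a.bar() : x.b.bar();
  }
};
```

With this sketch, AB_variant::foo(AB_variant{A{}}) returns 1 and AB_variant::foo(AB_variant{B{}}) returns 2.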

The call to AB_variant::foo does not invoke undefined behavior as long as its argument refers to an object of type AB_variant, thanks to the pointer-interconvertibility rule ([basic.compound]/4). The read of x.id.type_id_ through the possibly inactive union member id is allowed because type_id_ belongs to the common initial sequence of A, B and type_id_t ([class.mem]/25, quoted below).
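The pointer-interconvertibility argument can be checked on a small standard-layout wrapper. The types Tag, Payload and Holder and the function name below are hypothetical, chosen to mirror type_id_t, A and AB_variant. The sketch only checks the address equality that the rule guarantees. It does not address the question's actual case, where no AB_variant object exists.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types, shaped like the question's type_id_t and A.
struct Tag {
  unsigned kind : 1;
};
struct Payload {
  unsigned kind : 1;
  int data : 31;
};

struct Holder {
  union {
    Tag tag;
    Payload payload;
  };
};

static_assert(std::is_standard_layout_v<Holder>);

// Holder is standard-layout, so it is pointer-interconvertible with its
// first non-static data member (the anonymous union), and a union is
// pointer-interconvertible with its members ([basic.compound]/4).
// Pointer-interconvertible objects have the same address, which is why a
// reinterpret_cast from Holder& to Tag& designates h.tag when tag is active.
bool holder_and_tag_share_address(Holder& h) {
  return static_cast<void*>(&h) == static_cast<void*>(&h.tag);
}
```

For a value-initialized Holder h{} (which makes tag the active member), holder_and_tag_share_address(h) returns true.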

But what happens if I try to call it with a complete object of type A?

A a{};
AB_variant::foo(reinterpret_cast<AB_variant&>(a));

The problem here is that I try to access an inactive member of a union that does not exist.

The two pertinent standard paragraphs are [class.mem]/25:

In a standard-layout union with an active member of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2; the behavior is as if the corresponding member of T1 were nominated.

And [class.union]/1:

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended.
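To make "active" concrete, here is a sketch with a hypothetical union U and helper switch_active_member. For members with trivial special member functions like these, a built-in assignment through a member access expression begins the lifetime of the nominated member. That is how the active member changes in a union that actually exists.

```cpp
#include <cassert>

// Hypothetical union for illustration.
union U {
  int i;
  float f;
};

// A member is active while its lifetime has begun and not ended
// ([class.union]/1). A built-in assignment through u.f begins f's
// lifetime, reusing the storage and so ending i's lifetime.
float switch_active_member() {
  U u{1};      // aggregate initialization: i is active
  u.f = 2.5f;  // f becomes active; i's lifetime is over
  return u.f;  // reads the active member; reading u.i here would be UB
}
```

switch_active_member() returns 2.5f.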

Q3: Does the expression "its name refers" mean that "an object" must actually be an object created within a living union? Or could it refer to object a because of [basic.lval]/11.6?

Bobbie asked 5/11, 2018 at 8:51

Q1, Q2... Q3... Isn't that hitting the very definition of SO's "too broad"? – Caryophyllaceous 5/11, 2018 at 8:54

@StoryTeller I'd argue that they are just the question's structure. – Bailable 5/11, 2018 at 8:57

Unfortunately this is probably more of a discussion question than a question which might have a specific answer, so I tend to agree: “too broad”. – Wavelet 5/11, 2018 at 8:59

The first example seems to be a clear strict aliasing violation because you are dereferencing a pointer to A that is actually a pointer to int. (Actually, it looks like a typo: it was probably supposed to be reinterpret_cast<A*>(&j), and it does not even compile as is.) – Fomentation 5/11, 2018 at 9:01

@Bailable - If by structure you mean 3 distinct questions. You honestly tell me that Q1 and Q2 are incapable of standing on their own? – Caryophyllaceous 5/11, 2018 at 9:02

@StoryTeller The question in the title is a direct question, which admits two answers: yes or no. But I also want an explanation. So I added questions that show where I feel I don't understand the standard. If you answer those questions, I will get an explanation for the answer to the main question in the title. – Bobbie 5/11, 2018 at 9:03

Then please do limit your question to what is directly pertinent to the title. Everything else is just bloat. – Caryophyllaceous 5/11, 2018 at 9:04

OK, I put the bloat in a note. – Bobbie 5/11, 2018 at 9:05

How about just asking separate questions? You are far more likely to get every point addressed properly this way. – Caryophyllaceous 5/11, 2018 at 9:07

@StoryTeller I made a new question and reduced this one. Should I continue to split? – Bobbie 5/11, 2018 at 9:30

@KamilCuk, I have just corrected these copy/paste errors, thanks. The second example is a form of hand-written virtual dispatch. Is it a strict aliasing violation? That is indeed the question. A strict aliasing violation happens when we access the value of an object through a glvalue of the wrong type, except in the cases listed in [basic.lval]/11. This example code falls under the exception in [basic.lval]/11.6; the problem is whether a class member access whose object expression has the wrong type is UB. This is stated explicitly for member function calls, but not for non-static data member access. – Bobbie 5/11, 2018 at 9:56

@StoryTeller Q1 was hopelessly broad if it were to be taken separately IMO. But this is not the hill I want to die on ;) – Bailable 5/11, 2018 at 12:26

[expr.ref]/4.2 defines what E1.E2 means if E2 is a non-static data member:

If E2 is a non-static data member [...], the expression designates the named member of the object designated by the first expression.

This defines behavior only for the case where the first expression actually designates an object. Since in your example the first expression designates no object, the behavior is undefined by omission; see [defns.undefined] ("Undefined behavior may be expected when this document omits any explicit definition of behavior...").

You are also misinterpreting what "access" means in the strict aliasing rule. It means "read or modify the value of an object" ([defns.access]). A class member access expression naming a non-static data member neither reads nor modifies the value of any object and therefore is not an "access", and therefore there's never an "access ... through" a glvalue of "an aggregate or union type" by reason of a class member access expression.
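The distinction between a class member access expression and an "access" can be shown in a few lines. S and member_access_vs_access are hypothetical names. Only the last two statements read or modify a value, and both do so through a glvalue of type int, not of type S.

```cpp
#include <cassert>

struct S {  // hypothetical type
  int m;
};

int member_access_vs_access() {
  S s{7};
  int& r = s.m;   // class member access: designates s.m, reads nothing
  int* p = &s.m;  // forming a pointer is not an access either
  *p = 8;         // access: modifies s.m through a glvalue of type int
  return r;       // access: reads s.m through a glvalue of type int
}
```

member_access_vs_access() returns 8.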

[basic.lval]/11.6 is essentially copied from C, where it actually meant something because assigning or copying a struct or union accesses the object as a whole. It's meaningless in C++ because assignment and copying of class types are performed through special member functions that either perform memberwise copying (and so "access" the members individually) or operate on the object representation. See core issue 2051.
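The memberwise case can be observed with a member type that counts its copy assignments. Counted, Pair and copy_assign_pair are hypothetical names. Because Counted's assignment operator is user-provided, Pair's implicit copy assignment is non-trivial and is carried out one member at a time. A trivially copyable class, by contrast, may be copied through its object representation.

```cpp
#include <cassert>

// Hypothetical member type that counts its copy assignments.
struct Counted {
  static inline int assignments = 0;
  Counted& operator=(const Counted&) {
    ++assignments;
    return *this;
  }
};

// Pair's copy assignment operator is implicitly defined; it assigns
// first and then second, calling Counted::operator= for each.
struct Pair {
  Counted first;
  Counted second;
};

int copy_assign_pair() {
  Counted::assignments = 0;
  Pair a, b;
  b = a;  // invokes Counted::operator= once per member
  return Counted::assignments;
}
```

copy_assign_pair() returns 2, one assignment per member.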

Pompei answered 6/11, 2018 at 19:15

This question may have lost consistency because I had to split it. Actually, the first part is here. You will see that I mention what an access is. My problem is with expressions like "designate" or "name refers to", etc. Should I consider a name as the identity of an object (the identity of an entity)? In that case, does the "name" designate, and can it only designate, the member subobject, and never an object that happens to be at the same location? – Bobbie 6/11, 2018 at 19:39

So this core issue also answers another question, "What could [basic.lval]/11.6 be useful for?", and the answer is: nothing in C++? – Bobbie 6/11, 2018 at 19:43

I would argue that the intended meaning of the "aliasing rules" in C, and the derivative rules in C++, was that an access made via a member-access lvalue, or a pointer that is freshly derived from it, should be treated (for purposes of "aliasing rules") as though it were an access via the parent lvalue. An access to an lvalue that is freshly derived from a union lvalue would thus be an access via that union lvalue, which would in turn be allowed to access all other union members. The "parent lvalue can access children" rule is thus needed to allow such access. – Spar 6/11, 2018 at 20:04

@Spar So this rule is intended to be as asymmetric as the "is member of" rule. What I do in my example code is use it in the unintended direction, no? – Bobbie 6/11, 2018 at 21:23

@Spar Code similar to the OP's has been brought up on the CWG reflector, and this answer is consistent with the answer given there, so I believe it accurately reflects the current interpretation of the standard wording by C++ compiler implementers. – Pompei 6/11, 2018 at 21:36

@T.C.: The Standards deliberately grant implementations intended for specialized purposes broad permission to behave in ways that would make them unsuitable for most other purposes. The behavior of gcc/clang isn't really an interpretation of the Standard, but rather a decision to process a dialect which is unsuitable for most of the purposes C was invented to serve, and whose guiding philosophy is contrary to the Spirit of C described in the Rationale for the C Standard. – Spar 6/11, 2018 at 23:55

There are many situations, especially involving type punning and unions, where one part of the C or C++ Standard describes the behavior of some action, another part describes an overlapping class of actions as invoking UB, and the area of overlap includes some actions which should be processed consistently by all implementations as well as others that would be impractical to support on at least some implementations. Rather than trying to fully describe all cases that should be treated as defined, the authors of the Standard expected that implementations would seek to uphold the Spirit of C described in the Rationale, including the principle "Don't prevent the programmer from doing what needs to be done". This would generally lead to quality implementations giving priority to the definition of behavior when necessary to meet their customers' needs, while giving priority to the "undefinition" of behavior when that would allow optimizations that also serve their customers' needs.

The only way to treat the C or C++ Standard as defining a useful language is to recognize a category of actions whose behavior is described by one part of the Standard and classified as UB by another, and recognizing the treatment of actions in that category as a Quality of Implementation issue outside the jurisdiction of the Standard. The authors of the Standard expected compiler writers to be sensitive to their customers' needs, and thus didn't see conflicts between behavioral definitions and undefinitions as a particular problem. They thus saw no need to define terms like "object", "lvalue", "lifetime", and "access" in ways that could be applied consistently without creating such conflicts, and the definitions they created are thus not usable for purposes of deciding whether or not particular actions should be defined when such conflicts exist.

Consequently, unless or until the Standards recognize more concepts associated with objects and ways of accessing them, the question of whether a quality implementation intended to be suitable for some purpose should be expected to support a certain action will depend upon whether its authors should be expected to recognize that the action would be useful for such purpose.

Spar answered 6/11, 2018 at 18:55
