Zero-cost properties with data member syntax

I have (re?)invented this approach to zero-cost properties with data member syntax. By this I mean that the user can write:

some_struct.some_member = var;
var = some_struct.some_member;

and these member accesses redirect to member functions with zero overhead.

While initial tests show that the approach does work in practice, I'm far from sure that it is free from undefined behaviour. Here's the simplified code that illustrates the approach:

#include <array>
#include <iostream>

template <class Owner, class Type, Type& (Owner::*accessor)()>
struct property {
    operator Type&() {
        Owner* optr = reinterpret_cast<Owner*>(this);
        return (optr->*accessor)();
    }
    Type& operator= (const Type& t) {
        Owner* optr = reinterpret_cast<Owner*>(this);
        return (optr->*accessor)() = t;
    }
};

union Point
{
    int& get_x() { return xy[0]; }
    int& get_y() { return xy[1]; }
    std::array<int, 2> xy;
    property<Point, int, &Point::get_x> x;
    property<Point, int, &Point::get_y> y;
};

The test driver demonstrates that the approach works and it is indeed zero-cost (properties occupy no additional memory):

int main()
{
    Point m;
    m.x = 42;
    m.y = -1;

    std::cout << m.xy[0] << " " << m.xy[1] << "\n";
    std::cout << sizeof(m) << " " << sizeof(m.x) << "\n";
}

Real code is a bit more complicated, but the gist of the approach is here. It is based on using a union of real data (xy in this example) and empty property objects. (The real data must be a standard-layout class for this to work.)

The union is needed because otherwise properties needlessly occupy memory, despite being empty.
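
To make the memory claim concrete, here is a minimal sketch of what happens without the union. The names empty_a, empty_b, plain, with_empty_members and extra_bytes are hypothetical and exist only for this illustration. It relies on the fact that an empty class still has a size of at least one byte, and that ordinary members each get their own storage.

```cpp
#include <array>
#include <cstddef>

// Two distinct empty types, standing in for two property objects.
struct empty_a {};
struct empty_b {};

// The real data alone.
struct plain {
    std::array<int, 2> xy;
};

// The same data with two empty members placed next to it as ordinary members.
struct with_empty_members {
    empty_a a;
    empty_b b;
    std::array<int, 2> xy;
};

// Number of bytes the empty members (plus padding) add to the struct.
constexpr std::size_t extra_bytes() {
    return sizeof(with_empty_members) - sizeof(plain);
}

static_assert(sizeof(empty_a) >= 1, "an empty class still has nonzero size");
static_assert(extra_bytes() > 0, "ordinary empty members enlarge the struct");
```

The exact number of extra bytes depends on the implementation's alignment and padding, so the sketch only asserts that it is greater than zero.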

Why do I think there's no UB here? The standard permits accessing the common initial sequence of standard-layout union members. Here, the common initial sequence is empty. Data members of x and y are not accessed at all, as there are no data members. My reading of the standard indicates that this is allowed. reinterpret_cast should be OK because we are casting a union member to its containing union, and these are pointer-interconvertible.

Is this indeed allowed by the standard, or am I missing some UB here?

Kingly answered 10/2, 2019 at 13:55 Comment(14)
I think there is no UB, at least not with c++11 and later. However I would not make Point a union, but only place the data member(s) and the corresponding properties into an anonymous union inside Point. Then use reinterpret_cast in the properties to cast to the data member (not to the class Point). This way you can inherit from Point and the approach probably scales better since you (or child classes) can place more than one anonymous union inside the class.Steiger
@AndreasH. I'm doing exactly what you suggest in real code, however it makes things more complicated. I have simplified it for presentation purposes.Kingly
Doesn't pointer-interconvertibility imply an object to be alive to change the pointer value to point to it? Or this is only required by std::launder?Larch
The only potential for UB I can think of is [class.mfct.non-static]/2. The object is inactive when its member function is called.Miscreated
@LanguageLawyer No, you can acquire pointers to inactive objects of the same union.Miscreated
@PasserBy but it's still an object of the correct type, although inactive.Kingly
@LanguageLawyer the standard says "A union object and its non-static data members are pointer-interconvertible", although only in a note. It doesn't say "A union object and its active member..." In general one needs a pointer to a member in order to make that member active, so it should be possible to obtain a pointer to an inactive member.Kingly
@n.m. «there is an object b ... that is pointer-interconvertible with a» in [expr.static.cast]/13 makes me wonder: can we say that an object «is» when it is not alive? «In general one needs a pointer to a member in order to make that member active» But one doesn't need pointer-interconvertibility to get such a pointer.Larch
@LanguageLawyer "A union object and its non-static data members are pointer-interconvertible" is more than enough for me. If you think this statement doesn't really guarantee interconvertibility for all members, as opposed to only the active member, you are welcome to file a defect report.Kingly
@n.m. what if I don't think this is a defect?Larch
@LanguageLawyer Don't submit a report then.Kingly
Fwiw, I've been here with my own Really Clever Design (R) (TM) that also exploited unions, and after entrenching it in my program, discovered that it was UB for the same reason. That was a fun rewrite... (I mean, in totality, it was, because I ended up with code that was better and more flexible for other reasons - but I didn't like being rushed stressfully into it!)Bakerman
«"A union object and its non-static data members are pointer-interconvertible" is more than enough for me. If you think this statement doesn't really guarantee interconvertibility for all members, as opposed to only the active member» M-m-m-kay. If pointer-interconvertibility doesn't care about activity of members, which member subobject is u pointer-interconvertible with in union U { char a; char b; } u {}; reinterpret_cast<char*>(&u);?Larch
@LanguageLawyer I don't know, this looks like a defect in the standard to me.Kingly

TL;DR This is UB.

[basic.life]

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a glvalue refers to allocated storage, and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if: [...]

  • the glvalue is used to call a non-static member function of the object, or

By definition, an inactive member of a union isn't within its lifetime.
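
As a small illustration of the lifetime point, here is a sketch with hypothetical names (tag_type, storage, call_after_activation) that are not part of the question's code. It shows where the call would be undefined, and one way to make the member active first so that the call is within the member's lifetime.

```cpp
#include <cassert>
#include <new>

// An empty class with a non-static member function, like the property type.
struct tag_type {
    int answer() const { return 42; }
};

union storage {
    int value;
    tag_type tag;
};

int call_after_activation() {
    storage s;
    s.value = 1;                // value is now the active member
    // s.tag.answer();          // would be UB per [basic.life]: tag is not within its lifetime
    ::new (&s.tag) tag_type{};  // starts tag's lifetime; value is no longer active
    return s.tag.answer();      // fine: tag is the active member now
}

// Runtime check of the well-defined path.
void check_call_after_activation() {
    assert(call_after_activation() == 42);
}
```

Note that activating tag ends the lifetime of value, which is exactly why this does not rescue the property trick: the data and the property cannot both be alive in the same union.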


A possible workaround is to use the C++20 attribute [[no_unique_address]]:

#include <cstddef>  // for offsetof

struct Point
{
    int& get_x() { return xy[0]; }
    int& get_y() { return xy[1]; }
    [[no_unique_address]] property<Point, int, &Point::get_x> x;
    [[no_unique_address]] property<Point, int, &Point::get_y> y;
    std::array<int, 2> xy;
};

static_assert(offsetof(Point, x) == 0 && offsetof(Point, y) == 0);
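
For anyone who wants to try the workaround end to end, here is a minimal self-contained sketch that combines the question's property template with the struct above. The helper set_through_properties is a hypothetical name added for illustration. The sketch assumes a compiler that honours [[no_unique_address]] (GCC and Clang do; as far as I know MSVC ignores the standard spelling), and it does not settle whether the reinterpret_cast is strictly sanctioned for members other than the first one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Property template from the question: forwards reads and writes to an accessor
// of the owning object, assuming the property sits at offset 0 of the owner.
template <class Owner, class Type, Type& (Owner::*accessor)()>
struct property {
    operator Type&() {
        Owner* optr = reinterpret_cast<Owner*>(this);
        return (optr->*accessor)();
    }
    Type& operator=(const Type& t) {
        Owner* optr = reinterpret_cast<Owner*>(this);
        return (optr->*accessor)() = t;
    }
};

struct Point {
    int& get_x() { return xy[0]; }
    int& get_y() { return xy[1]; }
    [[no_unique_address]] property<Point, int, &Point::get_x> x;
    [[no_unique_address]] property<Point, int, &Point::get_y> y;
    std::array<int, 2> xy;
};

// The cast in property is only meaningful if both properties sit at offset 0.
static_assert(offsetof(Point, x) == 0 && offsetof(Point, y) == 0,
              "properties must share the owner's address");

// Hypothetical helper: writes through the properties, returns the real data.
std::array<int, 2> set_through_properties(int a, int b) {
    Point p{};
    p.x = a;  // calls property::operator=, which forwards to get_x()
    p.y = b;  // forwards to get_y()
    return p.xy;
}

// Runtime check that the writes landed in xy.
void check_set_through_properties() {
    const std::array<int, 2> r = set_through_properties(42, -1);
    assert(r[0] == 42 && r[1] == -1);
}
```

Unlike the union version, x, y and xy are all ordinary members here, so all three are within their lifetime whenever the Point is.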
Miscreated answered 10/2, 2019 at 15:40 Comment(14)
Oh, so the permission to examine members of inactive objects does not extend to member functions. This is unfortunate and looks like a defect to me.Kingly
@n.m. I'm surprised as well, didn't think [basic.life] would outright ban such usage. Particularly so since calling through a null pointer is arguably well-defined.Miscreated
@n.m.: Why is that a defect? It makes sense; the common initial sequence rule is about reading a value created through a different union member. It's not about allowing you to use unions in whatever way you want. Unions are supposed to have only one active member; talking to an inactive member is supposed to be wrong. The common initial sequence rule just specifies a specific case where it's OK to read a piece of data written through the active member.Floatable
@NicolBolas Any access to a data member should be possible to encapsulate in a member function. One is allowed to access x.y but not x.y_() which in turn only accesses x.y. This doesn't look right. What would the rationale for disallowing x.y_() be?Kingly
@n.m.: Because it doesn't make sense. You're allowed to access x.y because the compiler can clearly see that you're accessing a specific member variable. The scope of your action is bounded, and it is clear to all what the state of things is. Calling a member function could do anything (as evidenced by this very example, where you reach out into some other object to get the reference). The scope of the action is unbounded. And personally, I would say that allowing it makes a mockery of the object model.Floatable
It looks like [[no_unique_address]] will solve the problem indeed. One can use offsetof to calculate the reverse offset too, instead of asserting it's zero.Kingly
@n.m.: The annoying part of the no_unique_address solution is that you would naturally want to make the actual members private while leaving the "properties" public, but doing so breaks standard layout. And if you break standard layout, there is a much better chance that the layout of the type will be disturbed by the presence of no_unique_address members (not to mention breaking offsetof). Which is why I think that "attribute" should have been a keyword with actual behavior behind it, not merely a suggestion.Floatable
@NicolBolas calling a member function passing the offending pointer as this isn't very much different from calling a non-member function passing that same pointer as any old argument. The latter is however allowed, while the former is not.Kingly
@n.m.: You're confusing mechanism with intent. Calling a member function is mechanically similar to calling a non-member function with the same this pointer. But the intent behind these things is altogether different. If you call a member function of an object, that means something, something which is fundamentally different from passing any old parameter to a non-member function. That meaning is why we bother to have member functions at all.Floatable
@NicolBolas offsetof is conditionally supported for non-standard-layout types since c++17. There's no reason why it wouldn't be supported in most implementations.Kingly
@NicolBolas Um, no. This is entirely not true. If I call a member function, it's because I want it to perform a certain action, not because I want to make some kind of deep philosophical statement. I wouldn't bother having any non-virtual member functions if I could. Everything could have been expressed as friend functions instead. There's neither a technical nor a philosophical reason to allow certain kinds of operations as members only, but purely an aesthetic one. friend operator=(myclass&, const myclass&) isn't any worse than the standard form and conveys the intent just as well. Oh well..Kingly
@n.m.: "If I call a member function, it's because I want it to perform a certain action, not because I want to make some kind of deep philosophical statement." But if you write a function as a member, you are making a "deep philosophical statement" about the relationship between that function and the object it is a member of. That you personally don't care about that "philosophical statement" doesn't mean it isn't there. This is part of why unified function call syntax is non-workable.Floatable
@NicolBolas no, I don't. Please read what I wrote again. I write a member function only when the language leaves me no choice. I don't see any non-cosmetic difference between member and non-member functions. If you do, fine, but don't try to impose your view on me. Or at least try to explain first how you decide to implement, say, operator+ as a member or as a non-member. Why does the language allow both choices anyway?Kingly
Let us continue this discussion in chat.Floatable

Here is what the common-initial-sequence rule says about unions:

In a standard-layout union with an active member of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2; the behavior is as if the corresponding member of T1 were nominated.

Your code does not qualify. Why? Because you are not reading from "another union member". You are doing m.x = 42;. That isn't reading; that's calling a member function of another union member.

So it doesn't qualify for the common initial sequence rule. And without the common-initial-sequence rule to protect you, accessing non-active members of the union is UB.
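
For contrast, here is a minimal sketch of what the common-initial-sequence rule does cover. The types header_a, header_b and either, and the function read_tag_through_b, are hypothetical names made up for this illustration. Both structs are standard-layout and begin with an int, so that int forms their common initial sequence.

```cpp
#include <cassert>

// Two standard-layout structs whose common initial sequence is the int tag.
struct header_a {
    int tag;
    float payload;
};

struct header_b {
    int tag;
    char payload;
};

union either {
    header_a a;
    header_b b;
};

int read_tag_through_b(int tag) {
    either e{};      // empty braces initialize the first member, so a is active
    e.a.tag = tag;   // write through the active member
    return e.b.tag;  // permitted: a read of a data member in the common initial sequence
}

// Runtime check of the permitted read.
void check_read_tag_through_b() {
    assert(read_tag_through_b(7) == 7);
}
```

The permitted operation is a read of a data member. Nothing in the rule extends to calling a member function on e.b while e.a is the active member, which is the case the question's code falls into.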

Floatable answered 10/2, 2019 at 15:54 Comment(1)
Unfortunately, calling a (non-virtual) member function is indeed an action that is separately disallowed for objects outside their lifetime; as I said, it probably shouldn't be, but there's little we can do.Kingly
