Make interchangeable class types via pointer casting only, without having to allocate any new objects?

Asked 27/6, 2012 at 3:51 Answered 7/7, 2012 at 19:54

Solved c++ c++11 language-lawyer reinterpret-cast type-punning

UPDATE: I do appreciate "don't want that, want this instead" suggestions. They are useful, especially when provided in context of the motivating scenario. Still...regardless of goodness/badness, I've become curious to find a hard-and-fast "yes that can be done legally in C++11" vs "no it is not possible to do something like that".

I want to "alias" an object pointer as another type, for the sole purpose of adding some helper methods. The alias cannot add data members to the underlying class (in fact, the more I can prevent that from happening the better!) All aliases are equally applicable to any object of this type...it's just helpful if the type system can hint which alias is likely the most appropriate.

There should be no information about any specific alias that is ever encoded in the underlying object. Hence, I feel like you should be able to "cheat" the type system and just let it be an annotation...checked at compile time, but ultimately irrelevant to the runtime casting. Something along these lines:

Node<AccessorFoo>* fooPtr = Node<AccessorFoo>::createViaFactory();
Node<AccessorBar>* barPtr = reinterpret_cast< Node<AccessorBar>* >(fooPtr);

Under the hood, the factory method is actually making a NodeBase class, and then using a similar reinterpret_cast to return it as a Node<AccessorFoo>*.

The easy way to avoid this is to make these lightweight classes that wrap nodes and are passed around by value. Thus you don't need casting, just Accessor classes that take the node handle to wrap in their constructor:

AccessorFoo foo (NodeBase::createViaFactory());
AccessorBar bar (foo.getNode());

But if I don't have to pay for all that, I don't want to. That would involve--for instance--making a special accessor type for each sort of wrapped pointer (AccessorFooShared, AccessorFooUnique, AccessorFooWeak, etc.) Having these typed pointers being aliased for one single pointer-based object identity is preferable, and provides a nice orthogonality.

So back to that original question:

Node<AccessorFoo>* fooPtr = Node<AccessorFoo>::createViaFactory();
Node<AccessorBar>* barPtr = reinterpret_cast< Node<AccessorBar>* >(fooPtr);

Seems like there would be some way to do this that might be ugly but not "break the rules". According to ISO14882:2011(e) 5.2.10-7:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of type "pointer to T1" is converted to the type "pointer to cv T2", the result is static_cast<cv T2*>(static_cast<cv void*>(v)) if both T1 and T2 are standard-layout types (3.9) and the alignment requirements of T2 are no stricter than those of T1, or if either type is void. Converting a prvalue of type "pointer to T1" to the type "pointer to T2" (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. The result of any other such pointer conversion is unspecified.

Drilling into the definition of a "standard-layout class", we find:

has no non-static data members of type non-standard-layout-class (or array of such types) or reference, and
has no virtual functions (10.3) and no virtual base classes (10.1), and
has the same access control (clause 11) for all non-static data members, and
has no non-standard-layout base classes, and
either has no non-static data member in the most-derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
has no base classes of the same type as the first non-static data member.

Sounds like working with something like this would tie my hands a bit with no virtual methods in the accessors or the node. Yet C++11 apparently has std::is_standard_layout to keep things checked.
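A minimal sketch of that compile-time check, assuming hypothetical definitions of `NodeBase` and `Node<AccessorT>` (the real classes are not shown here). It only verifies the layout precondition from the quoted paragraph; it does not by itself settle whether using the object through the other type is allowed.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the classes described in the question.
struct NodeBase {
    int value;
};

struct AccessorFoo {};
struct AccessorBar {};

// Node<AccessorT> adds helper methods only, never data members.
template <typename AccessorT>
struct Node : NodeBase {
    int getValue() const { return value; }
};

// Compilation fails if a later change (e.g. a virtual function) makes
// either type non-standard-layout.
static_assert(std::is_standard_layout<NodeBase>::value,
              "NodeBase must be standard-layout");
static_assert(std::is_standard_layout<Node<AccessorFoo>>::value,
              "Node<AccessorFoo> must be standard-layout");

// Extra guard: the alias type must not grow the object.
static_assert(sizeof(Node<AccessorBar>) == sizeof(NodeBase),
              "accessors must not add data members");
```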

Can this be done safely? Appears to work in gcc-4.7, but I'd like to be sure I'm not invoking undefined behavior.

Guarded answered 27/6, 2012 at 3:51 Comment(4)

The initial syntax should be Node<AccessorFoo>& foo = *new Node<AccessorFoo>;. I still don't understand the question though :) – Toddy 27/6, 2012 at 3:58

@Toddy Oops, typo, fixed...thanks! In principle it's probably quite similar to this question, I just have more control over the types involved and can rewrite everything from scratch. So I'm wondering what options that might open up for me in this specific scenario. – Guarded 27/6, 2012 at 4:2

"I want to "alias" an object pointer as another type, for the sole purpose of adding some helper methods." Why? What do member functions give you that free functions cannot? Unless I'm missing something obvious (very possible), this sounds like a bit of misguided OOP zealotry... – Crinose 13/7, 2012 at 1:42

@Crinose Maybe? I've lamented that SO is often a damned-if-you-do-explain ("What's with all the specifics, what's your actual question?") vs. damned-if-you-don't-explain ("Why would you need that?") environment. :( I have a final class that has NC const operations and N non-const operations, lives in a memory mapped file. Sometimes I'd like to augment and "type" the interface to this class as a hint to help people out. The DOM is the best analogy I've got, sadly – Guarded 13/7, 2012 at 2:8

The term Accessor is a dead giveaway: what you are looking for is a Proxy.

There is no reason for a Proxy not to be passed around by value.

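// Let us imagine that NodeBase is now called Node, since there is no inheritance

#include <functional>  // std::reference_wrapper

class AccessorFoo {
public:
    AccessorFoo(Node& n): node(n) {}

    // reference_wrapper has no operator->; unwrap it with get()
    int bar() const { return node.get().bar; }

private:
    std::reference_wrapper<Node> node;
};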

And then you can freely convert from one accessor to another... though this smells. Normally the very goal of having an accessor is to restrict access in some way, so casting willy-nilly to another accessor is bad. However, one could support casting to a narrower accessor.
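As a rough illustration of converting only toward a narrower accessor, here is a sketch with two by-value proxies. The names `ReadAccessor`, `WriteAccessor`, and `demoNarrowing`, and the `Node` layout, are made up for this example and are not from the question.

```cpp
#include <cassert>
#include <functional>

// Hypothetical node type; the real Node is not shown in the thread.
struct Node {
    int bar = 0;
};

// Narrow proxy: read-only view of a Node.
class ReadAccessor {
public:
    explicit ReadAccessor(Node& n) : node(n) {}
    int bar() const { return node.get().bar; }
private:
    std::reference_wrapper<Node> node;
};

// Wider proxy: read/write. It converts implicitly to the narrower
// proxy, but ReadAccessor offers no conversion back.
class WriteAccessor {
public:
    explicit WriteAccessor(Node& n) : node(n) {}
    void setBar(int v) { node.get().bar = v; }
    int bar() const { return node.get().bar; }
    operator ReadAccessor() const { return ReadAccessor(node.get()); }
private:
    std::reference_wrapper<Node> node;
};

// Small self-check exercising the conversion.
inline int demoNarrowing() {
    Node n;
    WriteAccessor w(n);
    w.setBar(42);
    ReadAccessor r = w;  // narrowing by value; the Node itself is never cast
    assert(r.bar() == 42);
    return r.bar();
}
```

Both proxies hold a handle to the same `Node`, so no pointer to the node is ever reinterpreted as another type.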

Hydrometer answered 27/6, 2012 at 6:40 Comment(14)

+1 Thanks..although from pages on that site it would be more like a Facade, though in practice I might want a Decorator. "Adapter provides a different interface to its subject. Proxy provides the same interface. Decorator provides an enhanced interface (...) Facade defines a new interface, whereas Adapter reuses an old interface." I'm willing to have the object-relational-model be a compromise to match up to my desires (any choices will be) but just don't want actively illegal C++. Hence question. – Guarded 27/6, 2012 at 7:5

@HostileFork: The names are always problematic. A Facade is generally more an aggregation of interfaces (for example libclang is a C-Facade over the Clang libraries). The idea that a Proxy provides the same interface is quite restrictive. From Wikipedia, a Proxy provides a placeholder for another object to control access, reduce cost, and reduce complexity. Your question looks a lot like control access, in which case it is natural to present a restricted interface. – Hydrometer 27/6, 2012 at 8:20

You say "There is no reason for a Proxy not to be passed around by value" which runs counter to my notion that there could be a reason. Namely: if you want an effective "is-a" relationship then if a pointer to the Proxy "were-a" pointer to the node as well, then you can use unique_ptr (shared_ptr, weak_ptr, etc.) with the Proxy in the same way you might use it with a direct Node pointer. This works if you're using C++ inheritance to implement the pattern, but if your object is a realization of an external database item it doesn't. I'm asking if it's possible to finesse this gap in legal C++. – Guarded 27/6, 2012 at 16:7

@HostileFork: But the Proxy is not a Node, since it is specifically meant to restrict the available interface. Isn't it ? – Hydrometer 27/6, 2012 at 17:31

Sorry that I've not really pinned down whether it's intended to "restrict" or "augment" the Node interface. The bigger point is that given a client's understanding that it can always get back a generic node, any "restriction" would only be a "hey are you sure you want to do this" step, encouraging you to check the accessor interface to see if a higher-order operation would be more suited. If "restriction" turned out to actually be the point, I'd be doing the actual casting inside a wrapper method, to mimic the "only derived classes know about the is-a" relationship of protected inheritance. – Guarded 27/6, 2012 at 17:42

@HostileFork: In that case, you still don't need inheritance. A simple explicit operator Node& () { return *node; } (and its const counterpart) would be sufficient. – Hydrometer 27/6, 2012 at 18:9

Yes...if one "contains" a node pointer, then it is certainly possible to extract the contained node via a cast operator, while not inheriting from node. (?) Maybe I'm misunderstanding your point, but I don't see it applying to this perhaps Quixotic and/or self-contradicting goal to try and get those "is-a" ("is-really-only-a"?) benefits with regard to a new interface for an object passed by pointer. Wrapping it up into a new value class--even if it has cast operations--won't hit that particular property. – Guarded 27/6, 2012 at 19:15

It would appear that even with standard layout types, this is prohibited by strict aliasing requirements (see @PaulGroke's response). In which case, the only thing the standard is allowing you to do is to temporarily hold the pointer as another type and then convert it back, but not use it in the meantime? (Perhaps more specifically, not call any non-const methods?) – Guarded 7/7, 2012 at 11:25

I appreciate the time you've taken in your responses...and your point raised about how treading into dark areas of the specification is probably more trouble than it is worth in practice. Still, in the interests of formal understanding, I think it can be worth it to ferret out the answers to these kinds of questions, to the extent they exist. The point of the language-lawyer tag is to pursue these understandings, whether the motivations are "reasonable" or not...! – Guarded 10/7, 2012 at 0:27

I've taken your advice (I hope/think?) and tinkered through the implementation of a Proxy system that hopefully does not introduce any undefined behavior issues, yet gives the client behavior I want. You can make the Accessor inherit protected or public. It has a template which inherits from its own partial template specialization in order to implement the mutable variant as a derived version of the const variant, which finesses a couple of things. It's relatively short and weird, but here's the gist (pun intended) – Guarded 14/7, 2012 at 0:37

You're probably tired of this question (and I don't blame you), but I'm resolving answers for all my un-accepted questions. And if you have anything to say about the related codereview: here it is (currently bountied). – Guarded 3/5, 2014 at 2:45

Is there any reason to use a reference_wrapper (a pointer that is itself wrapped with sugar-coating accessors) here when an actual reference member would work just fine? Seems like unnecessary complication to me, unless I'm missing something. – Atomics 8/4, 2016 at 7:44

@underscore_d: making assignment work. You cannot assign a new reference to an existing reference (aka, reseat the reference), it just assigns the value referred to. – Hydrometer 8/4, 2016 at 16:18
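One observable consequence of that difference, sketched with hypothetical `RefProxy` and `WrapProxy` types: a class with a plain reference member loses its implicit copy assignment, while one holding a `std::reference_wrapper` keeps it.

```cpp
#include <functional>
#include <type_traits>

// Hypothetical node type for illustration.
struct Node { int bar; };

// Plain reference member: the implicit copy-assignment operator is deleted.
struct RefProxy { Node& node; };

// reference_wrapper member: assignment rebinds the wrapper to another Node.
struct WrapProxy { std::reference_wrapper<Node> node; };

static_assert(!std::is_copy_assignable<RefProxy>::value,
              "a reference member cannot be reseated");
static_assert(std::is_copy_assignable<WrapProxy>::value,
              "reference_wrapper can be rebound by assignment");
```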

Ah, course, that's what I was missing. I forgot about that benefit of reference_wrapper, since I've never needed to use it and tend to forget the finer details. Thanks! – Atomics 8/4, 2016 at 17:32

I believe the strict aliasing rules forbid what you are trying to do.

To clarify: strict aliasing has nothing to do with layout compatibility, POD types or what not. It has to do with optimization. And with what the language explicitly forbids you to do.

This paper sums it up rather well: http://dbp-consulting.com/StrictAliasing.pdf

Kielty answered 2/7, 2012 at 23:15 Comment(6)

+1 as I found some useful tags ("type-punning", for instance) and jumping off points from strict aliasing. But most of the strict aliasing links I found seem to talk about types with different size or layout, and here it's only methods that are different w/no virtual dispatch as all are constructed as the same type. It seems that it's even okay to cast from signed to unsigned variations of the same type in strict aliasing...so I would think that this question of method dispatch would depend on something like this "standard layout" definition (if anything), or maybe some other umbrella rule? – Guarded 3/7, 2012 at 10:12

Interesting, thanks for the update w/the PDF, I read over that and see what you're talking about. Though now I wonder...if all the class methods were const, so the data members were never modified during the lifetime...would that be legal? (And if not, why not?) – Guarded 6/7, 2012 at 3:57

@HostileFork: It would not because the compiler need not guarantee that while you are executing a const method on the object it does not change under your feet through its other type. – Hydrometer 7/7, 2012 at 12:3

@MatthieuM. In light of the answer by jthill, is there such a thing as a "safety cast" to the fundamental type before each access which can sidestep the aliasing problems, if adhered to strictly? – Guarded 8/7, 2012 at 22:47

@HostileFork: Honestly ? I don't know. And I don't even see the point. The C++ Standard is extremely intricate and hard to navigate and I don't have the time, at the moment, to try and check all the seemingly broken answers your question generated. So I will only say this: the closer you get to the edges of the specifications, the more likely you are to hit a compiler bug. So, as an engineer, I advise you to be pragmatic and avoid playing mind games with your compiler; whether you are right or wrong is irrelevant: when you get a crash in production code, you lose, not the compiler writers. – Hydrometer 9/7, 2012 at 6:37

I'm going to award this answer the bounty, as it brought to light this "strict aliasing" I did not already know about...which does seem fundamentally important to worry about as a reason for why this could be technically prohibited. There still seems to be a gray area as to whether a "safety cast" which invokes mention of the base type is enough to workaround this (see jthill's answer) although this does of course start to get too brittle to apply in most reasonable circumstances. – Guarded 10/7, 2012 at 0:19

If I understand you correctly, you have:

A NodeBase class that is stateful, and the true workhorse of the system;
a set of stateless Accessor types that provide an interface to NodeBase; and
a Node<AccessorT> class which wraps an accessor, presumably providing convenience functions.

I assume the last bit because if you don't have a wrapper type that does convenience stuff, then there's no reason not to make the Accessor types your top-level, like you suggested: pass AccessorFoo and AccessorBar around by value. The fact that they aren't the same object is entirely moot; if you think of them like the pointers that they are, then you'll note that &foo != &bar is no more interesting than having NodeBase* p1 = new NodeBase; NodeBase* p2 = p1; and noting that, of course, &p1 != &p2.

If you really do need a wrapper Node<AccessorT> and want to make it standard-layout, then I would suggest that you use the statelessness of your Accessor types to your advantage. If they are simply a stateless container of functionality (which they must be; why else would you be able to freely cast them?), then you can do something like this:

struct AccessorFoo {
    int getValue(NodeBase* n) { return n->getValueFoo(); }
};

struct AccessorBar {
    int getValue(NodeBase* n) { return n->getValueBar(); }
};

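template <typename AccessorT>
class Node {
    NodeBase* m_node;

public:
    // needed so that another Node<OtherT> can be built from the same handle
    explicit Node(NodeBase* node) : m_node(node) {}

    int getValue() {
        AccessorT accessor;
        return accessor.getValue(m_node);
    }
};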

In this case, you could add a templated conversion operator:

template <typename OtherT>
operator Node<OtherT>() {
    return Node<OtherT>(m_node);
}

And now you've got direct value conversion between any Node<AccessorT> type you like.

If you take it just a bit further, you'll make all the methods of the Accessor types static, and arrive at the traits pattern.
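A minimal sketch of that traits-style variant, assuming a hypothetical `NodeBase` with the two getters used above (its real definition is not shown). The accessors become pure static policy types, and `Node<AccessorT>` stays a small by-value handle.

```cpp
#include <cassert>

// Hypothetical NodeBase; only the two getters used above are modelled.
struct NodeBase {
    int foo;
    int bar;
    int getValueFoo() const { return foo; }
    int getValueBar() const { return bar; }
};

// Stateless accessors: static functions only, so they are never instantiated.
struct AccessorFoo {
    static int getValue(const NodeBase* n) { return n->getValueFoo(); }
};
struct AccessorBar {
    static int getValue(const NodeBase* n) { return n->getValueBar(); }
};

template <typename AccessorT>
class Node {
    NodeBase* m_node;

public:
    explicit Node(NodeBase* node) : m_node(node) {}

    int getValue() const { return AccessorT::getValue(m_node); }

    // Value conversion to any other accessor view of the same node.
    template <typename OtherT>
    operator Node<OtherT>() const { return Node<OtherT>(m_node); }
};

// Small self-check: two views over one NodeBase.
inline void demoTraits() {
    NodeBase base{1, 2};
    Node<AccessorFoo> f(&base);
    Node<AccessorBar> b = f;  // converts the handle, not the NodeBase
    assert(f.getValue() == 1);
    assert(b.getValue() == 2);
}
```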

The section of the C++ standard that you quoted, incidentally, concerns the behavior of reinterpret_cast<T*>(p) in the case that both the source type and the final type are pointers to standard-layout objects, in which case the standard guarantees that you get the same pointer you'd get from casting to a void* and then to the final type. You still don't get to use the object as any type other than the type it was created as without invoking undefined behavior.
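To make the distinction concrete, here is a sketch of the one thing the quoted paragraph does guarantee: converting the pointer and converting it back. `Other` and `roundTripPreservesPointer` are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct NodeBase { int value; };
struct Other    { int value; };  // unrelated standard-layout type (hypothetical)

inline bool roundTripPreservesPointer(NodeBase* p) {
    // Converting the pointer is allowed; nothing is accessed through q.
    Other* q = reinterpret_cast<Other*>(p);

    // Per the reading above, using q to access the object (e.g. q->value)
    // is where undefined behavior would begin, so this sketch never does it.

    // Converting back yields the original pointer value.
    NodeBase* back = reinterpret_cast<NodeBase*>(q);
    return back == p;
}
```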

Flex answered 27/6, 2012 at 4:56 Comment(5)

+1 Yup you got the gist, and accessor methods should be static...though that's going to be tough on the Accessor authors compared to being able to call NodeBase methods when they were inheriting from it. :-/ I'm explicitly wishing for these Node<X> types to not be separate from the "workhorse" NodeBase instance because I use unique_ptr<Node<X>> to follow the hot-potato of ownership for Nodes. To keep using that I'd have to add yet another level of heap-allocation-tracking on these Node<X> handles instead of pass by value. So the standard layout trick is what I'm most curious about. – Guarded 27/6, 2012 at 5:41

@HostileFork Why not invert that, and put the unique_ptr inside of Node<X>? If you need other kinds of smart pointers, it's easy enough to make different variants of Node<X>, especially if it's a small wrapper around the accessor. Also, looking carefully, I don't think the standard-layout trick is what you think it is. The quoted section is about reinterpret_cast sharing semantics with static_cast in the case of standard-layout types. You still don't get to change types without invoking undefined behavior. – Flex 27/6, 2012 at 5:56

I didn't pick up on that bit...hm...in actuality the nodes are created with a factory method as NodeBase and then cast down in the return...would that be legitimate here or does it change nothing? As for making lightweight classes like UniqueNode<X> which are different from Node<X>...sounds possible, lots of things are possible! I'm just trying to hammer this into a clean separation of concerns that doesn't waste its time with unnecessary combinatorial explosions of classes and is as efficient as possible. (I'd probably use another language if I didn't find that motivating.) :) – Guarded 27/6, 2012 at 6:14

After running around on this question for a while, here's what I believe I'm going to go with: sample library for pattern and test program on Gist. "Foo" in this case is the Node, and "Wrapper" is what I'll probably wind up calling NodeRef or somesuch. It still seems a bit of a runaround in order to achieve the effect vs. the "type-punning"--but if it eliminates undefined behavior and the client code looks ok, I guess it's fine... ! – Guarded 13/7, 2012 at 20:56

#include <iostream>

struct myPOD {
   int data1;
   // ...
};

struct myPOD_extended1 : myPOD {
   void helper() { ((myPOD*)this)->data1 = 6; }  // type myPOD referenced here
};
struct myPOD_extended2 : myPOD {
   void helper() { data1 = 7; }                   // no syntactic ref to myPOD
};
struct myPOD_extended3 : myPOD {
   void helper() { ((myPOD*)this)->data1 = 8; }  // type myPOD referenced here
};
void doit(myPOD *it)
{
    ((myPOD_extended1*)it)->helper();
    // ((myPOD_extended2*)it)->helper(); // uncomment -> program/compile undefined
    ((myPOD_extended3*)it)->helper();
}

int main(int,char**)
{
    myPOD a; a.data1=5;
    doit(&a);

    std::cout<< a.data1 << '\n';
    return 0;
}

I believe this is guaranteed to work in all conforming C++ compilers and must print 8. Uncomment the marked line and all bets are off.

An optimizer might prune searches for valid aliased references by checking the list of syntactic types actually referenced in a function against the list (in 3.10p10) of syntactic types of references it's required to produce correct results for -- and when an actual ("dynamic") object type is known, that list doesn't include access through a reference to any derived type. So the explicit (and valid) downcasts of this to myPOD* in the helper()s puts myPOD on the list of types syntactically referenced there, and the optimizer has to treat the resulting references as potential legal references to (other names for, aliases of) the object a.

Pitzer answered 7/7, 2012 at 19:54 Comment(9)

The aliasing arguments others have made seem to suggest that the compiler is free to assume that aliases as other types than the one by which a class was allocated (myPOD in this case) do not exist. Hence data1 could be changed in one of the aliases and then that change not reflected in another. @MatthieuM has suggested even constness doesn't protect you as one of the aliases may be non-const. Can you cite a reason why you think this would be an exception to the rule? – Guarded 8/7, 2012 at 6:6

You can convert a structA* to a(n even completely unrelated) structB* and back again, and the compiler has to allow for that possibility: converting a pointer and passing it along doesn't mean that converted value can never be used to modify the original object. – Pitzer 8/7, 2012 at 11:45

It seems what you're saying runs against the strict aliasing paper, which suggests that you can do this only when you're casting between the actual type with which something was allocated (such as if you had declared something as myPOD_extended1, and then you cast it to myPOD and back). This is the central argument of the question I'm asking and it really does sound like strict aliasing permits the compiler to make optimization assumptions which would undermine this in other cases. :-/ – Guarded 8/7, 2012 at 17:56

Once the compiler sees doit(&a) it's not permitted to assume a will not be accessed through that pointer or any copy. The upcasts in doit are explicitly permitted, see 5.4p4, and the helper() functions receive a copy of that upcast pointer (as this), so the compiler can't assume they don't reference the myPOD through that copy either. I've added safety casts to the helper functions; I think a compiler would have to go out of its way to mishandle this case but I see it wasn't strictly correct before (which in this discussion means 'was wrong'). I believe it's strictly correct now. – Pitzer 8/7, 2012 at 21:26

Ok...so if you're saying that doing the cast back to the base type before any modifications prevents "aliasing", then what I want to do would be legal...? Because my derived types never add any members. I guess the trick is to make sure that all data member accesses go through what you are calling a "safety cast" (reads AND writes) instead of trying to access directly w/o a cast, which would be aliasing and therefore illegal? – Guarded 8/7, 2012 at 22:40

tl;dr: yes. Long: I can (now) see that an optimizer would prune searches for valid aliased references by checking the list of syntactic types actually referenced in a function against the list (in 3.10p10) of syntactic types that it's required to produce correct results for -- and when an actual ("dynamic") object type is known, that list doesn't include access through a reference to any derived type. Fair enough. So explicitly downcasting this puts myPOD on the list of types referenced in the helper()s, so if the optimizer can't prove this != &a it must treat them as (valid) aliases – Pitzer 8/7, 2012 at 23:50

This answer looks wrong to me. a is not a myPOD_extended1 (or 2 or 3) so the idea of casting to pointer to this type looks wrong; and I don't see how the paragraphs you cite would somehow justify this. – Hydrometer 9/7, 2012 at 6:22

If you read 5.4p4 you'll see that the first applicable conversion is the reinterpret_cast. 5.2.10p7 is the applicable paragraph there, and includes "A pointer to an object can be explicitly converted to a pointer to a different object type.[...] Converting a prvalue of type “pointer to T1” to the type “pointer to T2” [...] and back to its original type yields the original pointer value." The upcast pointer is never dereferenced. All casts are going to "look wrong": they're there to work around syntactical limitations in the language. Things like this are exactly what 5.2.10 is for. – Pitzer 9/7, 2012 at 14:6

Interesting, I'm going to have to look at this in detail and dig up the references. I wonder if you adopt my initial reading that the category of classes this would be legal to do would be "standard layout" types, and not just POD? (Also: terminology note...the closer you go to the base class it's "upcasting". Seems kind of backwards as you'd think the term "base" would mean "bottom-most", but no... :-/) – Guarded 9/7, 2012 at 21:13
