C++: converting a container to a container of different yet compatible type

It often happens that I have a container C (or any kind of wrapper class, even a smart pointer) for a type T1, and want to convert that C<T1> into a C<T2>, where T2 is compatible with T1.

C++ doesn't allow me to directly convert the whole container, and forcing a reinterpret_cast would result in undefined behavior, so I would need to create a new C<T2> container and repopulate it with the C<T1> items cast to T2. This operation can be quite expensive, in both time and space.

Moreover, in many cases I'm pretty sure that forcing a reinterpret_cast would work fine with the code produced by any compiler that has ever existed, for example when T2 is T1 const, or when T1 and T2 are pointer types.

Is there any clean and efficient way to convert a C<T1> into a C<T2>?
For example, a container_cast operator (or function?) that creates and repopulates a C<T2> if and only if it wouldn't be binary compatible with C<T1>?
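For illustration, the fallback path such a container_cast could take, building a new container and repopulating it element by element, might be sketched like this (the function name and signature are hypothetical, not an existing API; shown for std::vector only):

```cpp
#include <cassert>
#include <vector>

// Hypothetical container_cast sketch: always takes the safe fallback of
// building a new std::vector<T2> and converting each element explicitly.
template <typename T2, typename T1>
std::vector<T2> container_cast(const std::vector<T1>& in)
{
    std::vector<T2> out;
    out.reserve(in.size());                 // one allocation up front
    for (const T1& x : in)
        out.push_back(static_cast<T2>(x));  // element-wise conversion
    return out;
}
```

This is exactly the "create and repopulate" cost described above: one pass and one copy per element.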

Zamboanga answered 5/1, 2011 at 15:14 Comment(2)
Most standard containers don't support const element types, as those aren't assignable. For pointer types, why not use the most general type that you need to store in the container? In general, when you cast a T1 to a T2 the result is a different object, so a conversion from a container of T1 to a container of T2 implies copying the contained elements. You can't avoid this expense.Teague
Something is fundamentally flawed in a design where the need to cast containers arises. Containers instantiated on different types are never guaranteed to be compatible for casting, while the types they contain may or may not be compatible. If the types are compatible, cast the objects the container holds, not the container itself.Chinatown
5

Besides all the other issues dealt by others:

  • conversion does not imply the same memory footprint (think of conversion operations...)
  • potential specializations of the template class (a container in your question, but from the compiler's point of view a container is just another template), even if the contained types are themselves binary compatible
  • unrelatedness of different instantiations of the same template (in the general case)

There is a basic problem with the approach that is not technical at all: given that an apple is a fruit, neither is a container of fruits a container of apples (trivially demonstrated), nor is a container of apples a container of fruits. Try to fit a watermelon into a box of apples!

Going into more technical detail, and dealing specifically with inheritance, where no conversion is even required (a derived object already is an object of the base class): if you were allowed to cast a container of the derived type to a container of the base type, you could add invalid elements to the container:

class fruit {};
class apple : public fruit {};
class watermelon : public fruit {};
std::vector<apple*> apples = buy_box_of_apples();
std::vector<fruit*> & fruits = reinterpret_cast< std::vector<fruit*>& >(apples);
fruits.push_back( new watermelon() ); // ouch!!!

The last line is perfectly correct: you can add a watermelon to a vector<fruit*>. But the net effect is that you have added a watermelon to a vector<apple*>, and in doing so you have broken the type system.

Not everything that looks simple at first glance is in fact sane. This is similar to the reason why you cannot convert an int** to a const int**, even though the first thought is that it should be allowed. Allowing it would break the language (in this case const correctness):

const int a = 5;
int *p = 0;
int **p1 = &p;       // perfectly fine
const int **p2 = p1; // should this be allowed??
*p2 = &a;            // correct, p2 points to a pointer to a const int
**p1 = 100;          // a == 100!!!

Which brings us back to the example you provided in one of the comments to another answer (to make the point in general, I'll use a vector instead of a set, since set contents are immutable):

std::vector<int*> v1;
std::vector<const int*> &v2 = v1; // should this be allowed?
const int a = 5;
v2.push_back( &a );  // fine, v2 is a vector of pointers to constant int
                     // rather not: it IS a vector of pointers to non-const ints!
*v1[0] = 10;         // ouch!!! a==10
Yongyoni answered 5/1, 2011 at 20:31 Comment(1)
Well, thanks, your answer is the one that best tells me why I shouldn't do it from a logical point of view, even if it would work in practice. I'll keep your examples in mind; they could answer many of the doubts that sometimes spawn in my mind. My case is a little different (I need to give the object away: it will be destroyed when the function I give it to returns - probably it's that function that is badly designed, I don't know). Now I know why I should never go for a similar solution in other cases.Zamboanga
4

Moreover for many cases I'm pretty sure that forcing a reinterpret_cast would work fine

I’m betting you that it doesn’t. Two containers that store different types are never guaranteed to be binary compatible even if their contained objects are. Even if they happen to be binary compatible under some specific version of some compiler implementation, this is an implementation detail that can change from one minor version to the next.

Relying on such undocumented behaviour is opening the door to many unpleasantly long nights of debugging.

If you want to pass such containers to a function, simply make the function a template so that containers of arbitrary type can be passed to it. Similarly with classes. This is the whole point of templates, after all.
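A minimal sketch of the template approach the answer recommends (the function name is illustrative): the consumer is generic over the container type, so no container conversion is ever needed.

```cpp
#include <cassert>
#include <numeric>
#include <set>
#include <vector>

// Generic consumer: accepts any standard-style container whose elements
// support addition. Callers never convert between container types.
template <typename Container>
typename Container::value_type sum_elements(const Container& c)
{
    using T = typename Container::value_type;
    return std::accumulate(c.begin(), c.end(), T{});  // start from T's zero
}
```

Both a std::vector<int> and a std::set<long> can be passed to the same function; the compiler instantiates one version per container type.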

Souffle answered 5/1, 2011 at 15:24 Comment(3)
Of course that's not guaranteed, but in practice any compiler will store a std::set<int const*> and a std::set<int*> in the same way; I challenge you to find any real compiler or STL implementation that doesn't. I also think it wouldn't be hard for a compiler to offer a non-standard extension (macro) that says whether it guarantees that two types are binary compatible.Zamboanga
@peoro: the problem is twofold: (1) these containers may store runtime type information; granted, this is unlikely in release mode (since it incurs an overhead) but altogether possible when debugging. (2) there may exist specializations of the containers even for binary compatible types that have a different memory layout.Souffle
@peoro: I have added an answer that tries to explain that allowing this type of conversion would break the language in worse ways than it would help. There is a concrete example of why, even if std::vector<int*> and std::vector<const int*> are binary compatible, allowing the conversion would break const-correctness in the language.Glister
4

Why not use the safe way

C<T1> c1;
/* Fill c1 */
C<T2> c2(c1.begin(), c1.end());

and then profile. If it turns out to be a bottleneck, you can always revisit your underlying algorithm and perhaps remove the need for a conversion completely.

Relying on any particular behavior of reinterpret_cast may not cause problems now, but months or years from now it will almost certainly cause someone debugging problems.
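The same iterator-range construction works for pointer containers where each element converts implicitly, e.g. derived-to-base (reusing the fruit/apple types from another answer; the helper name is illustrative):

```cpp
#include <cassert>
#include <vector>

struct fruit { virtual ~fruit() {} };
struct apple : fruit {};

// Element-wise conversion via the range constructor: each apple* converts
// implicitly to fruit*, producing a new, type-correct vector<fruit*>.
// Only the pointers are copied, not the pointed-to objects.
std::vector<fruit*> to_fruits(const std::vector<apple*>& apples)
{
    return std::vector<fruit*>(apples.begin(), apples.end());
}
```

Unlike a reinterpret_cast of the whole container, the result really is a vector<fruit*>, so pushing a watermelon into it cannot corrupt the original vector<apple*>.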

Ec answered 5/1, 2011 at 16:32 Comment(0)
3

Ok, let me summarize the whole thing.

Your (correct!) answers say that in C++ binary compatibility* is never guaranteed between different types. It's undefined behavior to take the value of a memory area where a variable of one type is located and use it for a variable of a different type (and this most likely should be avoided even between variables of the same type).

Also, in real life this can be dangerous even for simple objects, never mind containers!

*: by binary compatibility I mean that the same value is stored in memory in the same way, and that the same assembly instructions are used in the same way to manipulate it. E.g.: even if float and int are 4 bytes each, they are not binary compatible.


However, I'm not satisfied with this C++ rule: let's focus on a single case, like these two structures: struct A{ int a[1000000]; }; and struct B{ int a[1000000]; };.

We can't just use the address of an A object as if it were a B one. And this frustrates me, for the following reasons:

  • The compiler statically knows whether those structures are binary compatible: once the executable has been generated, you could look at it and tell whether they are. The compiler just doesn't give us that information.

  • As far as I know, every C++ compiler that has ever existed treats data in a consistent way. I can't even imagine a compiler generating different representations for those two structures. The point that bugs me the most is that not only are those simple A and B structs binary compatible, but so is about any container, if you use it with types you can expect to be binary compatible (I ran some tests with GCC 4.5 and Clang 2.8, on both custom containers and STL/boost ones).

  • Casting operators let the compiler do what I'm looking for, but only with basic types. If you cast an int to a const int (or an int* to a char*), and those two types are binary compatible, the compiler can (and most likely will) avoid making a copy and just use the same raw bytes.


My idea is then to create a custom object_static_cast that checks whether the source type and the destination type are binary compatible: if they are, it just returns the cast reference; otherwise it constructs a new object and returns it.

I hope not to be downvoted too much for this answer; I'll delete it if the SO community doesn't like it.

To check whether two types are binary compatible, I introduced a new type trait:

// NOTE: this function cannot be safely implemented without compiler
//       explicit support. It's dangerous, don't trust it.
template< typename T1, typename T2 >
struct is_binary_compatible : public boost::false_type{};

As the note says (and as was said earlier), there's no way to actually implement such a type trait (just like boost::has_virtual_destructor, for example).

Then here is the actual object_static_cast implementation:

namespace detail
{
    template< typename T1, typename T2, bool >
    struct object_static_cast_class {
        typedef T1 ret;
        static ret cast( const T2 &in ) {
            return T1( in );
        }
    };

    // NOTE: this is a dangerous hack.
    //       You MUST be sure that T1 and T2 are binary compatible
    //       (`binary compatible` as defined above: same in-memory
    //       representation, manipulated by the same instructions).
    //       RTTI could also give some issues.
    //       Test this every time you compile.
    template< typename T1, typename T2 >
    struct object_static_cast_class< T1, T2, true > {
        typedef T1& ret;
        static ret cast( const T2 &in ) {
            return *( (T1*)& in ); // sorry for this :(
        }
    };

}

// Casts @in (of type T2) to an object of type T1.
// May return either by value or by reference.
template< typename T1, typename T2 >
inline typename detail::object_static_cast_class< T1, T2,
        is_binary_compatible<T1, T2>::value >::ret
    object_static_cast( const T2 &in )
{
    return detail::object_static_cast_class< T1, T2,
            is_binary_compatible<T1, T2>::value >::cast( in );
}

And here is a usage example:

struct Data {
    enum { size = 1024*1024*100 };
    char *x;

    Data( ) {
        std::cout << "Allocating Data" << std::endl;
        x = new char[size];
    }
    Data( const Data &other ) {
        std::cout << "Copying Data [copy ctor]" << std::endl;
        x = new char[size];
        std::copy( other.x, other.x+size, x );
    }
    Data & operator= ( const Data &other ) {
        std::cout << "Copying Data [=]" << std::endl;
        std::copy( other.x, other.x+size, x ); // reuse our buffer instead of
                                               // leaking it with a new char[]
        return *this;
    }
    ~Data( ) {
        std::cout << "Destroying Data" << std::endl;
        delete[] x;
    }
    bool operator==( const Data &other ) const {
        return std::equal( x, x+size, other.x );
    }

};
struct A {
    Data x;
};
struct B {
    Data x;

    B( const A &a ) { x = a.x; }
    bool operator==( const A &a ) const { return x == a.x; }
};

#include <cassert>
int main( ) {
    A a;
    const B &b = object_static_cast< B, A >( a );

    // NOTE: this is NOT enough to check binary compatibility!
    assert( b == a );

    return 0;
}

Output:

$ time ./bnicmop 
Allocating Data
Allocating Data
Copying Data [=]
Destroying Data
Destroying Data

real    0m0.411s
user    0m0.303s
sys     0m0.163s

Let's add these (dangerous!) lines before main():

// WARNING! DANGEROUS! DON'T TRY THIS AT HOME!
// NOTE: using these, program will have undefined behavior: although it may
//       work now, it might not work when changing compiler.
template<> struct is_binary_compatible< A, B > : public boost::true_type{};
template<> struct is_binary_compatible< B, A > : public boost::true_type{};

Output becomes:

$ time ./bnicmop 
Allocating Data
Destroying Data

real    0m0.123s
user    0m0.087s
sys     0m0.017s

This should only be used at critical points (not to copy an array of 3 elements once in a while!), and to use this stuff we need to write at least some (heavy!) unit tests for all the types we declared binary compatible, in order to check that they still are whenever we upgrade our compilers.

Besides, to be on the safe side, the undefined-behaving object_static_cast should only be enabled when a macro is set, so that it's possible to test the application both with and without it.


About my project: I'll be using this stuff in one spot; I need to cast a big container into a different one (which is likely to be binary compatible with mine) in my main loop.

Zamboanga answered 5/1, 2011 at 21:6 Comment(0)
2

The reason you can't cast the containers has nothing to do with the element types themselves. The problem is that you're trying to cast between two classes that are, as far as the compiler and linker are concerned, completely unrelated.

When you do C<int> and C<short>, for example, the compiler emits code like this:

class C_int_ {
    //...
};

class C_short_ {
    //...
};

Since these classes are obviously unrelated, you can't cast between them. And if you force it (e.g., using a C-style cast) and the class has any virtual functions, you will likely blow something up.

Instead, you have to do it manually, using a loop. Sorry.

Contra answered 5/1, 2011 at 15:20 Comment(8)
I know that reinterpret_cast is unsafe for this reason. Did you read the third and fourth paragraphs of the question? Sometimes the types are binary compatible; I'm looking for a way to let the compiler reinterpret_cast in the cases where they are.Zamboanga
Yes, I did. But, what I'm saying is that even if the bytes happen to be identical, they're not binary compatible. And, nothing you can do will convince the compiler otherwise.Contra
Of course a C<int> is going to be different from a C<float>. I wouldn't even bother if my case were like that. I was thinking more specifically of C<int const*> and C<int*>. No compiler I've ever heard of would generate different code for those (unless C has a specialization for constant pointers)...Zamboanga
The thing to keep in mind is that the compiler is literally generating classes like I did in the answer. At one point (long ago, when the C++ compiler targeted C) it did this in the preprocessor. So it's not that the compiler is playing dumb; it's that they're literally as different as class A { int a; }; and class B { int a; };Contra
Yes, but I still don't see why I shouldn't force a cast from A to B if I know that they (as represented in memory once compiled) are the same. I mean, if your classes were like these: class A { int a[1000000]; }; and class B { int a[1000000]; }; you would get a big speed-up by directly casting A to B. Since my objects can reach big dimensions, I'd like to write code that statically checks whether the types are compatible, and casts if they are: if( statically_compatible<A,B>::value ) { force_cast<B>(A); } else { slow_copy<B>(A); }...Zamboanga
@peoro: the problem is that nothing guarantees you that they are really binary compatible. You can make guesses, you can even perform some static assertions (the base type of the two container is POD, is of the same size, ...), but I don't think there's any way to be 100% sure.Dull
@Matteo Italia: OK, I got it. Anyway, I feel very frustrated about this. The compiler knows whether two objects are binary compatible; from the moment it generates the executable, one can say whether things would work flawlessly. All I'm missing is a construct able to tell me at compile time whether two objects are binary compatible. I'll be using a container_cast<> that creates a copy of the container. If it slows things down (which I think is likely, since it's in my main loop) I'll specialize container_cast to force a memory cast, adding unit tests for it and NOTEs everywhere about its risks.Zamboanga
@peoro, @Matteo Italia: The problem is deeper than that. Even in cases where the two might be binary compatible, allowing the conversion would open the language to other kinds of inconsistencies.Glister
1

This is difficult in general. The problem becomes apparent when you consider template specialization, for instance the infamous vector<bool>, whose implementation differs from a vector<int>'s in much more than just the element type.
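The specialization is directly observable without inspecting the implementation: vector<bool> packs bits and hands out a proxy class from operator[], so its reference type is not bool&. A small compile-time check:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// vector<bool> is specialized to store packed bits; operator[] returns a
// proxy object, so its reference type differs from an ordinary vector's.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy class, not bool&");
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "vector<int>::reference is a plain int&");
```

Any scheme that assumes "same layout for compatible element types" breaks down here, because the container's layout is not even expressed in terms of the element type.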

Lantz answered 5/1, 2011 at 15:26 Comment(0)
1

It's absolutely not guaranteed that those containers are binary compatible and can be cast with something like reinterpret_cast<>.

For example, if the container (like std::vector) stores its data internally in a C-style array, C<T1> would contain a T1[] array while C<T2> would contain a T2[]. If T1 and T2 have different sizes (for example, T2 has more member variables), the memory of the T1[] cannot simply be interpreted as a T2[], since the elements of the two arrays would be located at different positions.

So simply reinterpreting the C<T1> memory as a C<T2> won't work, and a real conversion is necessary.

(Furthermore, there might be template specializations for certain types, so that C<T1> might look completely different from C<T2>.)
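The size argument can be made concrete with a small compile-time check (T1 and T2 here are illustrative stand-in structs, not the question's type parameters):

```cpp
#include <cassert>

struct T1 { int a; };
struct T2 { int a; int b; };  // one extra member

// With a larger element type, the i-th element of a T2[] starts at a
// different byte offset than the i-th element of a T1[], so the raw
// memory of one array cannot be reinterpreted as the other.
static_assert(sizeof(T2) >= 2 * sizeof(int), "members cannot overlap");
static_assert(sizeof(T1[4]) == 4 * sizeof(T1), "array stride == sizeof(T1)");
```

Since array stride equals the element size, any size difference shifts every element after the first, which is why only an element-by-element conversion is correct.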

For converting one container to another see for example this question or many other related ones.

Calyx answered 5/1, 2011 at 15:26 Comment(1)
OK, what I am asking for is not going to be possible if T1 and T2 are binary incompatible. I was thinking about cases like C<const int*> and C<int*>, where C has no specialization depending on the const-ness of its contained type: except under really weird compilers, reinterpret_cast is going to work fine.Zamboanga
0

This is indeed difficult for containers. Type compatibility is not enough; the types actually need to be identical in memory to prevent slicing on assignment. It might be possible to implement a ptr_container that exposes pointers of a compatible type. For example, boost's ptr_containers keep void*s internally anyway, so casting them to compatible pointers should work.

That said, this is definitely possible with smart pointers. For example, boost::shared_ptr implements static_pointer_cast and dynamic_pointer_cast.
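A sketch using the std:: equivalents of the boost facilities the answer names (std::static_pointer_cast works the same way; the fruit/apple types and helper name are illustrative):

```cpp
#include <cassert>
#include <memory>

struct fruit { virtual ~fruit() = default; };
struct apple : fruit {};

// std::static_pointer_cast converts a shared_ptr<apple> into a
// shared_ptr<fruit> without touching the pointee: both handles refer
// to the same object and share one control block.
inline std::shared_ptr<fruit> as_fruit(const std::shared_ptr<apple>& a)
{
    return std::static_pointer_cast<fruit>(a);
}
```

This is why smart pointers side-step the container problem: the cast produces a new handle of the right type rather than reinterpreting the old one's memory.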

Dinkins answered 5/1, 2011 at 16:55 Comment(0)
