Initializing Cython objects with existing C Objects

C++ Model

Say I have the following C++ data structures I wish to expose to Python.

#include <memory>
#include <vector>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

typedef std::vector<std::shared_ptr<mystruct>> mystruct_list;
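To make the ownership model concrete, here is a minimal sketch showing that copying a shared_ptr out of a mystruct_list gives a second handle to the same mystruct, with no struct data copied. The helper name shares_storage is hypothetical and not part of the original code.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

typedef std::vector<std::shared_ptr<mystruct>> mystruct_list;

// Hypothetical helper: copies the shared_ptr stored at `index` and writes
// through the copy. Returns true if the write is visible through the list
// element and both handles point at the same object.
bool shares_storage(mystruct_list& list, std::size_t index)
{
    std::shared_ptr<mystruct> handle = list.at(index);  // copies the pointer only
    handle->a = 42;
    return list.at(index)->a == 42 && handle.get() == list.at(index).get();
}
```

This is the behaviour a Python wrapper should preserve: it holds a shared_ptr copy, not a mystruct copy.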

Boost Python

I can wrap these fairly effectively using boost::python with the following code, easily allowing me to use the existing mystruct (copying the shared_ptr) rather than recreating an existing object.

#include "mystruct.h"
#include <boost/python.hpp>

using namespace boost::python;


BOOST_PYTHON_MODULE(example)
{
    class_<mystruct, std::shared_ptr<mystruct>>("MyStruct", init<>())
        .def_readwrite("a", &mystruct::a);
        // add the rest of the member variables

    class_<mystruct_list>("MyStructList", init<>())
        .def("at", &mystruct_list::at, return_value_policy<copy_const_reference>());
        // add the rest of the member functions
}

Cython

In Cython, I have no idea how to extract an item from mystruct_list, without copying the underlying data. I have no idea how I could initialize MyStruct from the existing shared_ptr<mystruct>, without copying all the data over in one of various forms.

from libcpp.memory cimport shared_ptr
from libcpp.vector cimport vector
from cython.operator cimport dereference


cdef extern from "mystruct.h" nogil:
    cdef cppclass mystruct:
        int a, b, c, d, e, f, g, h, i, j, k, l, m

    ctypedef vector[shared_ptr[mystruct]] mystruct_list


cdef class MyStruct:
    cdef shared_ptr[mystruct] ptr

    def __cinit__(MyStruct self):
        self.ptr.reset(new mystruct())

    property a:
        def __get__(MyStruct self):
            return dereference(self.ptr).a

        def __set__(MyStruct self, int value):
            dereference(self.ptr).a = value


cdef class MyStructList:
    cdef mystruct_list c
    cdef mystruct_list.iterator it

    def __cinit__(MyStructList self):
        pass

    def __getitem__(MyStructList self, int index):
        # How do I return a MyStruct without copying the underlying `mystruct`?
        pass

I see many possible workarounds, and none of them are very satisfactory:

I could initialize an empty MyStruct and, in Cython, assign the existing shared_ptr over its own. However, this allocates and immediately discards an initialized struct for no reason.

cdef MyStruct value = MyStruct()
value.ptr = self.c.at(index)
return value
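The cost of this workaround can be seen in a C++ analogue. In this sketch, tracked and wrapper are hypothetical stand-ins for mystruct and the MyStruct extension type. The static counter exists only to make the throwaway allocation visible.

```cpp
#include <memory>

// Hypothetical stand-in for mystruct that counts how often it is constructed.
struct tracked
{
    static int constructed;
    int a = 0;
    tracked() { ++constructed; }
};
int tracked::constructed = 0;

// Hypothetical stand-in for the MyStruct wrapper: its constructor always
// allocates, like the __cinit__ in the question.
struct wrapper
{
    std::shared_ptr<tracked> ptr;
    wrapper() : ptr(std::make_shared<tracked>()) {}
};

// Wrap an existing object by constructing a wrapper and overwriting ptr.
wrapper wrap_by_overwrite(const std::shared_ptr<tracked>& existing)
{
    wrapper value;         // allocates a throwaway tracked
    value.ptr = existing;  // the throwaway is destroyed by this assignment
    return value;
}
```

Each call to wrap_by_overwrite constructs one tracked object that is never used, which is the waste described above.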

I also could copy the data from the existing mystruct to the new mystruct. However, this suffers from similar bloat.

cdef MyStruct value = MyStruct()
dereference(value.ptr).a = dereference(self.c.at(index)).a
return value

I could also expose a init=True flag for each __cinit__ method, which would prevent reconstructing the object internally if the C-object exists already (when init is False). However, this could cause catastrophic issues, since it would be exposed to the Python API and would allow dereferencing a null or uninitialized pointer.

def __cinit__(MyStruct self, bint init=True):
    if init:
        self.ptr.reset(new mystruct())

I could also move the allocation into __init__, the Python-exposed constructor (which would reset self.ptr), but this would be unsafe if __new__ were called directly from the Python layer, since __init__ would then never run and self.ptr would stay empty.
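One way to contain the risk in the last two workarounds is to validate the pointer on every access instead of trusting the constructor. The following is only a sketch in C++ terms; checked_wrapper and its member names are hypothetical. It throws instead of dereferencing an empty shared_ptr.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

// Hypothetical wrapper that may legitimately be empty, but never
// dereferences an empty pointer.
class checked_wrapper
{
public:
    checked_wrapper() = default;  // empty, like a wrapper built with init=False
    explicit checked_wrapper(std::shared_ptr<mystruct> p) : ptr_(std::move(p)) {}

    int get_a() const { return checked().a; }
    void set_a(int value) { checked().a = value; }

private:
    // Single choke point for every access to the underlying struct.
    mystruct& checked() const
    {
        if (!ptr_)
            throw std::runtime_error("MyStruct is not initialized");
        return *ptr_;
    }

    std::shared_ptr<mystruct> ptr_;
};
```

In Cython the analogous guard would presumably be a test such as self.ptr.get() == NULL before each dereference, raising a Python exception when it is true. That costs one pointer comparison per access.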

Bottom-Line

I would love to use Cython, for compilation speed, syntactical sugar, and numerous other reasons, as opposed to the fairly clunky boost::python. I'm looking at pybind11 right now, and it may solve the compilation speed issues, but I would still prefer to use Cython.

Is there any way I can do such a simple task idiomatically in Cython? Thanks.

Pomiculture answered 21/6, 2017 at 21:30 Comment(9)
Does return dereference(self.c.at(index).get()) work? I.e. retrieve the shared_ptr from the vector, get() the stored pointer and dereference it. Or maybe simply return dereference(self.c.at(index)) (in C++ you can dereference the shared pointer directly).Ceramist
This however gives you a mystruct instead of a MyStruct. I guess you would need a second constructor def __cinit__(MyStruct self, new_ptr): self.ptr.reset(new_ptr) and then do return MyStruct(self.c.at(index)).Ceramist
Yeah, there's just a few issues unfortunately @HenriMenke. Cython won't let me use C types as arguments in a def (unlike a cdef), and initialization functions cannot be cdef-only. If Cython let me define custom constructors with cdef, that would solve everything. Unfortunately, it does not. It's probably doable via the Python C-API, or by overloading __init__, but the docs pretty clearly state the object should be valid when __init__ is called, and __init__ may not be called at all. cython.readthedocs.io/en/latest/src/userguide/…Pomiculture
Overloaded __cinit__ plus return MyStruct.__new__(self.c.at(index)) could work.Ceramist
»this would have risky memory safety if __new__ was used from the Python layer« You are raising your standards to an unreasonable and ridiculous level. If somebody calls __new__ on the Python level they better know what they are doing. If you want memory safety just rewrite your whole code in Python.Ceramist
@HenriMenke, A). That actually won't work, it only works because the existing object is a Python object: if it was a C struct, it would raise an error (tested). In fact, I get the exact error the question is about: Cannot convert 'type' to Python object. B). Expecting memory safety from choices made in a memory-safe language is not a trivial concern. It's essential.Pomiculture
@HenriMenke If I try the same from boost::python, I get `Boost.Python.instance.__new__(): not enough arguments`, which highlights how it prevents initialization without required data (in this case, it needs to know the type). If I force the type with `c = a.MyStruct.__new__(MyStruct)` and then try to use c, it automatically checks that the struct is invalid before I access any member functions. That is useful memory safety.Pomiculture
@HenriMenke Ok, last thing, it works with cdef struct, but not with cdef cppclass so maybe I should change the title? Either way, it does not work unless I do manual memory management, since shared_ptr is clearly a cppclass. Either way, this seems to be a major design flaw that I don't see an obvious solution to....Pomiculture
@HenriMenke After looking at the documentation closer, it appears there is a @nonecheck setting you can use to prevent null dereferences in Cython I believe. If you write that as an answer, along with the __init__ override, I can give you the answer.Pomiculture

The way this works in Cython is to use a factory function that creates Python objects from the shared pointer. This gives you access to the underlying C/C++ structure without copying it.

Example Cython code:

<..>

cdef class MyStruct:
    cdef shared_ptr[mystruct] ptr

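    def __cinit__(self):
        # Do not allocate a new struct here; the factory
        # below passes in an existing shared_ptr. The member
        # is default-constructed empty, so nothing to do.
        pass

    def __dealloc__(self):
        # Release this object's reference. The struct is freed
        # only when the last shared_ptr owner is gone.
        self.ptr.reset()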

    <rest per MyStruct code above>

cdef object PyStruct(shared_ptr[mystruct] MyStruct_ptr):
    """Python object factory function taking a shared_ptr
    to a Cpp mystruct as argument
    """
    # Create new MyStruct object. This does not create
    # a new structure; its shared_ptr starts out empty
    cdef MyStruct _mystruct = MyStruct()
    # Point the cdef class at the existing struct. This copies
    # the shared_ptr (sharing ownership), not the struct
    _mystruct.ptr = MyStruct_ptr
    # Return the wrapped MyStruct object with MyStruct_ptr
    return _mystruct

def make_structure():
    """Function to create new Cpp mystruct and return
    python object representation of it
    """
    # The raw pointer must be wrapped in a shared_ptr explicitly
    cdef MyStruct mypystruct = PyStruct(shared_ptr[mystruct](new mystruct()))
    return mypystruct

Note that the argument of PyStruct is a shared_ptr to the Cpp struct, not a Python object, which is why PyStruct has to be a cdef function.

mypystruct is then a Python object of class MyStruct, as returned by the factory function, which refers to the Cpp mystruct without copying it. mypystruct can be safely returned from def Cython functions and used in Python space, per the make_structure code.

To return a Python object of an existing Cpp mystruct pointer just wrap it with PyStruct like

return PyStruct(my_cpp_struct_ptr)

anywhere in your Cython code.
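For readers more at home in C++, the same factory idea can be sketched without Cython. MyStructHandle and its members are hypothetical names chosen for this illustration. The point is only that construction goes through factories, and that wrapping an existing shared_ptr neither allocates nor copies a mystruct.

```cpp
#include <memory>
#include <utility>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

// Hypothetical C++ counterpart of the MyStruct extension type.
class MyStructHandle
{
public:
    // Factory: wraps an existing shared_ptr; no mystruct is allocated or copied.
    static MyStructHandle wrap(std::shared_ptr<mystruct> existing)
    {
        MyStructHandle handle;
        handle.ptr_ = std::move(existing);
        return handle;
    }

    // Factory that also creates the struct, like make_structure() above.
    static MyStructHandle create() { return wrap(std::make_shared<mystruct>()); }

    int a() const { return ptr_->a; }
    void set_a(int value) { ptr_->a = value; }
    long owners() const { return ptr_.use_count(); }

private:
    MyStructHandle() = default;  // only the factories may construct a handle
    std::shared_ptr<mystruct> ptr_;
};
```

Because the default constructor is private, user code cannot obtain a handle with an empty pointer. This is the guarantee the question is looking for on the Python side.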

Only def functions are visible from Python, so the Cpp function calls need to be wrapped inside MyStruct as well if they are to be used in Python space. Wrapping them there also lets the Cpp calls inside the Cython class release the GIL, which is usually worth doing.

For a real-world example see this Cython extension code and the underlying C code bindings in Cython. Also see this code for Python function wrapping of C function calls that let go of GIL. Not Cpp but same applies.

See also official Cython documentation on when a factory class/function is needed (Note that all constructor arguments will be passed as Python objects). For built in types, Cython does this conversion for you but for custom structures or objects a factory class/function is needed.

The Cpp structure initialisation could be handled in __new__ of PyStruct if needed, per suggestion above, if you want the factory class to actually create the C++ structure for you (depends on the use case really).

The benefit of a factory function with pointer arguments is that it lets you wrap existing pointers to C/C++ structures in a Python extension class, rather than always having to create new ones. It is perfectly safe, for example, to have multiple Python objects referring to the same underlying C struct. Python's ref counting, together with the shared_ptr's own reference count here, ensures the struct won't be de-allocated prematurely. With raw pointers you should still check for null when de-allocating, though, as the pointer could already have been de-allocated explicitly (e.g. by del).
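The lifetime claim can be checked in isolation with a small C++ sketch. The function name lifetime_follows_owners is hypothetical, and the two shared_ptr variables stand in for two Python wrapper objects holding the same struct.

```cpp
#include <memory>

struct mystruct
{
    int a, b, c, d, e, f, g, h, i, j, k, l, m;
};

// Returns true if the struct stays alive while any owner remains and is
// destroyed once the last owner lets go.
bool lifetime_follows_owners()
{
    std::shared_ptr<mystruct> first = std::make_shared<mystruct>();
    std::shared_ptr<mystruct> second = first;   // second wrapper, same struct
    std::weak_ptr<mystruct> observer = first;   // observes without owning

    first.reset();                              // one wrapper goes away
    bool alive_with_one_owner = !observer.expired();

    second.reset();                             // last wrapper goes away
    bool freed_with_no_owner = observer.expired();

    return alive_with_one_owner && freed_with_no_owner;
}
```

With shared_ptr, calling reset() on an already empty pointer is harmless, so no explicit null check is needed before releasing.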

Note that there is, however, some overhead in creating new python objects even if they do point to the same C++ structure. Not a lot, but still.

IMO this auto de-allocation and ref counting of C/C++ pointers is one of the greatest features of Python's C extension API. As all that acts on Python objects (alone), the C/C++ structures need to be wrapped in a compatible Python object class definition.

Note - My experience is mostly in C, the above may need adjusting as I'm more familiar with regular C pointers than C++'s shared pointers.

Chihli answered 11/7, 2017 at 15:24 Comment(0)
