Force NumPy ndarray to take ownership of its memory in Cython
Asked Answered
F

3

15

Following this answer to "Can I force a numpy ndarray to take ownership of its memory?" I attempted to use the Python C API function PyArray_ENABLEFLAGS through Cython's NumPy wrapper and found it is not exposed.

The following attempt to expose it manually (this is just a minimum example reproducing the failure)

from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np

np.import_array()

ctypedef np.int32_t DTYPE_t

cdef extern from "numpy/ndarraytypes.h":
    void PyArray_ENABLEFLAGS(np.PyArrayObject *arr, int flags)

def test():
    cdef int N = 1000

    cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
    cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, np.NPY_INT32, data)
    PyArray_ENABLEFLAGS(arr, np.NPY_ARRAY_OWNDATA)

fails with a compile error:

Error compiling Cython file:
------------------------------------------------------------
...
def test():
    cdef int N = 1000

    cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
    cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, np.NPY_INT32, data)
    PyArray_ENABLEFLAGS(arr, np.NPY_ARRAY_OWNDATA)
                          ^
------------------------------------------------------------

/tmp/test.pyx:19:27: Cannot convert Python object to 'PyArrayObject *'

My question: Is this the right approach to take in this case? If so, what am I doing wrong? If not, how do I force NumPy to take ownership in Cython, without going down to a C extension module?

Finite answered 26/5, 2014 at 15:0 Comment(2)
Did my answer work for you?Boyette
It did indeed, thanks!Finite
B
20

You just have some minor errors in the interface definition. The following worked for me:

from libc.stdlib cimport malloc
import numpy as np
cimport numpy as np

np.import_array()

ctypedef np.int32_t DTYPE_t

cdef extern from "numpy/arrayobject.h":
    void PyArray_ENABLEFLAGS(np.ndarray arr, int flags)

cdef data_to_numpy_array_with_spec(void * ptr, np.npy_intp N, int t):
    cdef np.ndarray[DTYPE_t, ndim=1] arr = np.PyArray_SimpleNewFromData(1, &N, t, ptr)
    PyArray_ENABLEFLAGS(arr, np.NPY_OWNDATA)
    return arr

def test():
    N = 1000

    cdef DTYPE_t *data = <DTYPE_t *>malloc(N * sizeof(DTYPE_t))
    arr = data_to_numpy_array_with_spec(data, N, np.NPY_INT32)
    return arr

This is my setup.py file:

from distutils.core import setup, Extension
from Cython.Distutils import build_ext
ext_modules = [Extension("_owndata", ["owndata.pyx"])]
setup(cmdclass={'build_ext': build_ext}, ext_modules=ext_modules)

Build with python setup.py build_ext --inplace. Then verify that the data is actually owned:

import _owndata
arr = _owndata.test()
print arr.flags

Among others, you should see OWNDATA : True.

And yes, this is definitely the right way to deal with this, since numpy.pxd does exactly the same thing to export all the other functions to Cython.

Boyette answered 26/5, 2014 at 15:40 Comment(6)
This doesn't work for me. It compiles fine, but importing the module results in a linking error, complaining about PyArray_ENABLEFLAGS. This is with numpy 1.9.1.Reputable
This solution works since numpy 1.7. Older versions lack PyArray_ENABLEFLAGSUmbra
This fails with ImportError: ./_owndata.so: undefined symbol: PyArray_ENABLEFLAGSBowler
@Korem Have you use cdef extern from "numpy/arrayobject.h" ? And maybe you need to check if "numpy" is in your include path.Neurovascular
what if the data type is user defined like my_dtype = np.dtype([('t1', np.float32), ('t2', np.uint16)])? I realize it has a type_num but could not figure out how to get it in Cython. It is defined cdef np.ndarray[mytype_t] arr and mytype_t has packed float and uint16.Sherlynsherm
In setup.py, Extension should have the include_dirs=[numpy.get_include()] parameter in my case for it to find the numpy headers.Bipartite
A
7

@Stefan's solution works for most scenarios, but is somewhat fragile. Numpy uses PyDataMem_NEW/PyDataMem_FREE for memory-management and it is an implementation detail, that these calls are mapped to the usual malloc/free + some memory tracing (I don't know which effect Stefan's solution has on the memory tracing, at least it seems not to crash).

There are also more esoteric cases possible, in which free from numpy-library doesn't use the same memory-allocator as malloc in the cython code (linked against different run-times for example as in this github-issue or this SO-post).

The right tool to pass/manage the ownership of the data is PyArray_SetBaseObject.

First we need a python-object, which is responsible for freeing the memory. I'm using a self-made cdef-class here (mostly because of logging/demostration), but there are obviously other possiblities as well:

%%cython
from libc.stdlib cimport free

cdef class MemoryNanny:
    cdef void* ptr # set to NULL by "constructor"
    def __dealloc__(self):
        print("freeing ptr=", <unsigned long long>(self.ptr)) #just for debugging
        free(self.ptr)
        
    @staticmethod
    cdef create(void* ptr):
        cdef MemoryNanny result = MemoryNanny()
        result.ptr = ptr
        print("nanny for ptr=", <unsigned long long>(result.ptr)) #just for debugging
        return result

 ...

Now, we use a MemoryNanny-object as sentinel for the memory, which gets freed as soon as the parent-numpy-array gets destroyed. The code is a little bit awkward, because PyArray_SetBaseObject steals the reference, which is not handled by Cython automatically:

%%cython
...
from cpython.object cimport PyObject
from cpython.ref cimport Py_INCREF

cimport numpy as np

#needed to initialize PyArray_API in order to be able to use it
np.import_array()


cdef extern from "numpy/arrayobject.h":
    # a little bit awkward: the reference to obj will be stolen
    # using PyObject*  to signal that Cython cannot handle it automatically
    int PyArray_SetBaseObject(np.ndarray arr, PyObject *obj) except -1 # -1 means there was an error
          
cdef array_from_ptr(void * ptr, np.npy_intp N, int np_type):
    cdef np.ndarray arr = np.PyArray_SimpleNewFromData(1, &N, np_type, ptr)
    nanny = MemoryNanny.create(ptr)
    Py_INCREF(nanny) # a reference will get stolen, so prepare nanny
    PyArray_SetBaseObject(arr, <PyObject*>nanny) 
    return arr
...

And here is an example, how this functionality can be called:

%%cython
...
from libc.stdlib cimport malloc
def create():
    cdef double *ptr=<double*>malloc(sizeof(double)*8);
    ptr[0]=42.0
    return array_from_ptr(ptr, 8, np.NPY_FLOAT64)

which can be used as follows:

>>> m =  create()
nanny for ptr= 94339864945184
>>> m.flags
...
OWNDATA : False
...
>>> m[0]
42.0
>>> del m
freeing ptr= 94339864945184

with results/output as expected.

Note: the resulting arrays doesn't really own the data (i.e. flags return OWNDATA : False), because the memory is owned be the memory-nanny, but the result is the same: the memory gets freed as soon as array is deleted (because nobody holds a reference to the nanny anymore).


MemoryNanny doesn't have to guard a raw C-pointer. It can be anything else, for example also a std::vector:

%%cython -+
from libcpp.vector cimport vector
cdef class VectorNanny:
    #automatically default initialized/destructed by Cython:
    cdef vector[double] vec 
    @staticmethod
    cdef create(vector[double]& vec):
        cdef VectorNanny result = VectorNanny()
        result.vec.swap(vec) # swap and not copy
        return result
   
# for testing:
def create_vector(int N):
    cdef vector[double] vec;
    vec.resize(N, 2.0)
    return VectorNanny.create(vec)

The following test shows, that the nanny works:

nanny=create_vector(10**8) # top shows additional 800MB memory are used
del nanny                  # top shows, this additional memory is no longer used.
Abreu answered 2/5, 2019 at 20:35 Comment(0)
B
6

The latest Cython version allows you to do with with minimal syntax, albeit slightly more overhead than the lower-level solutions suggested.

numpy_array = np.asarray(<np.int32_t[:10, :10]> my_pointer)

https://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html#coercion-to-numpy

This alone does not pass ownership.

Notably, a Cython array is generated with this call, via array_cwrapper.

This generates a cython.array, without allocating memory. The cython.array uses the stdlib.h malloc and free by default, so it would be expected that you use the default malloc, as well, instead of any special CPython/Numpy allocators.

free is only called if ownership is set for this cython.array, which it is by default only if it allocates data. For our case, we can manually set it via:

my_cyarr.free_data = True


So to return a 1D array, it would be as simple as:

from cython.view cimport array as cvarray

# ...
    cdef cvarray cvarr = <np.int32_t[:N]> data
    cvarr.free_data = True
    return np.asarray(cvarr)
Branle answered 25/3, 2020 at 19:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.