Can I force a numpy ndarray to take ownership of its memory?

Asked 3/1, 2012 at 6:36 Answered 19/8, 2013 at 11:38

I have a C function that malloc()s and populates a 2D array of floats. It "returns" the array's address and dimensions through output parameters. The signature is

int get_array_c(float** addr, int* nrows, int* ncols);

I want to call it from Python, so I use ctypes.

import ctypes
mylib = ctypes.cdll.LoadLibrary('mylib.so')
get_array_c = mylib.get_array_c

I never figured out how to specify argument types with ctypes, so I tend to write a Python wrapper for each C function I'm using and make sure I get the types right in the wrapper. The array of floats is a matrix in column-major order, and I'd like to get it as a numpy.ndarray. But it's pretty big, so I want to use the memory allocated by the C function rather than copy it. (I just found this PyBuffer_FromMemory stuff in this StackOverflow answer: https://mcmap.net/q/21546/-getting-data-from-ctypes-array-into-numpy)

buffer_from_memory = ctypes.pythonapi.PyBuffer_FromMemory
buffer_from_memory.restype = ctypes.py_object

import numpy
def get_array_py():
    nrows = ctypes.c_int()
    ncols = ctypes.c_int()
    addr_ptr = ctypes.POINTER(ctypes.c_float)()
    get_array_c(ctypes.byref(addr_ptr), ctypes.byref(nrows), ctypes.byref(ncols))
    # ctypes integers do not support arithmetic; unwrap them with .value first
    shape = (nrows.value, ncols.value)
    buf = buffer_from_memory(addr_ptr, 4 * shape[0] * shape[1])  # 4 bytes per float
    return numpy.ndarray(shape, dtype=numpy.float32, order='F',
                         buffer=buf)

This seems to give me an array with the right values. But I'm pretty sure it's a memory leak.

>>> a = get_array_py()
>>> a.flags.owndata
False

The array doesn't own the memory. Fair enough; by default, when the array is created from a buffer, it shouldn't. But in this case it should. When the numpy array is deleted, I'd really like python to free the buffer memory for me. It seems like if I could force owndata to True, that should do it, but owndata isn't settable.

Unsatisfactory solutions:

1. Make the caller of get_array_py() responsible for freeing the memory. That's super annoying; the caller should be able to treat this numpy array just like any other numpy array.
2. Copy the original array into a new numpy array (with its own, separate memory) in get_array_py(), delete the first array, and free the memory inside get_array_py(). Return the copy instead of the original array. This is annoying because it's an ought-to-be unnecessary memory copy.
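
For reference, the copy-based workaround could look roughly like the sketch below. It is only an illustration under assumptions: Python 3, a POSIX system where ctypes can load the C library, and a block that was allocated with that same C library's malloc(). The helper name copy_and_free is made up for this example and is not part of any library.

```python
import ctypes
import ctypes.util
import numpy as np

# Assumption: a POSIX system where the C library can be located and loaded.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def copy_and_free(addr, nrows, ncols):
    """Copy a malloc'd column-major float block into a NumPy-owned array, then free it.

    Hypothetical helper: addr is the integer address of the C buffer.
    """
    n = nrows * ncols
    # Temporary, non-owning 1-D view of the C memory.
    view = np.ctypeslib.as_array((ctypes.c_float * n).from_address(addr))
    # The copy allocates fresh memory that NumPy owns and will release itself.
    owned = view.reshape((nrows, ncols), order="F").copy(order="F")
    del view  # drop the view before releasing the memory it points at
    libc.free(addr)
    return owned
```

The returned array reports flags.owndata as True, at the cost of briefly holding two copies of the data.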

Is there a way to do what I want? I can't modify the C function itself, although I could add another C function to the library if that's helpful.

Cori asked 3/1, 2012 at 6:36 Comment(3)

This sounds like a world of pain.. I think you are asking for segfault hell – Tumult 3/1, 2012 at 7:27

I have tried this as well without success using ctypes. A full up extension module makes this possible but they are more work to write. – Kierstenkieselguhr 1/2, 2012 at 20:20

Related question here: #23931171 – Natalyanataniel 23/5 at 5:58

I just stumbled upon this question, which is still an issue in August 2013. Numpy is really picky about the OWNDATA flag: There is no way it can be modified on the Python level, so ctypes will most likely not be able to do this. On the numpy C-API level - and now we are talking about a completely different way of making Python extension modules - one has to explicitly set the flag with:

PyArray_ENABLEFLAGS(arr, NPY_ARRAY_OWNDATA);

On numpy < 1.7, one had to be even more explicit:

((PyArrayObject*)arr)->flags |= NPY_OWNDATA;

If one has any control over the underlying C function/library, the best solution is to pass it an empty numpy array of the appropriate size from Python to store the result in. The basic principle is that memory allocation should always be done on the highest level possible, in this case on the level of the Python interpreter.

As kynan commented below, if you use Cython, you have to expose the function PyArray_ENABLEFLAGS manually; see the post "Force NumPy ndarray to take ownership of its memory in Cython".

The relevant documentation is here and here.

Isometry answered 19/8, 2013 at 11:38 Comment(8)

How would I achieve the same in Cython? Unfortunately PyArray_ENABLEFLAGS seems not to be exposed in numpy.pxd. – Slouch 7/5, 2014 at 13:18

If the required functionality is not exposed to Cython, you could either patch Cython or edit the C file that it generates manually. – Isometry 7/5, 2014 at 13:26

Neither of those seem very sustainable options to me. I tried extending what is exposed by numpy.pxd in my pyx file but had no luck with that. – Slouch 7/5, 2014 at 16:44

Could you post a separate question? I will try to answer properly. – Isometry 7/5, 2014 at 17:11

Done: Force NumPy ndarray to take ownership of its memory in Cython – Slouch 26/5, 2014 at 15:1

@Stefan: In numpy < 1.7, what is the difference between PyArray_UpdateFlags(arr, NPY_OWNDATA) and arr->flags |= NPY_OWNDATA? The latter works, the former does not, but it is not clear from the doc why. – Kiyokokiyoshi 15/7, 2014 at 12:57

@user443854: It appears PyArray_UpdateFlags() does not let you update all flags, in particular not NPY_OWNDATA (It is simply ignored if you try; see flagsobject.c in the numpy sources). The new PyArray_ENABLEFLAGS() is simply an inline function of the above bit-setting expression. – Isometry 15/7, 2014 at 13:12

@stefan: as of 2019 it doesn't seem to be the case: PyArray_ENABLEFLAGS(PyArrayObject *arr, int flags) { ((PyArrayObject_fields *)arr)->flags |= flags; } – Jdavie 20/12, 2019 at 7:35

I would tend to have two functions exported from my C library:

int get_array_c_nomalloc(float* addr, int nrows, int ncols); /* Pass addr as argument */
int get_array_c(float **addr, int nrows, int ncols); /* Calls function above */

I would then write my Python wrapper[1] of get_array_c to allocate the array, then call get_array_c_nomalloc. Then Python does own the memory. You could integrate this wrapper into your library so your user never has to be aware of get_array_c_nomalloc's existence.

[1] This isn't really a wrapper anymore, but instead is an adapter.
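
A minimal sketch of that adapter idea in Python 3, assuming the dimensions are known before the call. Since the real library is not available here, get_array_c_nomalloc is replaced by a pure-Python stand-in that writes through the pointer it receives; with the actual library you would call the C function loaded via ctypes instead. The name get_array_prealloc is made up for illustration.

```python
import ctypes
import numpy as np


def get_array_c_nomalloc(addr, nrows, ncols):
    """Stand-in for the C function (assumption): fills caller-owned memory, returns 0."""
    for i in range(nrows * ncols):
        addr[i] = float(i)
    return 0


def get_array_prealloc(nrows, ncols):
    """Let NumPy allocate the column-major buffer, then have the 'C' side fill it."""
    out = np.empty((nrows, ncols), dtype=np.float32, order="F")
    # Pointer to NumPy's own buffer, typed as float* for the callee.
    ptr = out.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
    status = get_array_c_nomalloc(ptr, nrows, ncols)
    if status != 0:
        raise RuntimeError("get_array_c_nomalloc failed with status %d" % status)
    return out
```

Because NumPy allocated the buffer, the result has flags.owndata set to True and is freed like any other array.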

Lavellelaven answered 3/1, 2012 at 7:28 Comment(2)

Sorry, I had the signature to get_array_c() wrong! It takes in int pointers for nrows and ncols -- I don't know how big the array will be, so I can't preallocate the array in python. – Cori 3/1, 2012 at 8:44

Well, you could alternatively make your python wrapper use an object to hold the reference/access the memory, and use a finalizer to free the array... Don't know if that violates your aesthetic or not, but the user won't have to explicitly free the memory. – Lavellelaven 3/1, 2012 at 15:32
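
One way the finalizer idea might be sketched in Python 3 is shown below. The assumptions are a POSIX system where ctypes can load the C library, and memory that came from that library's malloc(). MallocedBlock and wrap_malloced are hypothetical names. The owner object exposes the memory through __array_interface__, so NumPy keeps it alive as the base of the array, and weakref.finalize calls free() only once the array and all of its views are gone.

```python
import ctypes
import ctypes.util
import weakref
import numpy as np

# Assumption: a POSIX system where the C library can be located and loaded.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


class MallocedBlock:
    """Hypothetical owner: frees a malloc'd float block once nothing references it."""

    def __init__(self, addr, nrows, ncols):
        itemsize = ctypes.sizeof(ctypes.c_float)
        self._interface = {
            "version": 3,
            "typestr": np.dtype(np.float32).str,
            "shape": (nrows, ncols),
            "strides": (itemsize, itemsize * nrows),  # column-major layout
            "data": (addr, False),  # (address, read_only)
        }
        # Calls libc.free(addr) when this object is garbage collected.
        self._finalizer = weakref.finalize(self, libc.free, addr)

    @property
    def __array_interface__(self):
        return self._interface


def wrap_malloced(addr, nrows, ncols):
    """Return a zero-copy array whose base chain keeps the owner object alive."""
    return np.asarray(MallocedBlock(addr, nrows, ncols))
```

The array itself still reports flags.owndata as False, but the caller never has to free anything explicitly. Attaching the finalizer to the owner rather than to the array matters, because a slice can outlive the array it was taken from.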
