Extend python with C, return numpy array gives garbage
Asked Answered
M

3

5

I am wrapping a C file so I can use it in python. The output of the C function is an array of doubles. I want this to be a numpy array in python. I get garbage. Here's an example that generates the error.

First, the C file (focus on the last function definition, everything else should be OK):

#include <Python.h>
#include <numpy/arrayobject.h>
#include <stdio.h>

static char module_docstring[] =
    "docstring";

static char error_docstring[] =
        "generate the error";

static PyObject *_aux_error(PyObject *self, PyObject *args);

static PyMethodDef module_methods[] = {
        {"error", _aux_error, METH_VARARGS, error_docstring},
        {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC init_tmp(void) {

    PyObject *m = Py_InitModule3("_tmp", module_methods, module_docstring);
    if (m == NULL)
        return;

    /* Load `numpy` functionality. */
    import_array();
}

static PyObject *_aux_error(PyObject *self ,PyObject *args) {

    double vector[2] = {1.0 , 2.0};

    npy_intp dims[1] = { 2 };

    PyObject *ret  = PyArray_SimpleNewFromData(1, dims, (int)NPY_FLOAT , vector );
    return ret;
}

Compilation goes OK (from what I understand - I used a python script that compiles everything).

In python, I run the following script to test my new module:

try:
    import _tmp
    res = _tmp.error()
    print(res)
except:
    print("fail")

The result I see on the screen is garbage. I tried substituting (int)NPY_FLOAT with (int)NPY_FLOAT32, (int)NPY_FLOAT64, (int)NPY_DOUBLE and I still get garbage. I am using python2.7.

Thanks!!!

EDIT: following the answer below, I changed the last function to:

static PyObject *_aux_error(PyObject *self, PyObject *args) {


    double *vector = calloc(2, sizeof(double));
    vector[0] = 1.0;
    vector[1] = 2.0;


    npy_intp *dims = calloc(1 , sizeof(npy_intp));
    dims[1] = 2;


    PyObject *ret  = PyArray_SimpleNewFromData(1, dims, (int)NPY_FLOAT , &vector );
    return ret;
}

Now python shows an empty array.

Monogram answered 7/1, 2015 at 22:39 Comment(2)
The call in you edit should be PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE, vector). Does it not work like that either?Rigadoon
dims[1] = 2 is an indexing errorDiageotropism
S
8

Try changing this:

static PyObject *_aux_error(PyObject *self) {

to this:

static PyObject *_aux_error(PyObject *self, PyObject *args) {

Python will pass the args argument, even if you don't define your function with it.

There's still a fundamental problem with your code. You have created a numpy array using an array, vector, that is on the stack. When _aux_error returns, that memory is reclaimed and might be reused.

You could create the array using PyArray_SimpleNew() to allocate the numpy array, and then copy vector to the array's data:

static PyObject *_aux_error(PyObject *self, PyObject *args)
{
    double vector[2] = {1.0 , 2.0};
    npy_intp dims[1] = {2};

    PyObject *ret = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    memcpy(PyArray_DATA(ret), vector, sizeof(vector));
    return ret;
}

Note that I changed the type to NPY_DOUBLE; NPY_FLOAT is the 32 bit floating point type.


In a comment, you asked about dynamically allocating the memory in _aux_error. Here's a variation of the example that might be useful. The length of the array is still hardcoded in dims, so it isn't completely general, but it might be enough to address the question from the comments.

static PyObject *_aux_error(PyObject *self, PyObject *args)
{
    double *vector;
    npy_intp dims[1] = {5};
    npy_intp k;

    PyObject *ret = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    vector = (double *) PyArray_DATA(ret);
    /*
     *  NOTE: Treating PyArray_DATA(ret) as if it were a contiguous one-dimensional C
     *  array is safe, because we just created it with PyArray_SimpleNew, so we know
     *  that it is, in fact, a one-dimensional contiguous array.
     */
    for (k = 0; k < dims[0]; ++k) {
        vector[k] = 1.0 + k;
    }
    return ret;
}
Selfesteem answered 7/1, 2015 at 22:53 Comment(8)
Thanks for the input but this does not help. Shouldn't the example become simpler by removing the *args and not giving the function any variable?Monogram
No--the python interpreter has no idea that you have defined _aux_error with a nonstandard signature. It will be called with the two C arguments.Selfesteem
Note that I updated my answer with a comment about a fundamental problem with your example.Selfesteem
Using your advice I tried to change the way arrays are created. I use calloc. Is this the right way to do it? I am still getting garbage (not to mention the fact that I'll have to deallocate that memory from within python somehow).Monogram
The dimensions are copied by the array constructor, but the array data isn't.Rigadoon
Thank you so much! Last thing, if you may: in my real program I want to dynamically allocate memory for the array vector. Should I free that memory after using memcpy? Again, thank you for your time.Monogram
If in _aux_error you have something like vector = malloc(...);, then you fill vector with some computed values, and then call PyObject *ret = PyArray_SimpleNew(...); memcpy(PyArray_DATA(ret), vector, sizeof(vector));, then yes, you should free vector before you return. But if you know the length of the array, you could call PyObject *ret = PyArray_SimpleNew(...), and then use double *vector = PyArray_DATA(ret), and fill in vector. Then you wouldn't need to malloc any memory--since PyArray_SimpleNew did that for you--and you wouldn't free vector.Selfesteem
See the latest addition to my answer for an example.Selfesteem
M
3

Here is my full solution, for your amusement. Copy, paste and modify. Obviously the problem I was faced with is a bit more complicated than the question above. I used some of Dan Foreman Mackay's online code.

The goal of my code is to return a covariance vector (whatever that is). I have a C file named aux.c that returns a newly allocated array:

#include "aux.h"
#include <math.h>
#include <stdlib.h>
double *covVec(double *X, double *x, int nvecs, int veclen) {


    double r = 1.3;
    double d = 1.0;

    double result;
    double dist;
    int n;

    double *k;
    k = malloc(nvecs * sizeof(double));

    int row;
    for( row = 0 ; row < nvecs ; row++) {

        result = 0.0;
        for (n = 0; n < veclen; n++) {
                dist = x[n] - X[row*veclen + n];
                result += dist * dist;
        }

        result = d*exp(  -result/(2.0*r*r)  );
        k[row] = result;
    }
    return k;
}

Then, I need a very short header file named aux.h:

double *covVec(double *X, double *x, int nvecs, int veclen);

To wrap this to python I have _aux.c:

#include <Python.h>
#include <numpy/arrayobject.h>
#include "aux.h"
#include <stdio.h>

static char module_docstring[] =
    "This module provides an interface for calculating covariance using C.";

static char cov_vec_docstring[] =
    "Calculate the covariances between a vector and a list of vectors.";

static PyObject *_aux_covVec(PyObject *self, PyObject *args);

static PyMethodDef module_methods[] = {
        {"cov_vec", _aux_covVec, METH_VARARGS, cov_vec_docstring},
        {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC init_aux(void) {

    PyObject *m = Py_InitModule3("_aux", module_methods, module_docstring);
    if (m == NULL)
        return;

    /* Load `numpy` functionality. */
    import_array();
}


static PyObject *_aux_covVec(PyObject *self, PyObject *args)
{
    PyObject *X_obj, *x_obj;

    /* Parse the input tuple */
    if (!PyArg_ParseTuple(args, "OO", &X_obj, &x_obj ))
        return NULL;

    /* Interpret the input objects as numpy arrays. */
    PyObject *X_array = PyArray_FROM_OTF(X_obj, NPY_DOUBLE, NPY_IN_ARRAY);
    PyObject *x_array = PyArray_FROM_OTF(x_obj, NPY_DOUBLE, NPY_IN_ARRAY);


    /* If that didn't work, throw an exception. */
    if (X_array == NULL || x_array == NULL ) {
        Py_XDECREF(X_array);
        Py_XDECREF(x_array);
        return NULL;
    }

    /* What are the dimensions? */
    int nvecs  = (int)PyArray_DIM(X_array, 0);
    int veclen = (int)PyArray_DIM(X_array, 1);
    int xlen   = (int)PyArray_DIM(x_array, 0);

    /* Get pointers to the data as C-types. */
    double *X    = (double*)PyArray_DATA(X_array);
    double *x    = (double*)PyArray_DATA(x_array);


    /* Call the external C function to compute the covariance. */
    double *k = covVec(X, x, nvecs, veclen);



    if ( veclen !=  xlen ) {
        PyErr_SetString(PyExc_RuntimeError,
                                "Dimensions don't match!!");
        return NULL;
    }

    /* Clean up. */
    Py_DECREF(X_array);
    Py_DECREF(x_array);

    int i;
    for(i = 0 ; i < nvecs ; i++) {
        printf("k[%d]   = %f\n",i,k[i]);
        if (k[i] < 0.0) {
            PyErr_SetString(PyExc_RuntimeError,
                        "Covariance should be positive but it isn't.");
            return NULL;
        }
    }

    npy_intp dims[1] = {nvecs};

    PyObject *ret = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    memcpy(PyArray_DATA(ret), k, nvecs*sizeof(double));
    free(k);

    return ret;
}

I have a python file called setup_cov.py:

from distutils.core import setup, Extension
import numpy.distutils.misc_util

setup(
    ext_modules=[Extension("_aux", ["_aux.c", "aux.c"])],
    include_dirs=numpy.distutils.misc_util.get_numpy_include_dirs(),
)

I compile from command line using python2.7 setup_cov.py build_ext --inplace. Then I run the following python test file:

import numpy as np
import _aux as a

nvecs  = 6
veclen = 9
X= []
for _ in range(nvecs):
    X.append(np.random.normal(size= veclen))
X = np.asarray(X)

x = np.random.normal(size=veclen)
k = a.cov_vec(X,x)
print(k)
Monogram answered 8/1, 2015 at 0:31 Comment(0)
J
2

Warren's solution seems to be working, although freeing the C array memory block results in a error on compilation for me. I got the memcopy trick to work in the minimalistic function below (copying a 1D C array to numpy via the pointer), which, for simplicity, does not take any arguments, and should give the reader a good idea how to apply this to C arrays instead of vectors:

static PyObject *_cmod_test(PyObject *self, PyObject *args)
    {
    double f[5] = {0,1,2,3,4};
    int d[1] = {5};
    PyObject *c = PyArray_FromDims(1,d,NPY_DOUBLE);
    memcpy(PyArray_DATA(c), f, 5*sizeof(double));
    return c;    
    };

The launch.py script is simple

import _cmod
_cmod.test()

Don't forget to declare the functions

#include <Python.h>
#include <numpy/arrayobject.h>
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
static PyObject *_cmod_test(PyObject *self, PyObject *args);

Any suggestions for the use with PyArray_SimpleNewFromData (whilst avoiding the memory leak pittfall)? Perhaps something similar to the broken code below.

static PyObject *_cmod_test(PyObject *self, PyObject *args)
    {
    double f[5] = {0,1,2,3,4};
    npy_intp dims[1] = {5};
    PyObject *c = PyArray_SimpleNewFromData(1, dims, NPY_DOUBLE ,f);
    PyArray_ENABLEFLAGS(c, NPY_ARRAY_OWNDATA);
    return c;
    };

I also recommend Dan Foreman Mackay's blog on the python C API.

Jus answered 24/7, 2015 at 17:25 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.