How to return numpy.array from boost::python?

I would like to return some data from c++ code as a numpy.array object. I had a look at boost::python::numeric, but its documentation is very terse. Can I get an example of e.g. returning a (not very large) vector<double> to python? I don't mind doing copies of data.

Malliemallin answered 22/5, 2012 at 11:52 Comment(3)
I agree its documentation is dreadful. They just copy the commentless header into their documentation page and don't show you the basics, i.e. getting data from an STL collection into this object.Hectorhecuba
The boost people are very clever, too clever for their own good. I go to their Wrapper concepts page and see nothing that makes sense.Hectorhecuba
I found what I think is the best solution I've come across yet and posted it below.Hectorhecuba

UPDATE: the library described in my original answer (https://github.com/ndarray/Boost.NumPy) has been integrated directly into Boost.Python as of Boost 1.63, and hence the standalone version is now deprecated. The text below now corresponds to the new, integrated version (only the namespace and the header path have changed).

Boost.Python now includes a moderately complete wrapper of the NumPy C-API in a Boost.Python interface. It's pretty low-level and mostly focused on the more difficult problem of passing C++ data to and from NumPy without copying, but here's how you'd return a copied std::vector with it:

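#include <algorithm>
#include <vector>
#include <boost/python.hpp>
#include <boost/python/numpy.hpp>

namespace bp = boost::python;
namespace bn = boost::python::numpy;

// Stand-in for your own C++ function (add whatever parameters it needs);
// the body is just sample data.
std::vector<double> myfunc() {
    return std::vector<double>{1.0, 2.0, 3.0};
}

bn::ndarray mywrapper() {
    std::vector<double> v = myfunc();
    // explicit cast: brace-initializing a signed Py_intptr_t from size_t is a narrowing conversion
    Py_intptr_t shape[1] = { static_cast<Py_intptr_t>(v.size()) };
    bn::ndarray result = bn::zeros(1, shape, bn::dtype::get_builtin<double>());
    std::copy(v.begin(), v.end(), reinterpret_cast<double*>(result.get_data()));
    return result;
}

BOOST_PYTHON_MODULE(example) {
    bn::initialize();  // initializes the NumPy C-API; must run before any other bn:: call
    bp::def("myfunc", mywrapper);
}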
Nonobedience answered 22/5, 2012 at 15:39 Comment(4)
It may be very nice if I could actually get to the code, but GitHub seems to be blocked here, or something else is wrong, because I'm getting a broken link. Surely there must be a way to populate a boost::python::numeric::array with data from a simple std::vector without having to get some 3rd-party library. It would help if Boost's documentation actually documented the member functions rather than reproducing the uncommented header.Hectorhecuba
I can't make an edit because it's too minor, but it should be bn::zeros, not bp::zeros.Mcdaniel
I could not make this work (Ubuntu 14.04). What would be an example for (...)? What is bn::initialize() supposed to do? Also, the example seems outdated: when I try to include boost/numpy.hpp I get fatal error: boost/numpy.hpp: No such file or directoryTerpsichore
It would be helpful if a compilation line was added, say with g++. At least that would give information about the linking.Gravimetric

A solution that doesn't require you to download any special 3rd party C++ library (but you need numpy).

#include <vector>
#include <boost/python.hpp>
#include <numpy/ndarrayobject.h> // ensure you include this header

boost::python::object stdVecToNumpyArray( std::vector<double> const& vec )
{
    npy_intp size = vec.size();

    /* const_cast is rather horrible, but we need a writable pointer.
       In C++11, vec.data() will do the trick,
       but you will still need to const_cast.
    */
    double * data = size ? const_cast<double *>(&vec[0])
                         : static_cast<double *>(NULL);

    // create a PyObject * from pointer and data
    // (import_array() must already have been called in the module init)
    PyObject * pyObj = PyArray_SimpleNewFromData( 1, &size, NPY_DOUBLE, data );
    boost::python::handle<> handle( pyObj );
    boost::python::numeric::array arr( handle );

    /* The problem of returning arr is twofold: firstly, the user can modify
       the data, which would betray the const-correctness.
       Secondly, the lifetime of the data is managed by the C++ API and not
       by the lifetime of the numpy array. But we have a simple solution...
    */

    return arr.copy(); // copy the object. numpy owns the copy now.
}

Of course, you could write a generic function taking a double * and a size, then invoke it from the vector version by extracting that information. You could also write a template, but you'd need some kind of mapping from the C++ data type to the NPY_TYPES enum.
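A minimal sketch of that layering in standard C++ only, so it compiles without NumPy or Boost. `TypeNum`, `type_num`, `OwnedArray` and `make_array` are hypothetical names, and the enum merely stands in for NumPy's `NPY_TYPES`. In a real extension the trait would return `NPY_DOUBLE`, `NPY_FLOAT` or `NPY_INT`, and `make_array` would allocate with `PyArray_SimpleNew` and copy into `PyArray_DATA` instead of filling a `std::vector`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for NumPy's NPY_TYPES enum; real code would use
// NPY_DOUBLE, NPY_FLOAT, NPY_INT from <numpy/ndarrayobject.h>.
enum TypeNum { kDouble, kFloat, kInt };

// Trait mapping a C++ element type to its type number. The primary template
// is left undefined, so an unsupported element type fails to compile.
template <class T> struct type_num;
template <> struct type_num<double> { static constexpr TypeNum value = kDouble; };
template <> struct type_num<float>  { static constexpr TypeNum value = kFloat; };
template <> struct type_num<int>    { static constexpr TypeNum value = kInt; };

// Stand-in for the array object: a type tag plus an owned copy of the elements.
template <class T>
struct OwnedArray {
    TypeNum typenum;
    std::vector<T> data;
};

// Generic entry point working from a pointer and a size. The elements are
// copied, never aliased, so no const_cast is needed.
template <class T>
OwnedArray<T> make_array(const T* data, std::size_t size) {
    return OwnedArray<T>{type_num<T>::value, std::vector<T>(data, data + size)};
}

// Convenience overload that extracts pointer and size from a std::vector.
template <class T>
OwnedArray<T> make_array(const std::vector<T>& vec) {
    return make_array(vec.data(), vec.size());
}
```

Because the result owns a copy, it stays valid after the source vector is destroyed, which is the same reason the answer returns `arr.copy()`.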

Hectorhecuba answered 9/1, 2013 at 10:16 Comment(11)
Thanks for this example. Just a heads up, I had to use numeric::array::set_module_and_type("numpy", "ndarray"); or I would get the python runtime error "ImportError: No module named 'Numeric' or its type 'ArrayType' did not follow the NumPy protocol"Outspan
Why are you const_casting if you can just make the argument a non-const reference?Bulgar
@Bulgar Because we want the argument to be a const reference. We are not actually going to modify the data, but we need to work around the fact that PyArray_SimpleNewFromData requires a double*Hectorhecuba
Note that unlike many of my answers on StackOverflow this was a situation where I actually needed it, came here, found the question but no adequate answer. Then worked it out and came back to post it.Hectorhecuba
Ah, I see. Bad API needs const_cast... When will we ever see the end of that.Bulgar
I don't really know if it's a bad API, because Python has no concept of const, so a numpy array is always modifiable. However, before we actually let Python users use our object we duplicate it, creating a new copy that they can modify happily without affecting our own data.Hectorhecuba
Can you not avoid the const_cast by just creating a numpy array that owns its own memory using PyArray_SimpleNew then copying the vector's data into it?As
I used your method, but declared 'double data[4] = {1,2,3,4}' in the function body, and I got a segmentation fault.Segregate
Do that and pass size 4 in the call to PyArray_SimpleNewFromData; after that, including arr.copy() should work. Failing to duplicate your object will indeed lead to undefined behaviour if they try to use it, as the data is local to the function and will no longer be valid.Hectorhecuba
Apparently .copy() during return is not necessary. Simply returning arr works too, and avoiding this copy operation might help performance.Terpsichore
arr.copy() is necessary the way I did it, for the 2 reasons I specified: it makes the data belong to the Python object, so that it can be modified and its lifetime is determined by Python rather than by the vector from which it got its data.Hectorhecuba

It's a bit late, but after many unsuccessful tries I found a way to expose C++ arrays as numpy arrays directly. Here is a short C++11 example using boost::python and Eigen:

#include <numpy/ndarrayobject.h>
#include <boost/python.hpp>

#include <Eigen/Core>

// c++ type
struct my_type {
  Eigen::Vector3d position;
};


// wrap c++ array as numpy array
static boost::python::object wrap(double* data, npy_intp size) {
  using namespace boost::python;

  npy_intp shape[1] = { size }; // array size
  PyObject* obj = PyArray_New(&PyArray_Type, 1, shape, NPY_DOUBLE, // data type
                              NULL, data, // data pointer
                              0, NPY_ARRAY_CARRAY, // NPY_ARRAY_CARRAY_RO for readonly
                              NULL);
  handle<> array( obj );
  return object(array);
}



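// numpy's import_array() macro contains a return statement: it returns
// nothing under Python 2 but NULL under Python 3, so call it from a helper
// with a matching return type instead of directly inside the module init
#if PY_MAJOR_VERSION >= 3
static void* init_numpy() { import_array(); return NULL; }
#else
static void init_numpy() { import_array(); }
#endif

// module definition
BOOST_PYTHON_MODULE(test)
{
  // numpy requires this before any PyArray_* call
  init_numpy();

  using namespace boost::python;

  // wrapper for my_type
  class_< my_type >("my_type")
    .add_property("position", +[](my_type& self) -> object {
        return wrap(self.position.data(), self.position.size());
      });
}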

The example describes a "getter" for the property. For the "setter", the easiest way is to assign the array elements manually from a boost::python::object using a boost::python::stl_input_iterator<double>.
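A standard-C++-only sketch of such a setter's core, assuming you want a length check. `assign_exact` is a hypothetical helper that copies exactly N elements from any input-iterator range. In the Boost.Python setter the range would be `boost::python::stl_input_iterator<double>(value)` and the default-constructed `boost::python::stl_input_iterator<double>()` (its end marker), and the target would be `self.position`; a `std::array` is used here only so the example compiles without Boost or Eigen.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Copy exactly N elements from [first, last) into out.
// Throws std::length_error if the sequence is shorter or longer than N.
// Elements are read into a temporary first, so out is left untouched on error.
template <class InputIt, class T, std::size_t N>
void assign_exact(InputIt first, InputIt last, std::array<T, N>& out) {
    std::array<T, N> tmp{};
    std::size_t i = 0;
    for (; first != last; ++first, ++i) {
        if (i == N) throw std::length_error("too many elements");
        tmp[i] = *first;
    }
    if (i != N) throw std::length_error("too few elements");
    out = tmp;  // commit only after the length check passed
}
```

Reading into a temporary keeps the C++ object consistent if Python passes a sequence of the wrong length.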

Quadratic answered 1/12, 2015 at 14:55 Comment(10)
Could you tell me how to setup my project to be able to use the numpy header? Do I need to compile some libraries? Or is it enough to include the numpy header?Cartulary
I got the numpy header directory using: python -c "import numpy; print numpy.get_include()"Quadratic
Ok. That worked, thanks. but the compiler complains that import_array() is returning a value, while init_module_... is a 'void' function.Cartulary
Ok, so it seems to be related to how the import_array() macro was changed from Python 2 to Python 3 to now return something. Here is an (ugly) solution that keeps it version-independent: mail.scipy.org/pipermail/numpy-discussion/2010-December/…Cartulary
finally someone got it right! With a comprehensive example! Thank You!Sanfordsanfourd
what is this notation I never saw before +[](my_type& self) in your code ? I'm talking about +[] or more specifically the + sign before the lambda capture [] ?Aim
IIRC it's a hack to force a conversion to a function pointer from a (non-capturing) lambda expression. I am not sure if it is required by the standard (I'd say it's not), but it helped triggering the conversion on some compilers. Edit: found it: #18889528Quadratic
wow, what a trick. At least I can say that without it, my boost.python code will not compile. So in my case, the trick is necessary. ThanksAim
I get a segfault with this, which is unfortunate, as this is precisely what I am looking for... ;/Chronological
Apparently, boost::python now provides direct access to numpy arrays: boost.org/doc/libs/1_63_0/libs/python/doc/html/numpy/tutorial/… can't get it to link though :-/Quadratic
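The `+[]` trick discussed in the comments above can be checked in plain C++. A captureless lambda converts implicitly to a function pointer, and the only `operator+` that can accept it is the built-in one taking a pointer, so unary `+` forces that conversion; this is standard behaviour rather than a compiler quirk. Boost.Python deduces the wrapped signature from function pointers, which is presumably why the getter needs the explicit `+`. `twice` and `call_through` are illustrative names only.

```cpp
#include <type_traits>

// Unary + converts the captureless lambda into a plain function pointer.
auto twice = +[](int x) { return 2 * x; };

static_assert(std::is_same<decltype(twice), int (*)(int)>::value,
              "unary + yields a function pointer");

// Any API that expects a plain function pointer accepts the result.
int call_through(int (*fn)(int), int v) { return fn(v); }
```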

Doing it using the numpy API directly is not necessarily difficult, but I use boost::multi_array regularly for my projects and find it convenient to transfer the shapes of the arrays across the C++/Python boundary automatically. So, here is my recipe. Use http://code.google.com/p/numpy-boost/, or better yet, this version of the numpy_boost.hpp header, which is a better fit for multi-file boost::python projects, although it uses some C++11. Then, from your boost::python code, use something like this:

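PyObject* myfunc(std::vector<double> const& my_vector)
{
   // If your data is already in a boost::multi_array object:
   // numpy_boost< double, 1 > to_python( numpy_from_boost_array(result_cm) );
   // otherwise, allocate a new 1-D numpy array and copy into it:
   numpy_boost< double, 1 > to_python( boost::extents[my_vector.size()] );
   std::copy( my_vector.begin(), my_vector.end(), to_python.begin() );

   // to_python drops its own reference when it goes out of scope,
   // so take a new reference for the caller
   PyObject* result = to_python.py_ptr();
   Py_INCREF( result );

   return result;
}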
Reticent answered 22/5, 2012 at 13:41 Comment(4)
What would be the correct way to return a py::object (py=boost::python)? I have PyObject* result=numpy_boost<double,2>(numpy_from_boost_array(...)).py_ptr(); and return py::object(py::handle<>(py::borrowed(o))); but that crashes. Hint?Malliemallin
PS. the crash is at line 229 of the dropbox version, line a = (PyArrayObject*)PyArray_SimpleNew(NDims, shape, detail::numpy_type_map<T>::typenum);. Strange.Malliemallin
@Malliemallin You might have a problem with the PY_ARRAY_UNIQUE_SYMBOL and NO_IMPORT_ARRAY macros, as well as import_array, as your crash is exactly when the array is created, which needs a call (I think) through certain pointer table that numpy needs (initialized with import_array() ).Reticent
The link to the C++11 version is broken. Would you mind fixing that?Therein

I looked at the available answers and thought, "this will be easy". I proceeded to spend hours attempting what seemed like trivial examples and adaptations of the answers.

Then I implemented @max's answer exactly (I had to install Eigen) and it worked fine, but I still had trouble adapting it. Most of my problems (by count) were silly syntax mistakes, but I was also using a pointer to the data of a copied std::vector after that vector had apparently gone out of scope.

In this example, a pointer to the std::vector is returned, but you could also return the size and data() pointer, or use any other implementation that gives your numpy array stable access to the underlying data (i.e. data that is guaranteed to still exist):

class_<test_wrap>("test_wrap")
    .add_property("values", +[](test_wrap& self) -> object {
            return wrap(self.pvalues()->data(),self.pvalues()->size());
        })
    ;

For test_wrap with a std::vector<double> (normally pvalues() might just return the pointer without populating the vector):

class test_wrap {
public:
    std::vector<double> mValues;
    std::vector<double>* pvalues() {
        mValues.clear();
        for(double d_ = 0.0; d_ < 4; d_+=0.3)
        {
            mValues.push_back(d_);
        }
        return &mValues;
    }
};
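Why this works: `wrap` stores a raw pointer into `mValues`' buffer, so that buffer must not move while an array is alive. A standard-C++ sketch with hypothetical helpers `refill` and `refill_keeps_pointer`, modelled on `pvalues()`: `clear()` keeps the vector's capacity, so refilling with no more elements than before does not reallocate and `data()` keeps returning the same address, whereas growing past `capacity()` would reallocate and leave earlier arrays dangling. Note that `wrap` also does not keep the owning `test_wrap` object alive, so an array must not outlive the object it came from.

```cpp
#include <cstddef>
#include <vector>

// Modelled on test_wrap::pvalues(): refill v with n values 0, 0.3, 0.6, ...
// (an integer counter is used instead of accumulating d_ += 0.3).
void refill(std::vector<double>& v, std::size_t n) {
    v.clear();  // destroys the elements but leaves capacity() unchanged
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(0.3 * static_cast<double>(i));
}

// True if refilling v with n elements needs no reallocation, so pointers
// previously obtained from v.data() would stay valid.
bool refill_keeps_pointer(const std::vector<double>& v, std::size_t n) {
    return n <= v.capacity();
}
```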

The full example is on Github so you can skip the tedious transcription steps and worry less about build, libs, etc. You should be able to just do the following and get a functioning example (if you have the necessary features installed and your path setup already):

git clone https://github.com/ransage/boost_numpy_example.git
cd boost_numpy_example
# Install virtualenv, numpy if necessary; update path (see below*)
cd build && cmake .. && make && ./test_np.py

This should give the output:

# cmake/make output
values has type <type 'numpy.ndarray'>
values has len 14
values is [ 0.   0.3  0.6  0.9  1.2  1.5  1.8  2.1  2.4  2.7  3.   3.3  3.6  3.9]

*In my case, I put numpy into a virtualenv as follows - this should be unnecessary if you can execute python -c "import numpy; print numpy.get_include()" as suggested by @max:

# virtualenv, pip, path unnecessary if your Python has numpy
virtualenv venv
./venv/bin/pip install -r requirements.txt 
export PATH="$(pwd)/venv/bin:$PATH"

Have fun! :-)

Hyperbaric answered 10/4, 2016 at 0:17 Comment(0)
