How to declare 2D c-arrays dynamically in Cython
I need to perform a lot of work using 2D numpy arrays of various sizes, and I would like to offload these calculations onto Cython. The idea is that my 2D numpy arrays would be passed from Python to Cython, where they would be converted into C arrays or memory views and used in a cascade of other C-level functions to do the calculations.

After some profiling I ruled out using numpy arrays directly in Cython due to serious Python overhead. Using memory views was MUCH faster and quite easy to use, but I suspect I can squeeze out even more speedup by using C arrays.

Here is my question, though: how can I declare a 2D C array in Cython without predefining its dimensions with fixed values? For example, I can create a C array from numpy this way:

narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))

cdef int c_arr[3][4]
for i in range(3):
    for j in range(4):
        c_arr[i][j] = narr[i][j]

and then pass it to a function:

cdef void somefunction(int c_Arr[3][4]):
    ...

But this implies I have a fixed-size array, which in my case is useless. So I tried something like this:

narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]], dtype=np.dtype("i"))

cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]

cdef int c_arr[a][b]                # INCORRECT - EXAMPLE ONLY

for i in range(a):
    for j in range(b):
        c_arr[i][j] = narr[i][j]

with the intention to pass it to a function like this:

cdef void somefunction(int a, int b, int c_Arr[a][b]):
    ...

But it doesn't work, and the compilation fails with the error "Not allowed in a constant expression". I suspect I need to do it with malloc/free somehow? I had a look at this problem (How to declare 2D list in Cython), but it does not provide the answer to my problem.
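For reference, the malloc/free route hinted at above looks roughly like this. This is only a sketch: the flat row-major layout and the helper name `sum2d` are my own, not from the question, and the buffer must be freed by hand:

```cython
from libc.stdlib cimport malloc, free

cdef int sum2d(int *c_arr, int a, int b):
    # Treat the flat buffer as an a-by-b array: element [i][j] sits at i*b + j.
    cdef int i, j, thesum = 0
    for i in range(a):
        for j in range(b):
            thesum += c_arr[i * b + j]
    return thesum

def demo(narr):
    cdef int i, j
    cdef int a = narr.shape[0]
    cdef int b = narr.shape[1]
    cdef int *c_arr = <int *>malloc(a * b * sizeof(int))
    if c_arr == NULL:
        raise MemoryError()
    try:
        for i in range(a):
            for j in range(b):
                c_arr[i * b + j] = narr[i, j]
        return sum2d(c_arr, a, b)
    finally:
        free(c_arr)  # malloc'd memory is not garbage-collected
```

The price of this approach is manual memory management and losing the 2D indexing syntax, which is why the memory-view route below ends up more attractive.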

UPDATE:

It turns out that memory views can be as fast as C arrays if one makes sure that index bounds checking is switched off for the memory views, which can be done with the Cython compiler directive:

# cython: boundscheck=False

Thanks @Veedrac for the tip!

Viscometer answered 18/9, 2014 at 16:12 Comment(3)
Also note that whenever you have silly numbers like "2700x faster", it's because one version is getting optimized out completely. The benefits will be much smaller if you actually use the output. – Trotter
Ah yes, good point! Also, using floats/doubles instead of integers makes an impact too... – Viscometer
Aside from "# cython: boundscheck=False", you can also seek additional speed-up with "# cython: wraparound=False" – Kathe
You just need to stop doing bounds checking:

with cython.boundscheck(False):
    thesum += x_view[i,j]

that brings the speed basically up to par.


If you really want a C array from it, try:

import numpy
from numpy import int32
from numpy cimport int32_t

numpy_array = numpy.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=int32)

cdef:
    int32_t[:, :] cython_view = numpy_array
    int32_t *c_integers_array = &cython_view[0, 0]
    int32_t[4] *c_2d_array = <int32_t[4] *>c_integers_array

First you get a NumPy array. You use that to get a memory view. Then you take a pointer to its data, which you cast to a pointer with the desired stride.
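The cast is valid because a C-contiguous NumPy array stores its rows back to back in one flat buffer. A quick pure-Python check of that row-major layout (a standalone illustration, not part of the answer's Cython code):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=np.int32)
a, b = arr.shape

# ravel() on a C-contiguous array returns a view of the same flat buffer
flat = arr.ravel()
assert flat.base is arr          # a view, not a copy

# element [i][j] of the 2D array lives at flat index i*b + j
for i in range(a):
    for j in range(b):
        assert arr[i, j] == flat[i * b + j]

print(flat[1 * b + 2])  # row 1, column 2 -> 7
```

The same `i*b + j` arithmetic is what the `<int32_t[4] *>` cast encodes in the type, so `c_2d_array[i][j]` reaches the right element.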

Trotter answered 18/9, 2014 at 18:57 Comment(4)
Brilliant, thanks @Trotter, that brought me back up to the same speed as C arrays! – Viscometer
However, I still wonder how you can (if you can) generate 2D C arrays by dynamically specifying their size? I got as far as declaring "cdef int *carr = <int*>malloc(a*b*sizeof(int))" and then using two for-loops to assign individual array values with "carr[(i*b)+j] = narr[i][j]", but it's still not a proper 2D array, nor can I actually pass it to a function as a 2D array... Anyone? – Viscometer
Does this answer your question? – Trotter
Ha! Yes it does, thank you again! I can see now that the memory-views approach is MUCH easier than messing around with C arrays, and what's more important, just as fast too! Sorted. – Viscometer
So after invaluable help from @Veedrac (many thanks!) I finally came up with a script that demonstrates the use of both memory views and C arrays to speed up calculations in Cython. They both come down to similar speeds, so I personally think using memory views is MUCH easier.

Here is an example Cython script that "accepts" a numpy array, converts it to a memory view or C array, and then performs a simple array summation via C-level functions:

# cython: boundscheck=False

cimport cython
import numpy as np
cimport numpy as np

from numpy import int32
from numpy cimport int32_t


#Generate numpy array:
narr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]], dtype=np.dtype("i"))

cdef int a = np.shape(narr)[0]
cdef int b = np.shape(narr)[1]
cdef int i, j

testsum = np.sum(narr)
print "Test summation: np.sum(narr) =", testsum

#Generate the memory view:
cdef int [:,:] x_view = narr

#Generate the 2D c-array and its pointer:
cdef:
    int32_t[:, :] cython_view = narr
    int32_t *c_integers_array = &cython_view[0, 0]
    int32_t[4] *c_arr = <int32_t[4] *>c_integers_array


def test1():

    speed_test_mview(x_view)  

def test2():

    speed_test_carray(&c_arr[0][0], a, b)


cdef void speed_test_mview(int[:,:] x_view):

    cdef int n, i, j, thesum

    # Repeat the summation to make the timing measurable:
    for n in range(10000):
        thesum = 0
        for i in range(a):
            for j in range(b):
                thesum += x_view[i, j]

cdef void speed_test_carray(int32_t *c_Arr, int a, int b):

    cdef int n, i, j, thesum
    for n in range(10000):
        thesum = 0
        for i in range(a):
            for j in range(b):
                thesum += c_Arr[(i*b)+j]

Timing tests in an IPython shell then reveal similar speeds:

import testlib as t
Test summation: np.sum(narr) = 136

%timeit t.test1()
10000000 loops, best of 3: 46.3 ns per loop

%timeit t.test2()
10000000 loops, best of 3: 46 ns per loop

Oh and for comparison - using numpy arrays in this example took 125 ms (not shown).

Viscometer answered 22/9, 2014 at 11:4 Comment(2)
You're converting c_arr to a 1D array by using &c_arr[0][0], but you might as well have used c_integers_array, which is exactly the same thing. The advantage of c_arr is that you can pass it to a function that accepts int[4] *array or even int array[4][4]. – Trotter
Also, if you do proper timings that prevent the non-bounds-checked versions from being optimized out completely, you notice two things. The first is that accessing the globals a and b is much slower than doing cdef int a, b; a = x_view.shape[0]; b = x_view.shape[1] and looping over those. The second is that bounds checking has about a 60% overhead (e.g. from 1s → 1.6s). – Trotter
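To illustrate the first point in the comment above, a sketch of the memory-view loop with the shape copied into C locals rather than read from module-level globals (the function name is my own; the return value also prevents the loop from being optimized away):

```cython
cdef int speed_test_mview_local(int[:, :] x_view):
    # Copy the shape into C locals once; reading module-level globals
    # inside the hot loop goes through Python machinery and is much slower.
    cdef int a = x_view.shape[0]
    cdef int b = x_view.shape[1]
    cdef int i, j, thesum = 0
    for i in range(a):
        for j in range(b):
            thesum += x_view[i, j]
    return thesum
```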
