Indexing individual array elements
This is the main type of code where Cython can really help you. Indexing an individual element (e.g. an = a[n]) can be a fairly slow operation in Python. This is partly because Python is not a hugely quick language, so running Python code many times within a loop is slow, and partly because, although the array is stored as a tightly-packed array of C floats, the indexing operation must return a Python object. Indexing a Numpy array element-by-element therefore requires a new Python object to be allocated for each access.
In Cython you can declare the arrays as typed memoryviews, or as np.ndarray. (Typed memoryviews are the more modern approach and you should usually prefer them.) Doing so lets you access the tightly-packed C array directly and retrieve the C value without creating a Python object.
The directives cython.boundscheck and cython.wraparound can give worthwhile further speed-ups to indexing (but remember that they do remove useful features, so think before using them).
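As a minimal sketch of both ideas (my own illustration with a made-up function name, not the code from the question), a typed memoryview plus these directives lets a simple summation loop run at C speed:

cimport cython

@cython.boundscheck(False)   # skip per-access bounds checks
@cython.wraparound(False)    # disallow negative indexing
def total(double[::1] a):    # "a" is a typed memoryview of C doubles
    cdef double s = 0
    cdef Py_ssize_t n
    for n in range(a.shape[0]):
        s += a[n]            # reads the C double directly, no Python object created
    return s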
vs vectorization
A lot of the time a loop over a Numpy array can be written as a vectorized operation - something that acts on the whole array at once. It is usually a good idea to write Python+Numpy code like this. If you have multiple chained vectorized operations, it is sometimes worthwhile to write them explicitly as a single Cython loop to avoid allocating intermediate arrays.
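For instance (my own illustration, not from the question), np.sqrt(a*a + b*b) allocates temporary arrays for a*a, b*b and their sum; a fused loop like the sketch below avoids them:

import numpy as np
cimport cython
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.wraparound(False)
def hypotenuse(double[::1] a, double[::1] b):
    out = np.empty(a.shape[0])
    cdef double[::1] out_view = out
    cdef Py_ssize_t n
    for n in range(a.shape[0]):
        # one pass over the data, no intermediate arrays
        out_view[n] = sqrt(a[n]*a[n] + b[n]*b[n])
    return out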
Alternatively, the little-known Pythran backend for Cython can convert a set of vectorized Numpy operations to optimized C++ code.
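If I remember the wiring correctly (treat this as a hedged sketch - the module name and file are made up), you enable it with the np_pythran compiler directive and build the extension as C++:

# setup.py
from setuptools import Extension, setup
from Cython.Build import cythonize

setup(ext_modules=cythonize(
    Extension("fast_module", ["fast_module.pyx"], language="c++"),
    compiler_directives={"np_pythran": True},  # hand Numpy expressions to Pythran
))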
Indexing array slices
This isn't a problem in Cython, but it typically isn't something that will get you significant speed-ups on its own.
Calling Numpy functions
e.g. last = np.sin(an)
These require a Python call, so Cython typically cannot accelerate them - it has no visibility into the contents of the Numpy function.
However, here the operation is on a single value, not on a whole Numpy array. In this case we can use sin from the C standard library, which will be significantly quicker than a Python function call. You'd do from libc.math cimport sin and then call sin rather than np.sin.
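As a minimal sketch of that substitution (the function name is just for illustration):

from libc.math cimport sin   # the C library's sin - no Python call overhead

def sin_of(double an):
    cdef double last = sin(an)   # compiles to a plain C function call
    return last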
Numba is an alternative Python accelerator that has better visibility into Numpy functions and can often optimize them without code changes.
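For illustration only (not part of the original answer), the decorator-based usage looks roughly like:

import numpy as np
from numba import njit

@njit                      # Numba compiles the whole function, np.sin included
def apply_sin(a):
    out = np.empty_like(a)
    for n in range(a.shape[0]):
        out[n] = np.sin(a[n])
    return out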
Array allocations
e.g. transformed_a = np.zeros_like(a).
This is just a Numpy function call and thus Cython has no ability to speed it up. If it's just an intermediate value and not returned to Python then you might consider using a fixed-size C array on the stack
cdef double transformed_a[10]  # note - you must know the size at compile-time
or allocating the memory with the C malloc function (remember to free it), or using Cython's cython.view.array (which is still a Python object, but can be a little quicker).
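A minimal malloc-based sketch (illustrative only - the function name and size argument are made up):

from libc.stdlib cimport malloc, free

def scratch_example(Py_ssize_t n):
    cdef double* transformed_a = <double*>malloc(n * sizeof(double))
    if transformed_a == NULL:
        raise MemoryError()
    try:
        transformed_a[0] = 0.0   # use it like a plain C array, indices 0..n-1
    finally:
        free(transformed_a)      # every malloc needs a matching free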
Whole-array arithmetic
e.g. transformed_a * b, which multiplies transformed_a and b element by element.
Cython doesn't help you here - it's just a disguised function call (although Pythran+Cython may have some benefits). For large arrays this kind of operation is pretty efficient in Numpy so don't overthink it.
Note that whole-array operations aren't defined for Cython typed memoryviews, so you need to do np.asarray(memview)
to get them back to Numpy arrays. This typically doesn't need a copy and is quick.
For some operations like this, you can use BLAS and LAPACK functions (fast C implementations of common array and matrix operations). Scipy provides a Cython interface for them (https://docs.scipy.org/doc/scipy/reference/linalg.cython_blas.html). They're a little more complex to use than the natural Python code.
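As a rough sketch of calling one of those routines (ddot, the double-precision dot product; the wrapper function is my own):

from scipy.linalg.cython_blas cimport ddot

def dot(double[::1] x, double[::1] y):
    cdef int n = x.shape[0]
    cdef int inc = 1
    # BLAS routines take every argument by pointer, even the scalars
    return ddot(&n, &x[0], &inc, &y[0], &inc)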
The illustrative example
Just for completeness, I'd write it something like:
import numpy as np
from libc.math cimport sin
cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
def some_func(double[::1] a, b):
    cdef double[::1] transformed_a = np.zeros_like(a)
    cdef double last = 0
    cdef double an, delta
    cdef Py_ssize_t n
    for n in range(1, a.shape[0]):
        an = a[n]
        if an > 0:
            delta = an - a[n-1]
            transformed_a[n] = delta*last
        else:
            last = sin(an)
    return np.asarray(transformed_a) * b
which is a little over 10x faster.
cython -a
is helpful here - it produces an annotated HTML file that shows which lines contain a lot of interaction with Python.