I'm using a lot of 3D memoryviews in Cython, e.g.
cython.declare(a='double[:, :, ::1]')
a = np.empty((10, 20, 30), dtype='double')
I often want to loop over all elements of a. I can do this using a triple loop like
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            a[i, j, k] = ...
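For completeness, a minimal sketch of how such a loop sits inside a full function in pure Python mode (the function name fill_triple and the value written are placeholders; the indices are typed so the loop compiles to plain C):
import numpy as np
import cython

def fill_triple(value):
    # Typed 3D C-contiguous memoryview plus typed loop indices
    cython.declare(a='double[:, :, ::1]',
                   i=cython.Py_ssize_t, j=cython.Py_ssize_t, k=cython.Py_ssize_t)
    a = np.empty((10, 20, 30), dtype='double')
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                a[i, j, k] = value
    return np.asarray(a)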
If I do not care about the indices i, j and k, it is more efficient to do a flat loop, like
cython.declare(a_ptr='double*')
a_ptr = cython.address(a[0, 0, 0])
for i in range(size):
    a_ptr[i] = ...
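Put together, a minimal sketch of the flat variant (again pure Python mode; fill_flat and the value are placeholders, and the spelled-out size computation is exactly what I would like to avoid):
import numpy as np
import cython

def fill_flat(value):
    cython.declare(a='double[:, :, ::1]', a_ptr='double*',
                   i=cython.Py_ssize_t, size=cython.Py_ssize_t)
    a = np.empty((10, 20, 30), dtype='double')
    # Pointer arithmetic over the buffer is only valid because the
    # memoryview is C-contiguous (the ::1 on the last axis)
    a_ptr = cython.address(a[0, 0, 0])
    size = a.shape[0]*a.shape[1]*a.shape[2]
    for i in range(size):
        a_ptr[i] = value
    return np.asarray(a)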
Here I need to know the number of elements (size) in the array. This is given by the product of the elements in the shape attribute, i.e. size = a.shape[0]*a.shape[1]*a.shape[2], or more generally size = np.prod(np.asarray(a).shape). I find both of these ugly to write, and the (albeit small) computational overhead bothers me. The nice way to do it is to use the built-in size attribute of memoryviews, size = a.size. However, for reasons I cannot fathom, this leads to unoptimized C code, as is evident from the annotations HTML file generated by Cython. Specifically, the C code generated by size = a.shape[0]*a.shape[1]*a.shape[2] is simply
__pyx_v_size = (((__pyx_v_a.shape[0]) * (__pyx_v_a.shape[1])) * (__pyx_v_a.shape[2]));
whereas the C code generated from size = a.size is
__pyx_t_10 = __pyx_memoryview_fromslice(__pyx_v_a, 3, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_10);
__pyx_t_14 = __Pyx_PyObject_GetAttrStr(__pyx_t_10, __pyx_n_s_size); if (unlikely(!__pyx_t_14)) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_14);
__Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
__pyx_t_7 = __Pyx_PyIndex_AsSsize_t(__pyx_t_14); if (unlikely((__pyx_t_7 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(0, 2238, __pyx_L1_error)
__Pyx_DECREF(__pyx_t_14); __pyx_t_14 = 0;
__pyx_v_size = __pyx_t_7;
To generate the above code, I have enabled all possible optimizations through compiler directives, meaning that the unwieldy C code generated by a.size cannot be optimized away. It looks to me as though the size "attribute" is not really a pre-computed attribute, but actually carries out a computation upon lookup. Furthermore, this computation is quite a bit more involved than simply taking the product over the shape attribute. I cannot find any hint of an explanation in the docs.
What is the explanation of this behavior, and do I have a better choice than writing out a.shape[0]*a.shape[1]*a.shape[2], if I really care about this micro-optimization?
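For reference, the annotations HTML comes from a build along the following lines. This is only a minimal sketch: mymodule.py is a placeholder name, and the directive set shown is illustrative rather than the exact one I use.
# setup.py -- minimal build sketch; also writes mymodule.html
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        'mymodule.py',
        annotate=True,               # emit the annotations HTML next to the generated C file
        compiler_directives={        # illustrative optimization directives
            'language_level': 3,
            'boundscheck': False,
            'wraparound': False,
            'initializedcheck': False,
        },
    ),
)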
size being a pure Python property fits my observations. From your link, it would seem as though shape is also such a property, and that it returns a tuple. Though in practice, a.shape returns a Py_ssize_t pointer. – Prisoner