I am trying to write a Cython function that accepts a NumPy structured array or record array by defining a matching Cython struct type. Suppose I have the data:
import numpy as np

a = np.recarray(3, dtype=[('a', np.float32), ('b', np.int32), ('c', '|S5'), ('d', '|S3')])
a[0] = (1.1, 1, 'this\0', 'to\0')
a[1] = (2.1, 2, 'that\0', 'ta\0')
a[2] = (3.1, 3, 'dogs\0', 'ot\0')
(Note: the problem described below occurs with or without the null terminator)
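To make the memory layout concrete, here is a minimal sketch that prints the byte offset and size of each field in the dtype above. The name `layout` is only an illustrative helper, not part of the original code. The two string fields sit back to back and together span 5 + 3 = 8 bytes, which is consistent with the "got 8" in the error below.

```python
import numpy as np

# Same field layout as the record array in the question.
dt = np.dtype([('a', np.float32), ('b', np.int32), ('c', 'S5'), ('d', 'S3')])

# Map each field name to (byte offset within a record, size in bytes).
# dt.fields[name] is a (dtype, offset) tuple.
layout = {name: (dt.fields[name][1], dt.fields[name][0].itemsize)
          for name in dt.names}

for name, (offset, size) in layout.items():
    print(name, offset, size)
print("itemsize:", dt.itemsize)
```

Because the dtype is unaligned (packed), there is no padding between 'c' and 'd', which matches the `packed struct` declaration on the Cython side.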
I then have the cython code:
import numpy as np
cimport numpy as np

cdef packed struct tstruct:
    np.float32_t a
    np.int32_t b
    char[5] c
    char[3] d

def test_struct(tstruct[:] x):
    cdef:
        int k
        tstruct y
    for k in xrange(3):
        y = x[k]
        print y.a, y.b, y.c, y.d
When I try to run test_struct(a), I get the error:
ValueError: Expected a dimension of size 5, got 8
If the fields in the array and the corresponding struct are reordered so that the string fields are not adjacent to each other, the function works as expected. It looks as if Cython does not detect the boundary between the c and d fields correctly and treats them as a single char array whose length is the sum of the two lengths.
Short of reshuffling the data (which is possible but not ideal), is there another way to pass a recarray with fixed-length string data into Cython?
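For reference, the reshuffling workaround could be sketched roughly as follows. This is only an illustration: `reorder_fields` and `reordered_dtype` are hypothetical names, and the Cython struct would have to declare its members in the same new order (a, c, b, d) for the memoryview to match.

```python
import numpy as np

a = np.recarray(3, dtype=[('a', np.float32), ('b', np.int32),
                          ('c', 'S5'), ('d', 'S3')])
a[0] = (1.1, 1, b'this', b'to')
a[1] = (2.1, 2, b'that', b'ta')
a[2] = (3.1, 3, b'dogs', b'ot')

# Hypothetical reordered layout: the numeric field 'b' now separates
# the two string fields 'c' and 'd'.
reordered_dtype = np.dtype([('a', np.float32), ('c', 'S5'),
                            ('b', np.int32), ('d', 'S3')])

def reorder_fields(arr, new_dtype):
    """Copy arr into a new array with the same fields in a different order."""
    out = np.empty(arr.shape, dtype=new_dtype)
    for name in new_dtype.names:
        out[name] = arr[name]  # field-by-field copy, matched by name
    return out

b = reorder_fields(a, reordered_dtype)
print(b.dtype.names)
```

This copies the data, which is the main reason the approach is not ideal for large arrays.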
Update: This appears to be a Cython bug. See the following discussion on the cython-users Google group, which hints at where the problem arises:
https://groups.google.com/forum/#!topic/cython-users/TbLbXdi0_h4
Update 2: This bug has been fixed on the Cython master branch on GitHub as of Feb 23, 2014, and the patch is slated for inclusion in v0.20.2: https://github.com/cython/cython/commit/58d9361e0a6d4cb3d4e87775f78e0550c2fea836
… align=True (see docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html). – Convocation
… b between c and d) then everything works as expected. The problem is that the boundary between adjacent strings does not appear to be detected properly. – Dosi