I am interested in passing binary data between python, numpy, and cython using the buffer protocol. Looking at PEP 3118, there appear to be some additions to the struct string-syntax that add support for useful features such as named fields and nested structs.
However, support for the full range of buffer syntax appears to be limited in all three. For example, say I have the following cython struct:
from libc.stdint cimport uint32_t, uint8_t

ctypedef packed struct ImageComp:
    uint32_t width
    uint32_t height
    uint8_t *pixels
# Here is the appropriate struct format string representation
IMAGE_FORMAT = b'T{L:width:L:height:&B:pixels:}'
Attempting to extract the PEP 3118-compliant format string as follows:
from libc.stdlib cimport malloc, free
import numpy as np

cdef void *image_temp = malloc(sizeof(ImageComp))
IMAGE_SIZE = sizeof(ImageComp)
IMAGE_FORMAT = (<ImageComp[:1]>image_temp)._format
IMAGE_DTYPE = np.asarray(<ImageComp[:1]>image_temp).dtype
free(image_temp)
fails with this error message:
Invalid base type for memoryview slice: ImageComp
since typed memoryviews cannot be created for base types that contain pointers.
Similarly, creating a view.array using my custom string, or calling the python struct module's calcsize function on it, raises an error like struct.error: bad char in struct format.
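The struct failure can be reproduced without cython. A minimal sketch, assuming only the standard library: the helper name try_calcsize and the fallback format "=LLQ" (which stores the pointer as a plain 64-bit integer) are illustrative choices of mine, not part of the original code.

```python
import struct

# The PEP 3118 style string from above, and a plain struct-module equivalent
# that stores the pointer as an unsigned 64-bit integer (an assumption).
PEP3118_FORMAT = "T{L:width:L:height:&B:pixels:}"
PLAIN_FORMAT = "=LLQ"


def try_calcsize(fmt):
    """Return the size struct computes for fmt, or the error text if rejected."""
    try:
        return struct.calcsize(fmt)
    except struct.error as exc:
        return f"struct.error: {exc}"


pep3118_result = try_calcsize(PEP3118_FORMAT)  # rejected by the struct module
plain_result = try_calcsize(PLAIN_FORMAT)      # 4 + 4 + 8 bytes, no padding

print(pep3118_result)
print(plain_result)
```

The struct module only understands its own format characters, so the T{...} grouping, the :name: fields and the & pointer prefix are all rejected.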
I can manually create and fill a Py_buffer object as described here, but attempting to convert this to a numpy array with np.asarray yields ValueError: 'T{L:width:L:height:&B:pixels:}' is not a valid PEP 3118 buffer format string.
With all of this in mind, I have the following questions:
- Is there any module in the standard python library that takes advantage of the complete PEP 3118 specification?
- Is this struct format syntax defined formally anywhere (i.e. with a PEG grammar)?
- Is there a way to force cython or numpy to automatically generate a valid format string if it contains pointers?
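For reference on the first question, ctypes is one standard-library module that, as far as I can tell, reports extended PEP 3118 format strings for structures, including named fields and pointer members. A minimal sketch, assuming an unpacked ctypes mirror of ImageComp (the class below is my own illustration, and I believe ctypes reports packed structures as plain bytes instead, so _pack_ is deliberately left out). The exact type codes in the output depend on the platform.

```python
import ctypes


class ImageComp(ctypes.Structure):
    # Hypothetical ctypes mirror of the cython struct, left unpacked.
    _fields_ = [
        ("width", ctypes.c_uint32),
        ("height", ctypes.c_uint32),
        ("pixels", ctypes.POINTER(ctypes.c_uint8)),
    ]


image = ImageComp(width=4, height=2)

# ctypes objects export the buffer protocol, so memoryview can read the format.
view = memoryview(image)
image_format = view.format
image_size = view.itemsize

print(image_format)  # a T{...} string with :width:, :height: and :pixels: fields
print(image_size)
```

This only shows that such a string can be produced; consumers such as numpy may still refuse to parse the pointer member.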
[...] valid_dtype function, there is a comment about how "Pointers are not valid (yet)" for base types for memoryviews. – Leannaleanne
[...] width * height * 4 * sizeof(uint8_t) bytes), although this might not be true generally. The pointer could just be exposed as an untyped uintptr_t not meant for direct use. – Leannaleanne
[...] SlotMap data structure that I have implemented in cython (described here and here). On the python side, users would only get access to a uint64_t handle that would point indirectly to the bytes of a struct stored in an array of structs. I can include that cython code here if it would help. – Leannaleanne
[...] b'T{L:width:L:height:Q:pixels:}'. The meta-information about pointers is somewhere in your program logic (as opposed to being a part of the data), so adding "interpret pixels as an address" isn't that big a deal anymore, even if it is maybe not the most stellar solution. – Whorled
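The integer-handle idea from the last comment can be sketched with the standard library alone. This is an illustration under my own assumptions, not code from the question: the header layout "=LLQ", the 4 bytes per pixel, and the variable names are all hypothetical, and the pixel buffer must stay alive for as long as the stored address is used.

```python
import ctypes
import struct

# width, height, then the pixel pointer stored as a plain unsigned 64-bit integer.
HEADER = struct.Struct("=LLQ")

width, height = 2, 2
# Hypothetical pixel buffer: 4 bytes per pixel, filled with 0..15.
pixels = (ctypes.c_uint8 * (width * height * 4))(*range(16))

# Writer side: the address travels as ordinary data.
record = HEADER.pack(width, height, ctypes.addressof(pixels))

# Reader side: program logic, not the format string, knows this field is an address.
w, h, addr = HEADER.unpack(record)
pixel_view = (ctypes.c_uint8 * (w * h * 4)).from_address(addr)
recovered = bytes(pixel_view)

print(HEADER.size)
print(recovered == bytes(range(16)))
```

Because the field is just a Q, both struct and numpy can parse the layout, at the cost of losing the pointer's type information in the format string.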