Parsing PEP 3118 buffer protocol format strings

I am interested in passing binary data between Python, NumPy, and Cython using the buffer protocol. PEP 3118 extends the struct format-string syntax with useful features such as named fields and nested structs.

However, support for the full buffer syntax appears to be limited in all three. For example, say I have the following Cython struct:

from libc.stdint cimport uint8_t, uint32_t

ctypedef packed struct ImageComp:
    uint32_t width
    uint32_t height
    uint8_t *pixels

# Here is the appropriate struct format string representation
IMAGE_FORMAT = b'T{L:width:L:height:&B:pixels:}'

Attempting to extract the PEP 3118-compliant format string as follows:

from libc.stdlib cimport malloc, free
import numpy as np

cdef void *image_temp = malloc(sizeof(ImageComp))
IMAGE_SIZE = sizeof(ImageComp)
IMAGE_FORMAT = (<ImageComp[:1]>image_temp)._format
IMAGE_DTYPE = np.asarray(<ImageComp[:1]>image_temp).dtype
free(image_temp)

Fails with this error message: Invalid base type for memoryview slice: ImageComp, because typed memoryviews cannot be created for base types that contain pointers.

Similarly, creating a view.array with my custom string, or passing it to the Python struct module's calcsize function, raises an error such as struct.error: bad char in struct format.
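
The struct failure can be reproduced in pure Python with a minimal sketch like the one below. The helper try_calcsize and the fallback string "=LLQ" are my own illustration, not part of the original code; "=LLQ" assumes standard sizes (4, 4, 8 bytes) with the pointer stored as a plain 64-bit integer.

```python
import struct

# The PEP 3118 string from the question and a plain struct-module equivalent.
# "=LLQ" is an assumption: standard sizes, pointer kept as a 64-bit integer.
PEP3118_FORMAT = "T{L:width:L:height:&B:pixels:}"
PLAIN_FORMAT = "=LLQ"

def try_calcsize(fmt):
    """Return the size for fmt, or the struct.error raised while parsing it."""
    try:
        return struct.calcsize(fmt)
    except struct.error as exc:
        return exc

pep_result = try_calcsize(PEP3118_FORMAT)   # struct does not know T{...}, names, or &
plain_size = try_calcsize(PLAIN_FORMAT)     # 4 + 4 + 8 bytes with standard sizes

print(repr(pep_result))
print(plain_size)
```

The struct module only understands its own, older syntax, so the named fields and the pointer marker have to be dropped before it will accept the string.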

I can manually create and fill a Py_buffer object as described here, but attempting to convert this to a numpy array with np.asarray yields ValueError: 'T{L:width:L:height:&B:pixels:}' is not a valid PEP 3118 buffer format string.

With all of this in mind, I have the following questions:

  1. Is there any module in the Python standard library that takes advantage of the complete PEP 3118 specification?
  2. Is this struct format syntax defined formally anywhere (i.e. with a PEG grammar)?
  3. Is there a way to force cython or numpy to automatically generate a valid format string if it contains pointers?
Leannaleanne answered 4/5, 2019 at 0:29 Comment(7)
Looking at this, under the valid_dtype function, there is a comment about how Pointers are not valid (yet) for base types for memoryviews. - Leannaleanne
Other relevant files on the Cython end: Cython's buffer protocol format processing code appears to be here, while the code for templating typed memoryviews is here. - Leannaleanne
The problem with pointers: this is a very stupid format. You need some additional (meta) information to handle them properly: 1) is the data pointed to by the pointer shared, owned, or just referenced (i.e. who is responsible for freeing the resources, and how is it ensured that the pointers don't become dangling)? 2) How big is the memory (needed when a deep copy should be performed)? Without the above information one cannot safely handle pointers (also not in a numpy array). ctypes has the needed meta-information and thus can handle pointers, but it has much more overhead than simple structs. - Whorled
@Whorled Yeah, there are some serious limitations to this PEP 3118 format (not to mention that there are no sized C99 types or features like unions). In my use case though, I would have Cython/C own the data pointed to by the pointer and be responsible for malloc-ing and free-ing the pointer. I could then write Python-accessible functions to access the data pointed to by the pointer. The size information is present here (width * height * 4 * sizeof(uint8_t) bytes), although this might not be true generally. The pointer could just be exposed as an untyped uintptr_t not meant for direct use. - Leannaleanne
@Whorled I was actually intending to use this in the context of a SlotMap data structure that I have implemented in Cython (described here and here). On the Python side, users would only get access to a uint64_t handle that would point indirectly to the bytes of a struct stored in an array of structs. I can include that Cython code here if it would help. - Leannaleanne
So why not use b'T{L:width:L:height:Q:pixels:}'? The meta-information about pointers is somewhere in your program logic (as opposed to being part of the data), so adding "interpret pixels as an address" isn't that big a deal anymore, even if it is maybe not the most stellar solution. - Whorled
@Whorled That certainly could work in this case. The problem would then be that I would have to create all of these format strings from scratch, rather than use the code in my second snippet to generate them for me. I could just replace uint8_t *pixels with uintptr_t pixels in the struct definition itself, but then the type information would be lost on the Cython side as well. - Leannaleanne
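
For what it's worth, the "pointer as integer" idea from the comments can be tried without writing the format string by hand. The sketch below is only an illustration: the ctypes class is a hypothetical mirror of the Cython struct, not the asker's code. It relies on CPython's ctypes exporting a T{...} string with named fields through the buffer protocol. The exact type codes inside the string depend on the platform, so the code does not assume them.

```python
import ctypes

class ImageComp(ctypes.Structure):
    # Hypothetical ctypes mirror of the Cython struct. The pixel pointer is
    # stored as a plain 64-bit integer address, as suggested in the comments.
    _fields_ = [
        ("width", ctypes.c_uint32),
        ("height", ctypes.c_uint32),
        ("pixels", ctypes.c_uint64),
    ]

image = ImageComp(width=4, height=2, pixels=0)
view = memoryview(image)

image_format = view.format   # generated by ctypes; type codes are platform dependent
image_size = view.itemsize   # two 4-byte fields plus one 8-byte field

print(image_format, image_size)
```

The trade-off is the one already raised above: the consumer sees only an integer, so the knowledge that pixels is an address (and who owns the memory behind it) has to live in the program logic.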
