How to use Cython typed memoryviews to accept strings from Python?

How can I write a Cython function that takes a byte string object (a normal string, a bytearray, or another object that follows the buffer protocol) as a typed memoryview?

According to the Unicode and Passing Strings Cython tutorial page, the following should work:

cpdef object printbuf(unsigned char[:] buf):
    chars = [chr(x) for x in buf]
    print repr(''.join(chars))

It does work for bytearrays and other writable buffers:

$ python -c 'import test; test.printbuf(bytearray("test\0ing"))'
'test\x00ing'

But it doesn't work for normal strings and other read-only buffer objects:

$ python -c 'import test; test.printbuf("test\0ing")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "test.pyx", line 1, in test.printbuf (test.c:1417)
  File "stringsource", line 614, in View.MemoryView.memoryview_cwrapper (test.c:6795)
  File "stringsource", line 321, in View.MemoryView.memoryview.__cinit__ (test.c:3341)
BufferError: Object is not writable.

Looking at the generated C code, Cython is always passing the PyBUF_WRITABLE flag to PyObject_GetBuffer(), which explains the exception.
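
The read-only versus writable distinction can also be seen from plain Python, without Cython. A minimal sketch, assuming Python 3 (so the byte string is a bytes literal rather than the Python 2 str used above). Note that memoryview here is the built-in Python type, not a Cython typed memoryview:

```python
# bytes exposes a read-only buffer; bytearray exposes a writable one.
ro_view = memoryview(b"test\0ing")
rw_view = memoryview(bytearray(b"test\0ing"))

print(ro_view.readonly)  # True
print(rw_view.readonly)  # False

# Writing through the writable view changes the underlying bytearray.
rw_view[0] = ord("T")

# Writing through the read-only view is refused with a TypeError.
try:
    ro_view[0] = ord("T")
    write_refused = False
except TypeError:
    write_refused = True
print(write_refused)  # True
```

Any consumer that insists on a writable buffer has to reject the first object, which is consistent with the BufferError shown above.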

I can manually get a view into the buffer object myself, but it's not as convenient:

from cpython.buffer cimport \
    PyBUF_SIMPLE, \
    PyObject_CheckBuffer, PyObject_GetBuffer, PyBuffer_Release

cpdef object printbuf(object buf):
    if not PyObject_CheckBuffer(buf):
        raise TypeError("argument must follow the buffer protocol")
    cdef Py_buffer view
    # PyBUF_SIMPLE does not request write access, so read-only
    # objects are accepted.
    PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE)
    try:
        chars = [chr((<unsigned char *>view.buf)[i])
                 for i in range(view.len)]
        print repr(''.join(chars))
    finally:
        # Always release the view, even if the body raises.
        PyBuffer_Release(&view)

$ python -c 'import test; test.printbuf(bytearray("test\0ing"))'
'test\x00ing'
$ python -c 'import test; test.printbuf("test\0ing")'
'test\x00ing'

Am I doing something wrong, or does Cython not support coercing read-only buffer objects (such as normal strings) into typed memoryview objects?

Scorch answered 28/1, 2015 at 22:28 Comment(2)
I found your patch here. Even adding const does not help, so the approach suggested in the documentation does not work. - Vacillate
const now works for me with Cython 0.28.4. - Warrant

This issue was fixed in Cython 0.28, released 2018-03-13 (PR #1869). The changelog says:

The const modifier can be applied to memoryview declarations to allow read-only buffers as input.

There is also a new section in the documentation.

The example you gave works in Cython 0.28 and later if you declare the parameter as const:

cpdef object printbuf(const unsigned char[:] buf):
    chars = [chr(x) for x in buf]
    print repr(''.join(chars))
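
For comparison, here is a rough pure-Python 3 analogue of what the const version accepts. This is a sketch, not Cython code: printbuf_py is a hypothetical name, and it returns the repr instead of printing it so the result is easy to check.

```python
def printbuf_py(buf):
    """Hypothetical pure-Python stand-in for the Cython printbuf above."""
    # memoryview() takes any object that exposes the buffer protocol and
    # does not demand write access, so read-only inputs are accepted.
    with memoryview(buf) as view:
        data = view.tobytes()  # copy the raw bytes out of the buffer
    return repr(''.join(chr(x) for x in data))


# Both a read-only bytes object and a writable bytearray are accepted.
for obj in (b"test\0ing", bytearray(b"test\0ing")):
    print(printbuf_py(obj))
```

In Python 3 a str does not expose the buffer protocol, so passing one raises TypeError. Encode it to bytes first.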
Brilliant answered 21/6, 2018 at 11:55 Comment(0)

Despite the documentation suggesting otherwise, Cython (at least up to version 0.22) does not support coercing read-only buffer objects into typed memoryview objects. Cython always passes the PyBUF_WRITABLE flag to PyObject_GetBuffer(), even when it doesn't need write access. This causes read-only buffer objects to raise an exception.

I raised this issue on the Cython developer mailing list, and even included a (very rough) patch. I never got a reply, so I assume the Cython developers are not interested in fixing this bug.
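
In the meantime, one workaround is to hand Cython a writable copy, since bytearray inputs are accepted as shown in the question. A rough sketch in Python 3: as_writable is a hypothetical helper name of mine, not part of Cython, and the copy costs time and memory proportional to the input.

```python
def as_writable(obj):
    """Return obj if its buffer is writable, else a bytearray copy.

    Hypothetical helper for illustration; not part of Cython.
    """
    with memoryview(obj) as view:
        if not view.readonly:
            return obj  # already writable, no copy needed
        return bytearray(view)  # copies the read-only data


data = as_writable(b"test\0ing")
print(type(data).__name__)  # bytearray
```

The result can then be passed to a function declared with a plain unsigned char[:] parameter.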

Scorch answered 15/3, 2015 at 23:0 Comment(4)
I ran into the same problem. Why aren't they interested in fixing this? - Wenda
You might try submitting the patch as a GitHub pull request and adding some test cases. - Cates
Indeed, have you tried to push the patch through GitHub? - Raid
To repeat what others said, this likely means they did not see or understand the patch, or did not have the capacity to deal with its rough state, not that they are uninterested in fixing a bug. It's normal human behavior to reply when there is capacity to do so, even to things you're not interested in. - Croak
