Cython bytes to C char*

I am trying to write a Cython extension to CPython to wrap the mcrypt library, so that I can use it with Python 3. However, I am running into a problem where I segfault while trying to use one of the mcrypt APIs.

The code that is failing is:

def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval

Now, as I understand the Cython documentation, the assignment on line 3 should copy the contents of the buffer (a bytes object in Python 3) to the C string pointer. I assumed this would also allocate the memory, but when I made this modification:

def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = <char *>malloc(src_len)
    ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval

it still crashed with a segfault. It's crashing inside of mcrypt_generic, but when I use plain C code I am able to make it work just fine, so there has to be something that I am not quite understanding about how Cython is working with C data here.

Thanks for any help!

ETA: The problem was a bug on my part. I was working on this after being awake for far too many hours (isn't that something we've all done at some point?) and missed something stupid. The code that I now have, which works, is:

def _real_encrypt(self, source):
    src_len = len(source)
    cdef char *ciphertext = <char *>malloc(src_len)
    cmc.strncpy(ciphertext, source, src_len)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                       src_len)

    retval = ciphertext[:src_len]
    cmc.mcrypt_generic_deinit(self._mcStream)
    return retval

It's probably not the most efficient code in the world, as it makes one copy to do the encryption and then a second copy for the return value. I'm not sure it is possible to avoid that, since I don't know whether a newly allocated buffer can be handed to Python in place as a bytestring. But now that I have a working function, I'm going to implement a block-by-block method as well, so that one can provide an iterable of blocks for encryption or decryption without having the entire source and destination in memory at once. That way, it would be possible to encrypt or decrypt huge files without holding up to three copies of them in memory at any one point.
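The block-by-block idea described above could look roughly like the following pure-Python sketch. It is not the mcrypt wrapper itself: `read_blocks`, `encrypt_blocks`, and the XOR stand-in `xor_block` are hypothetical names used only for illustration, and a real block cipher would also need its input sized to the cipher's block length.

```python
import io
from typing import Callable, Iterable, Iterator


def read_blocks(stream, block_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break
        yield chunk


def encrypt_blocks(blocks: Iterable[bytes],
                   encrypt_block: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Lazily apply encrypt_block to each block; only one block is live at a time."""
    for block in blocks:
        yield encrypt_block(block)


def xor_block(block: bytes) -> bytes:
    """Hypothetical stand-in for the real cipher call (NOT real encryption)."""
    return bytes(b ^ 0x5A for b in block)


# Usage: 10000 bytes are processed as 4096 + 4096 + 1808 bytes.
source = io.BytesIO(b"x" * 10000)
chunks = list(encrypt_blocks(read_blocks(source), xor_block))
```

Because both functions are generators, a caller that writes each chunk to a file as it arrives never holds more than one block of plaintext and one block of ciphertext at a time.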

Thanks for the help, everyone!

Thomasina answered 14/12, 2010 at 7:33 Comment(0)

The first one is pointing the char* at the Python string. The second allocates memory, but then re-points the pointer to the Python string and ignores the newly allocated memory. You should be invoking the C library function strcpy from Cython, presumably; but I don't know the details.
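The difference between re-pointing and copying can be sketched in plain Python with ctypes. This is only an analogy for what the Cython code should do, not the asker's code: `create_string_buffer` plays the role of `malloc`, `ctypes.memmove` plays the role of the C copy, and `copy_then_mutate` is a hypothetical helper name.

```python
import ctypes


def copy_then_mutate(source: bytes) -> bytes:
    """Copy source into a separate C buffer, change the copy, return its contents."""
    # Allocate a writable C buffer of the same length (the malloc step).
    buf = ctypes.create_string_buffer(len(source))
    # Copy the bytes INTO the buffer. Writing `buf = source` here would only
    # rebind the name and throw the new buffer away, which is the bug described above.
    ctypes.memmove(buf, source, len(source))
    # Mutating the copy leaves the original bytes object untouched.
    buf[0] = b"X"
    # .raw returns every byte of the buffer, including any embedded NUL bytes.
    return buf.raw


original = b"abc\x00def"
result = copy_then_mutate(original)
```

After the call, `original` is unchanged and `result` differs only in its first byte, which is the behavior the C library call needs: a private, writable buffer.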

Tautology answered 14/12, 2010 at 8:23 Comment(4)
... this would be why I probably should have gone to bed last night, because I was trying to program while overtired. Indeed, what got it to work was a call to strncpy (which I used because of the possibility of NUL bytes in the input), and then I was able to do the call to mcrypt_generic, copy the output into a Python bytestring, free the temporary buffer, and return. Thanks for this answer, it pointed me in the right direction. - Thomasina
!!! strncpy will not help you if there could validly be NUL bytes in the input. That means your input is not really a string at all, but a sequence of bytes. Use memcpy or something like that. - Tautology
Oh, you are absolutely correct; the n means "up to". Oh, crap. Oh, oh, crap. I think you might have just pointed out my bug that I posted on another question: #4452477 - Thomasina
No, that didn't do it. In fact, using memcpy makes the whole thing fail... I'm even more confused than I was previously. Sigh. - Thomasina
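To make the strncpy versus memcpy point concrete, here is a small pure-Python model of the two C functions. It does not call the C library; `strncpy_like` and `memcpy_like` are hypothetical helpers that imitate the documented behavior (strncpy stops at the first NUL and pads the remainder with NULs, memcpy copies exactly n bytes).

```python
def strncpy_like(src: bytes, n: int) -> bytes:
    """Model of C strncpy: copy up to the first NUL (at most n bytes), pad with NULs."""
    end = src.find(b"\x00")
    if end == -1:
        end = len(src)
    copied = src[:min(end, n)]
    # Everything after the copied prefix is filled with NUL bytes.
    return copied + b"\x00" * (n - len(copied))


def memcpy_like(src: bytes, n: int) -> bytes:
    """Model of C memcpy: copy exactly n bytes, whatever their values."""
    return src[:n]


data = b"ab\x00cd"
via_strncpy = strncpy_like(data, len(data))  # bytes after the NUL are lost
via_memcpy = memcpy_like(data, len(data))    # all five bytes survive
```

With binary data such as ciphertext or keys, the strncpy-style copy silently replaces everything after the first zero byte, which is why a length-based copy is the safer choice.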

A few comments on your code to help improve it. The Python C API provides functions that do exactly what you need and keep everything consistent with the way Python manages its objects. They also handle embedded NUL bytes without a problem.

Rather than calling malloc directly, change this:

cdef char *ciphertext = <char *>malloc(src_len)

to

cdef str retval = PyString_FromStringAndSize(PyString_AsString(source), <Py_ssize_t>src_len)
cdef char *ciphertext = PyString_AsString(retval)

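The first line creates a brand-new Python str object initialized with the contents of source. The second line points ciphertext at retval's internal char * buffer without copying, so anything that writes through ciphertext modifies retval. Because retval is a brand-new object that nothing else references yet, C code can modify it before it is returned from _real_encrypt. (These are the Python 2 names; on Python 3 the equivalent functions are PyBytes_FromStringAndSize and PyBytes_AsString, and retval would be declared as bytes.)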

See the Python C API documentation for these two functions for more details.

The net effect saves you a copy. The whole code would be something like:

cdef extern from "Python.h":
    object PyString_FromStringAndSize(char *, Py_ssize_t)
    char *PyString_AsString(object)

def _real_encrypt(self, source):
    src_len = len(source)
    cdef str retval = PyString_FromStringAndSize(PyString_AsString(source), <Py_ssize_t>src_len)
    cdef char *ciphertext = PyString_AsString(retval)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                       src_len)
    # mcrypt_generic encrypted the buffer in place, so retval now holds the ciphertext.
    cmc.mcrypt_generic_deinit(self._mcStream)
    return retval
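The idea behind this answer (copy the input once into an object Python owns, then let C code overwrite that object's buffer in place) can be illustrated in pure Python with ctypes. This is a sketch by analogy rather than the Cython code above: a bytearray stands in for the freshly created string, `from_buffer` gives a C view of its memory without copying, and the XOR loop is a hypothetical stand-in for mcrypt_generic.

```python
import ctypes


def encrypt_in_place_sketch(source: bytes, key: int = 0x5A) -> bytearray:
    """Make one mutable copy of source and transform it through a C-level view."""
    # The single copy: a new, mutable object owned by Python.
    out = bytearray(source)
    # A ctypes array that SHARES memory with `out` (no second copy is made).
    view = (ctypes.c_ubyte * len(out)).from_buffer(out)
    for i in range(len(out)):
        # Stand-in for the real cipher (NOT real encryption): XOR each byte in place.
        view[i] ^= key
    # Drop the view so the bytearray is no longer pinned by an exported buffer.
    del view
    return out


ciphertext = encrypt_in_place_sketch(b"a\x00b")
roundtrip = encrypt_in_place_sketch(bytes(ciphertext))
```

Writes made through `view` show up in `out` because both refer to the same memory, which mirrors how writing through ciphertext changes retval in the answer's code.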
Altorilievo answered 21/2, 2011 at 3:2 Comment(0)

The approach I've used (with Python 2.x) is to declare the string type parameters in the function signature so that the Cython code does all conversions and type checking automatically:

def _real_encrypt(self, char* src):
    ...
Obstreperous answered 14/12, 2010 at 8:22 Comment(0)
