I am trying to write a Cython extension to CPython to wrap the mcrypt library, so that I can use it with Python 3. However, I am running into a problem where I segfault while trying to use one of the mcrypt APIs.
The code that is failing is:
    def _real_encrypt(self, source):
        src_len = len(source)
        cdef char* ciphertext = source
        cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
        retval = source[:src_len]
        return retval
Now, the way I understand the Cython documentation, the assignment on line 3 should copy the contents of the buffer (a bytes object in Python 3) to the C string pointer. I figured this would also mean that it allocates the memory, but when I made this modification:
    def _real_encrypt(self, source):
        src_len = len(source)
        cdef char* ciphertext = <char *>malloc(src_len)
        ciphertext = source
        cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
        retval = source[:src_len]
        return retval
it still crashed with a segfault. It's crashing inside of mcrypt_generic, but when I use plain C code I am able to make it work just fine, so there has to be something that I am not quite understanding about how Cython is working with C data here.
Thanks for any help!
ETA: The problem was a bug on my part. I was working on this after being awake for far too many hours (isn't that something we've all done at some point?) and missed something stupid. The code that I now have, which works, is:
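    def _real_encrypt(self, source):
        src_len = len(source)
        # mcrypt_generic() encrypts in place, so work on a private copy.
        cdef char *ciphertext = <char *>malloc(src_len)
        try:
            # Note: strncpy stops at the first NUL byte; memcpy is safer for binary input.
            cmc.strncpy(ciphertext, source, src_len)
            cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                                    len(self._key), NULL)
            cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                               src_len)
            # Slicing a char* copies src_len bytes into a new bytes object.
            retval = ciphertext[:src_len]
            cmc.mcrypt_generic_deinit(self._mcStream)
            return retval
        finally:
            # Release the scratch buffer (assumes free is cimported alongside malloc).
            free(ciphertext)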
It's probably not the most efficient code in the world, since it makes one copy to do the encryption and then a second copy for the return value. I'm not sure that can be avoided, because I don't know whether a newly allocated buffer can be handed back to Python in place as a bytestring. But now that I have a working function, I'm going to implement a block-by-block method as well, so that one can provide an iterable of blocks for encryption or decryption without holding the entire source and destination in memory at once. That way, it would be possible to encrypt or decrypt huge files without having up to three copies of the data in memory at any one point.
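Here is a rough sketch of what that block-by-block interface might look like, written in plain Python rather than Cython so it can be run on its own. Everything in it is an assumption for illustration: `read_blocks`, `encrypt_blocks`, and `toy_encrypt_in_place` are hypothetical names, the XOR routine is only a stand-in for the real in-place `mcrypt_generic` call (it is not encryption), and the 16-byte block size is arbitrary. Padding of a short final block, which a real block cipher mode would need, is not handled.

```python
import io

BLOCK_SIZE = 16  # assumed size for illustration; a real cipher reports its own


def read_blocks(stream, block_size=BLOCK_SIZE):
    """Yield successive chunks of at most block_size bytes from a binary stream."""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            return
        yield chunk


def toy_encrypt_in_place(buf):
    """Hypothetical stand-in for an in-place cipher call: XOR each byte with 0x5A."""
    for i in range(len(buf)):
        buf[i] ^= 0x5A


def encrypt_blocks(blocks, encrypt_in_place=toy_encrypt_in_place):
    """Lazily transform an iterable of byte blocks, one block at a time."""
    for block in blocks:
        work = bytearray(block)   # mutable copy of this block only
        encrypt_in_place(work)    # transform the copy in place
        yield bytes(work)         # hand back an immutable bytes object


if __name__ == "__main__":
    source = io.BytesIO(b"attack at dawn, then regroup at noon")
    with io.BytesIO() as sink:
        for encrypted in encrypt_blocks(read_blocks(source)):
            sink.write(encrypted)
        print(len(sink.getvalue()))
```

Because both functions are generators, only one block's worth of data is live at a time, however large the input is. Yielding the `bytearray` itself instead of `bytes(work)` would skip the last copy, at the cost of handing callers a mutable object.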
Thanks for the help, everyone!