ctypes return a string from c function
I'm a Python veteran, but haven't dabbled much in C. After half a day of not finding anything on the internet that works for me, I thought I would ask here and get the help I need.

What I want to do is write a simple C function that accepts a string and returns a different string. I plan to bind this function in several languages (Java, Obj-C, Python, etc.) so I think it has to be pure C?

Here's what I have so far. Notice I get a segfault when trying to retrieve the value in Python.

hello.c

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

const char* hello(char* name) {
    static char greeting[100] = "Hello, ";
    strcat(greeting, name);
    strcat(greeting, "!\n");
    printf("%s\n", greeting);
    return greeting;
}

main.py

import ctypes
hello = ctypes.cdll.LoadLibrary('./hello.so')
name = "Frank"
c_name = ctypes.c_char_p(name)
foo = hello.hello(c_name)
print c_name.value # this comes back fine
print ctypes.c_char_p(foo).value # segfault

I've read that the segfault is caused by C releasing the memory that was initially allocated for the returned string. Maybe I'm just barking up the wrong tree?

What's the proper way to accomplish what I want?

Vasiliu answered 14/2, 2013 at 20:53 Comment(5)
You need to set foo.restype appropriately. Do you really want to use static? Not threadsafe. Wouldn't you be better allocating memory in Python and letting the C code populate it with content? Or allocate in the C code, and export a deallocator too.Crista
You should probably return a copy of the string; use strdup or malloc for that. But really, if you want to do this kind of thing in C, then invest in a C book. C is quite different from higher-level languages such as Python.Unwished
Aside from the problem you describe, your buffer is static, so there's only one for all calls, so the next call would change what the first return value points at. Keeping it local and not static means its lifetime ends when the function returns, which makes it unsuitable. That's not even touching on the buffer overflow vulnerability!Genus
Heh, obviously a C noob here. :) If I remove static gcc gives me a warning. What's the proper way to allocate the memory for return? I'm just looking for something safe and straightforward.Vasiliu
There is little safe or straightforward in C ;-) At least not if you work with a Python mindset. Read a good C book. Reading existing questions and answers here on Stackoverflow works in a pinch but I wouldn't bet on it. (Btw, gcc gives a warning for the very reason I hinted at: It's incorrect, you're returning the address of something that doesn't exist any more.)Genus
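The aliasing problem with a single static buffer, raised in the comments above, can be imitated from Python alone. This is a minimal sketch written for Python 3, not the asker's library: a ctypes buffer stands in for the static greeting array, and the names shared, view and first are made up for illustration.

```python
import ctypes

# Stand-in for `static char greeting[100]`: one buffer reused by every call.
shared = ctypes.create_string_buffer(b"Hello, Frank!", 100)

# A pointer into that buffer, like the char* the C function returns.
view = ctypes.cast(shared, ctypes.c_char_p)

# Reading .value copies the bytes out into an independent Python object.
first = view.value

# A "second call" overwrites the same storage.
shared.value = b"Hello, Alice!"

print(first)        # the copy still holds the first greeting
print(view.value)   # the pointer now sees the second greeting
```

Anything that keeps only the pointer sees later writes; anything that copied the bytes out does not. That is why a shared static buffer is fragile once there is more than one call or more than one thread.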
V
6

In hello.c you return a local array. You have to return a pointer to an array, which has to be dynamically allocated using malloc.

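#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc'ed greeting; the caller must free() it. */
char* hello(const char* name)
{
    const char hello[] = "Hello ";
    const char excla[] = "!\n";
    /* exact size: all three parts plus the terminating NUL */
    char *greeting = malloc(strlen(hello) + strlen(name) + strlen(excla) + 1);
    if (greeting == NULL) return NULL;  /* let the caller handle the failure */
    strcpy(greeting, hello);
    strcat(greeting, name);
    strcat(greeting, excla);
    return greeting;
}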
Vespasian answered 14/2, 2013 at 20:58 Comment(12)
Very nice! Thank you. A followup question on this answer: Since we're allocating the memory here, where/when is it deallocated? If I call this function 10k times, will I have an awful leak?Vasiliu
Beat me to it. This should do the job (unless there are any other unforeseen problems). The reason it didn't work before was that, by creating the string like you did, it was allocated on the stack and was thus lost once the function exited. The solution instead uses malloc to allocate on the heap, and returns to Python the location where to find the string.Bun
@ThaneBrimhall Of course for every malloc you need to make a freeVespasian
I suppose I'd have to free it in the Python binding? Or where else would I do that?Vasiliu
I don't know python but according to the rules set by your question, you would have to make a c function and pass the char* pointer to it.Vespasian
Perfect, thank you! See this question for one way to do it.Vasiliu
The code should also check for malloc failing, and deal with it appropriately.Chinese
More importantly, it should check the length. malloc rarely fails and if it does you get a crash. On the other hand, it's very easy to cause a buffer overflow with this code (or OP's code, to be fair). This isn't the nineties.Genus
@delnan Of course it should do both things , but i wasn't writing the perfect function but an example how to do it in the first place.Vespasian
Still, this is such a glaring problem, and these bugs have such a horrible history, that I would rather see a huge warning (or just fix it, strlen should help). I actually disagree with the malloc check proposed above, and it triggered me to comment at all.Genus
Looks good to me, modulo stylistic issues that go beyond nitpicking. But I don't do this on a daily basis, so don't take it as a guarantee. (And again, I for one don't think the null check is worth the line of code it takes. But this is subjective.)Genus
This code produces memory leaks. Use python function create_string_buffer and fill the buffer in c code.Swipe
R
18

Your problem is that greeting was allocated on the stack, but the stack is destroyed when the function returns. You could allocate the memory dynamically:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

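const char* hello(char* name) {
    char* greeting = malloc(100);
    if (greeting == NULL) return NULL;
    /* snprintf writes at most 100 bytes, including the terminating NUL */
    snprintf(greeting, 100, "Hello, %s!\n", name);
    printf("%s\n", greeting);
    return greeting;
}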

But that's only part of the battle because now you have a memory leak. You could plug that with another ctypes call to free().

...or a much better approach is to read up on the official C binding to python (python 2.x at http://docs.python.org/2/c-api/ and python 3.x at http://docs.python.org/3/c-api/). Have your C function create a python string object and hand that back. It will be garbage collected by python automatically. Since you are writing the C side, you don't have to play the ctypes game.

...edit..

I didn't compile and test, but I think this .py would work:

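import ctypes
import ctypes.util

# define the interface
hello_lib = ctypes.cdll.LoadLibrary('./hello.so')
# find libc (find_library('c') returns None on Windows)
libc = ctypes.CDLL(ctypes.util.find_library('c'))
# declare the functions we use
hello_lib.hello.argtypes = (ctypes.c_char_p,)
# c_void_p keeps the raw address; with c_char_p ctypes would hand back a
# converted string and the pointer needed for free() would be lost
hello_lib.hello.restype = ctypes.c_void_p
libc.free.argtypes = (ctypes.c_void_p,)
libc.free.restype = None

# wrap hello to make sure the free is done
def hello(name):
    ptr = hello_lib.hello(name)
    if not ptr:
        raise MemoryError("hello() returned NULL")
    try:
        return ctypes.string_at(ptr)  # copy the bytes out before freeing
    finally:
        libc.free(ptr)

# do the deed
print(hello(b"Frank"))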
Rexanne answered 14/2, 2013 at 21:8 Comment(6)
I can't do the "much better approach" you recommended (return a Python object) because I need to bind this function in multiple languages.Vasiliu
Okay, that can be a problem! Another option is SWIG, which can bind several languages (swig.org/compat.html#SupportedLanguages). I use ctypes from time to time, but it can be unwieldy when the interface is complex.Rexanne
I added the python code to the example - it's not tested but looks right to me (lol).Rexanne
Any ideas how to find free on Windows? libc doesn't exist and util.find_library('c') returns None.Bewick
As a workaround I just defined my own C routine that accepts a char * and calls free. Then I call free through my own shared library, without needing to import libc or anything else.Bewick
Being static, greeting was not allocated on the stack but in a data area that belongs to hello.soMana
A
5

I ran into this same problem today and found you must override the default return type (int) by setting restype on the function. See "Return types" in the ctypes documentation.

import ctypes
hello = ctypes.cdll.LoadLibrary('./hello.so')
name = "Frank"
c_name = ctypes.c_char_p(name)
hello.hello.restype = ctypes.c_char_p # override the default return type (int)
foo = hello.hello(c_name)
print c_name.value
print ctypes.c_char_p(foo).value
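
The effect of restype can also be tried without compiling hello.so. The following is only a sketch under stated assumptions: it is written for Python 3, assumes a POSIX system where the C library can be loaded this way, and uses the standard getenv function as a stand-in for hello. GREETING_DEMO is a made-up variable name.

```python
import ctypes
import ctypes.util
import os

# Assumption: a POSIX system. If find_library returns None, CDLL(None)
# exposes the symbols already loaded into the process, including libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The default return type is a C int, which is narrower than a pointer
# on typical 64-bit platforms, so a returned char* can be truncated.
int_size = ctypes.sizeof(ctypes.c_int)
ptr_size = ctypes.sizeof(ctypes.c_void_p)

# Declare the real signature: char *getenv(const char *name);
libc.getenv.argtypes = [ctypes.c_char_p]
libc.getenv.restype = ctypes.c_char_p

os.environ["GREETING_DEMO"] = "Hello, Frank!"   # hypothetical variable name
greeting = libc.getenv(b"GREETING_DEMO")        # arrives as Python bytes
print(int_size, ptr_size, greeting)
```

With restype set to c_char_p, ctypes copies the C string into a Python bytes object, so no second c_char_p wrapper is needed to read it. getenv is convenient here because its result must not be freed; for a function that returns malloc'ed memory, the raw pointer is still needed so that it can be released.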
Assess answered 13/5, 2019 at 17:40 Comment(0)
A
1

I also ran into the same problem but used a different approach. I was supposed to find a string in a list of strings matching a certain value.

Basically, I initialized a char array with the size of the longest string in my list, then passed that as an argument to my function to hold the corresponding value.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void find_gline(char **ganal_lines, /* line array */
                size_t size,        /* array size */
                char *idnb,         /* id number to look for */
                char *resline) {    /* caller-allocated output buffer */
  /* Iterates over the lines, finds the first one that contains idnb,
     and copies it into resline. resline must have room for the
     longest line plus the terminating NUL. */
  for (size_t i = 0; i < size; i++) {
    char *line = ganal_lines[i];
    if (strstr(line, idnb) != NULL) {
      size_t llen = strlen(line);
      /* llen + 1 also copies the terminating NUL */
      memcpy(resline, line, llen + 1);
      return;
    }
  }
}

This function was wrapped by the corresponding python function:



import ctypes

def find_gline_wrap(lines: list, arg: str, cdll):
    "Return the first entry of lines (a list of bytes) that contains arg."
    # set arg types
    mlen = max(len(line) for line in lines)  # length of the longest string in the list
    linelen = len(lines)
    line_array = ctypes.c_char_p * linelen

    cdll.find_gline.argtypes = [
        line_array,
        ctypes.c_size_t,
        ctypes.c_char_p,
        ctypes.c_char_p,
    ]
    cdll.find_gline.restype = None
    #
    argbyte = bytes(arg, "utf-8")

    ganal_lines = line_array(*lines)
    size = ctypes.c_size_t(linelen)
    idnb = ctypes.c_char_p(argbyte)
    # writable, zero-filled buffer: room for the longest line plus the NUL
    resline = ctypes.create_string_buffer(mlen + 1)
    cdll.find_gline(ganal_lines, size, idnb, resline)
    # .value stops at the first NUL, so no trimming is needed
    return resline.value.decode("utf-8")
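
The marshalling step, turning a Python list of bytes into a char** for C, can be tried on its own. The sketch below is not the wrapper above: it assumes Python 3 and a POSIX C library, and it calls the standard strstr function from Python instead of the compiled find_gline. The name first_match is made up for illustration.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be loaded this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strstr(const char *haystack, const char *needle);
libc.strstr.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
libc.strstr.restype = ctypes.c_char_p   # a NULL result arrives as None

def first_match(lines, needle):
    """Return the first bytes entry in lines that contains needle, or None."""
    LineArray = ctypes.c_char_p * len(lines)   # array type: char *[n]
    c_lines = LineArray(*lines)                # what C would see as char **
    for i in range(len(lines)):
        if libc.strstr(c_lines[i], needle) is not None:
            return c_lines[i]                  # indexing yields Python bytes
    return None

print(first_match([b"id=1 alpha", b"id=2 beta"], b"id=2"))
```

The array type is built once per call because its length is part of the type. Indexing a c_char_p array returns a bytes copy, so the result stays valid after the array is gone.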
Anchorage answered 3/1, 2020 at 3:40 Comment(0)
B
0

Here's what happens, and why it's breaking. When hello() is called, the C stack pointer is moved up, making room for any memory needed by your function. Along with some function call overhead, all of your function locals are managed there. So that static char greeting[100] means that 100 bytes of the increased stack are for that string. You then use some functions that manipulate that memory. At the end you place a pointer on the stack to the greeting memory. And then you return from the call, at which point the stack pointer is retracted back to its original, before-call position. So those 100 bytes that were on the stack for the duration of your call are essentially up for grabs again as the stack is further manipulated, including the address field which pointed to that value and that you returned. At that point, who knows what happens to it, but it's likely set to zero or some other value. And when you try to access it as if it were still viable memory, you get a segfault.

To get around this, you need to manage that memory differently somehow. You can have your function allocate the memory on the heap, but you'll need to make sure it gets free()'d at a later date by your binding. Or you can write your function so that the binding language passes it a chunk of memory to be used.
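
The second option, where the binding language owns the memory, might look roughly like this from Python. It is a sketch under assumptions, not the asker's library: it targets Python 3 on a POSIX system and uses the standard strncat function as a stand-in for a C routine that fills a caller-supplied buffer. The name greet and the default size of 100 are made up for illustration.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be loaded this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# char *strncat(char *dest, const char *src, size_t n);
libc.strncat.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.strncat.restype = ctypes.c_void_p   # return value is not used here

def greet(name, size=100):
    """Build the greeting inside a buffer that Python allocates and owns."""
    buf = ctypes.create_string_buffer(size)   # zero-filled, freed by Python
    for part in (b"Hello, ", name, b"!\n"):
        # leave room for the terminating NUL so C never writes past the end
        room = size - 1 - len(buf.value)
        libc.strncat(buf, part, room)
    return buf.value                          # bytes up to the first NUL

print(greet(b"Frank"))
```

Because Python created the buffer, nothing has to be freed on the C side, and passing the size along keeps an overlong name from overflowing it: the result is truncated instead.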

Brahmaputra answered 14/2, 2013 at 21:0 Comment(3)
Excellent explanation of how it all works. How would I deallocate the memory once I use the result in my binding?Vasiliu
free(pointer). That's the opposite of malloc and friends. You'll have to provide a binding for that, or hope that you have one already; most languages that bind to C have some mechanism for doing that.Brahmaputra
static variables are not allocated on the stack, and remain after the function returns, so this is wrong.Arteritis
