Short answer: cmd likely contains a Unicode string, which cannot be trivially converted to a const char *. The error message might come from a wrapper framework that automates writing Python bindings for C libraries, such as SWIG or ctypes. The framework knows what to do with a byte string, but punts on Unicode strings. Passing str(cmd) helps because it converts the Unicode string to a byte string, from which the const char * value expected by C code can be trivially extracted.
Long answer:
The C type char const *, more customarily spelled const char *, can be read as "read-only array of char", char being C's way to spell "byte". When a C function accepts a const char *, it expects a "C string", i.e. an array of char values terminated with a null character. Conveniently, Python strings (byte strings in Python 2) are internally represented as C strings with some additional information such as type, reference count, and the length of the string (so the length can be retrieved with O(1) complexity, and so that the string may itself contain null characters).
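To make the "C string plus extra information" layout concrete, here is a minimal sketch in plain C. The counted_string struct and the c_view_length function are hypothetical stand-ins invented for illustration; they are not CPython's actual definitions. The sketch only shows why a stored length and a terminating null character can disagree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a Python 2 byte string object. */
typedef struct {
    size_t length;     /* stored length, available in O(1) */
    const char *data;  /* the bytes, always followed by a terminating '\0' */
} counted_string;

/* What a C function such as strlen sees: the bytes up to the first '\0'. */
size_t c_view_length(const counted_string *s)
{
    return strlen(s->data);
}
```

For a value holding the five bytes a, b, null, c, d, the stored length is 5, while a C function that relies on the terminator sees only the first two bytes.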
Unicode strings in Python 2 are represented as arrays of Py_UNICODE, which are either 16 or 32 bits wide, depending on the operating system and build-time flags. Such an array cannot be passed to code that expects an array of 8-bit chars; it needs to be converted, typically to a temporary buffer, and this buffer must be freed when no longer needed.
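The convert-then-free pattern can be sketched without the Python headers. The example below uses wchar_t as a stand-in for Py_UNICODE, and narrow_copy is a hypothetical helper, not part of any Python or SWIG API. To stay short it accepts only 7-bit ASCII, which is an assumption of this sketch; a real converter would apply an encoding such as UTF-8.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: copy a wide string into a freshly allocated char
 * buffer. Returns NULL on allocation failure or on a non-ASCII character.
 * The caller owns the result and must free() it.
 */
char *narrow_copy(const wchar_t *wide)
{
    size_t n = wcslen(wide);
    char *buf = malloc(n + 1);          /* temporary 8-bit buffer */
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if ((unsigned long) wide[i] > 0x7f) {
            free(buf);                  /* nothing leaks on the error path */
            return NULL;
        }
        buf[i] = (char) wide[i];
    }
    buf[n] = '\0';                      /* make it a proper C string */
    return buf;
}
```

A caller would pass the returned buffer to the function expecting a const char * and then call free on it, which is exactly the extra bookkeeping a byte string does not need.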
For example, a simple-minded (and quite unnecessary) wrapper for the C function strlen could look like this:
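    #include <Python.h>
    #include <string.h>

    /* Python 2 C API. Named py_strlen so it does not clash with C's strlen. */
    static PyObject *py_strlen(PyObject *ignore, PyObject *obj)
    {
        if (!PyString_Check(obj)) {
            PyErr_Format(PyExc_TypeError, "string expected, got %s",
                         Py_TYPE(obj)->tp_name);
            return NULL;
        }
        /* Pointer into the object's own buffer; it must not be freed. */
        const char *c_string = PyString_AsString(obj);
        size_t len = strlen(c_string);
        return PyInt_FromLong((long) len);
    }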
The code simply calls PyString_AsString to retrieve the internal C string stored by every Python string and expected by strlen. For this code to also support Unicode objects (provided it even makes sense to call strlen on Unicode objects), it must handle them explicitly:
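    /* Python 2 C API. Named py_strlen so it does not clash with C's strlen. */
    static PyObject *py_strlen(PyObject *ignore, PyObject *obj)
    {
        const char *c_string;
        PyObject *tmp = NULL;   /* temporary UTF-8 byte string, if needed */

        if (PyString_Check(obj))
            c_string = PyString_AsString(obj);
        else if (PyUnicode_Check(obj)) {
            /* Encode into a new byte string object that this function owns. */
            if (!(tmp = PyUnicode_AsUTF8String(obj)))
                return NULL;
            c_string = PyString_AsString(tmp);
        }
        else {
            PyErr_Format(PyExc_TypeError, "string or unicode expected, got %s",
                         Py_TYPE(obj)->tp_name);
            return NULL;
        }
        size_t len = strlen(c_string);
        Py_XDECREF(tmp);        /* release the temporary; no-op if tmp is NULL */
        return PyInt_FromLong((long) len);
    }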
Note the additional complexity, not only in lines of boilerplate code, but in the different code paths that require different management of a temporary object that holds the byte representation of the Unicode string. Also note that the code needed to decide on an encoding when converting a Unicode string to a byte string. UTF-8 is guaranteed to be able to encode any Unicode string, but passing a UTF-8-encoded sequence to a function expecting a C string might not make sense for some uses. The str function uses the ASCII codec (Python 2's default encoding) to encode the Unicode string, so if the Unicode string actually contained any non-ASCII characters, you would get an exception.
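As a rough illustration of why the choice of encoding matters to the receiving C code, the sketch below looks at an already UTF-8-encoded byte sequence from the C side. Both is_ascii and utf8_code_points are hypothetical helpers written for this example, and the code assumes its input is valid UTF-8.

```c
#include <stddef.h>

/* Returns 1 if every byte is 7-bit ASCII, i.e. the ASCII codec could
 * have produced this string, and 0 otherwise. */
int is_ascii(const char *s)
{
    for (; *s; s++)
        if ((unsigned char) *s > 0x7f)
            return 0;
    return 1;
}

/* Counts code points in valid UTF-8 by skipping continuation bytes,
 * which all have the bit pattern 10xxxxxx. */
size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the UTF-8 bytes of "café" (c, a, f, 0xC3, 0xA9), strlen reports 5 bytes while utf8_code_points reports 4 characters, and is_ascii returns 0. A C function that assumes one byte per character would therefore misjudge such input, which is the situation where the ASCII codec raises an exception instead.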
There have been requests to include this functionality in SWIG, but it is unclear from the linked report if they made it in.
[...] active_call? If it is unicode, then cmd will be a Unicode string, and str(cmd) will convert it to string. You can insert an import pdb; pdb.set_trace() line before the con.api call, and inspect cmd. – Manara
active_call is most probably unicode, indeed (comes out of a Django database), but there is nothing in this error message that makes me think about unicode: do you think it might be related? – Centuple
[...] char const * (by calling PyString_AsString on the object). Python 2 Unicode strings do not consist of C chars; they consist of wchar_t, which means they cannot be trivially "cast" to const char *, they need to be converted into a new buffer, which must be allocated, etc. By pre-converting the Unicode string to string in Python, you perform the hard part before the object ever reaches C. – Manara
[...] char const *, a C char or a wchar_t, but imo your comment definitely deserves to be an answer. Please feel free to provide more tech details if you are in the mood ;) (Also I'll add details to my question) – Centuple