Python: Similar functionality in struct and array vs ctypes
Asked Answered
S

1

6

Python provides the following three modules that deal with C types and how to handle them:

  • struct for C structs
  • array for arrays such as those in C
  • ctypes for C functions, which necessarily entails dealing with C’s type system

While ctypes seems more general and flexible (its main task being “a foreign function library for Python”) than struct and array, there seems to be significant overlap in functionality between these three modules when the task is to read binary data structures. For example, if I wanted to read a C struct

struct MyStruct {
    int a;
    float b;
    char c[12];
};

I could use struct as follows:

a, b, c = struct.unpack('if12s', b'\x11\0\0\0\x12\x34\x56\x78hello world\0')
print(a, b, c)
# 17 1.7378244361449504e+34 b'hello world\x00'

On the other hand, using ctypes works equally well (although a bit more verbose):

 class MyStruct(ctypes.Structure):
     _fields_ = [
         ('a', ctypes.c_int),
         ('b', ctypes.c_float),
         ('c', ctypes.c_char * 12)
     ]
 s = MyStruct.from_buffer_copy(b'\x11\0\0\0\x12\x34\x56\x78hello world\0')
 print(s.a, s.b, s.c)
 # 17 1.7378244361449504e+34 b'hello world'

(Aside: I do wonder where the trailing '\0' went in this version, though…)

This seems to me like it violates the principles in “The Zen of Python”:

  1. There should be one—and preferably only one—obvious way to do it.

So how did this situation with several of these similar modules for binary data handling arise? Is there a historical or practical reason? (For example, I could imagine omitting the struct module entirely and simply adding a more convenient API for reading/writing C structs to ctypes.)

Scheer answered 24/8, 2018 at 12:15 Comment(6)
Although that way may not be obvious at first unless you're Dutch :-)Sacks
1. The \0 is still there but it's not printed (check NULL terminated strings). 2. They were probably developed in parallel (I remember that at 1st, ctypes was not in Python standard library). The functionality does partially overlap. Removing one such module would break backwards compatibility as there is probably lots of code out there that depends on each of the 3 modules. Note: ctypes uses struct.Admass
@Admass 1. It is printed in the struct version, though! Also: len(c) == 12, len(s.c) == 11. 2. Well, Python did a lot of backwards-incompatible changes in the transition to Python 3. As far as I know, these kinds of “warts”/redundancies are exactly what they wanted to get rid of with Python 3.Scheer
1. Yeas you are right, apparently when dealing with char arrays, the buffer parsing (for that field) stops when \0 is encounteredAdmass
@Admass Also: “ctypes uses struct.” Do you have a reference or details for that?Scheer
Check #1406413 (item #3.)Admass
H
11

Disclaimer: this post is speculation based on my understanding of the "division of labor" in Python stdlib, not on factual referenceable info.

Your question stems from the fact that "C structs" and "binary data" tend to be used interchangeably, which, while correct in practice, is wrong in a technical sense. The struct documentation is also misleading: it claims to work on "C structs", while a better description would be "binary data", with some disclaimers about C compatibility.

Fundamentally, struct, array and ctypes do different things. struct deals with converting Python values into binary in-memory formats. array deals with efficiently storing a lot of values. ctypes deals with the C language(*). The overlap in functionality stems from the fact that for C, the "binary in-memory formats" are native, and that "efficiently storing values" is packing them into a C-like array.

You will also note that struct lets you easily specify endianness, because it deals with packing and unpacking binary data in many different ways it can be packed; while in ctypes it is more difficult to get non-native byte order, because it uses the byte order that is native to C.

If your task is reading binary data structures, there's increasing levels of abstraction:

  1. Manually splitting the byte array and converting parts with int.from_bytes and the like
  2. Describing the data with a format string and using struct to unpack in one go
  3. Using a library like Construct to describe the structure declaratively in logical terms.

ctypes don't even figure here, because for this task, using ctypes is pretty much taking a round-trip through a different programming language. The fact that it works just as well for your example is incidental; it works because C is natively suited to expressing many ways of packing binary data. But if your struct was mixed-endian, for instance, it would be very difficult to express in ctypes. Another example is half-precision float which doesn't have a C equivalent (see here).

In this sense, it's also very reasonable that ctypes use struct - after all, "packing and unpacking binary data" is a subtask of "interfacing with C".

On the other hand, it would make no sense for struct to use ctypes: it would be like using the email library for character encoding conversions because it's a task that an e-mail library can do.

(*) well, basically. More precise would be something like "C-based environments", i.e., how modern computers work on low level due to co-evolution with C as the primary systems language.

Hautrhin answered 27/8, 2018 at 8:32 Comment(1)
This is what I was looking for, thanks! I think what got me was the terminology/focus on “C structs” in the struct documentation (first sentence: “This module performs conversions between Python values and C structs […]”, when really, as you said, it can handle more arbitrary binary data.Scheer

© 2022 - 2024 — McMap. All rights reserved.