Accessing bitfields while reading/writing binary data structures

I'm writing a parser for a binary format. The format is made up of a number of different tables (usually somewhere between 50 and 100 of them), which are themselves binary and contain fields of varying sizes.

Most of these structures have bitfields and look something like this when represented in C:

struct myHeader
{
  unsigned char fieldA : 3;
  unsigned char fieldB : 2;
  unsigned char fieldC : 3;
  unsigned short fieldD : 14;
  unsigned char fieldE : 4;
};

I came across the struct module but realized that its lowest resolution is a byte, not a bit. Otherwise the module would be pretty much the right fit for this work.

I know bitfields are supported using ctypes, but I'm not sure how to interface ctypes structs containing bitfields here.

My other option is to manipulate the bits myself, assemble them into bytes, and use those with the struct module. But since I have somewhere between 50 and 100 different types of such structures, writing that code by hand becomes error-prone. I'm also worried about efficiency, since this tool might be used to parse many gigabytes of binary data.
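For reference, the manual approach could be driven by a table rather than hand-written per structure. The following is only a minimal sketch, assuming Python 3 and assuming the fields are packed most-significant-bit first with no padding between them (the real format may differ). The names MY_HEADER, unpack_bits and pack_bits are hypothetical, made up for illustration.

```python
# Hypothetical layout table: (field name, width in bits), first field in the
# most significant bits. One such table would be written per structure.
MY_HEADER = [("fieldA", 3), ("fieldB", 2), ("fieldC", 3),
             ("fieldD", 14), ("fieldE", 4)]


def _byte_length(layout):
    """Number of whole bytes needed to hold every field in the layout."""
    total_bits = sum(width for _, width in layout)
    return (total_bits + 7) // 8


def unpack_bits(layout, data):
    """Decode MSB-first bitfields from the start of `data` into a dict."""
    nbytes = _byte_length(layout)
    if len(data) < nbytes:
        raise ValueError("need %d bytes, got %d" % (nbytes, len(data)))
    value = int.from_bytes(data[:nbytes], "big")
    shift = nbytes * 8  # position just above the next field to extract
    fields = {}
    for name, width in layout:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return fields


def pack_bits(layout, fields):
    """Encode a dict of field values into bytes, zero-padding the last byte."""
    nbytes = _byte_length(layout)
    value = 0
    used = 0
    for name, width in layout:
        field = fields[name]
        if not 0 <= field < (1 << width):
            raise ValueError("%s does not fit in %d bits" % (name, width))
        value = (value << width) | field
        used += width
    value <<= nbytes * 8 - used  # padding bits go at the low end
    return value.to_bytes(nbytes, "big")


raw = pack_bits(MY_HEADER, {"fieldA": 1, "fieldB": 0, "fieldC": 5,
                            "fieldD": 1000, "fieldE": 2})
decoded = unpack_bits(MY_HEADER, raw)
```

With 50 to 100 structures, only the layout tables change; the two functions stay the same, which keeps the error-prone part in one place.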

Thanks.

Daffodil answered 25/8, 2011 at 23:29 Comment(4)
there are also 3rd party bit array / bit manipulation libraries.Samhita
It would be a fair amount of work, but you could probably design a class that could parse C-style structure definitions (or something similar to them that eliminated packing ambiguity) into a set of masks for each bitfield, read the data in via the struct module to get to the byte level, and offer __getattr__ access.Spermatophyte
Yes, I have now come across these tools (python-bitstring, Construct, BitReader) and am reading through their docs. BitReader seems like a viable solution, but I see here that performance is going to take a big hit. Construct, as far as I could tell from its basic documentation, doesn't support bit fields. python-bitstring sounds promising and I need to dig a bit deeper.Daffodil
Yes Russell, that is my last alternative as of now: something like a higher-level abstraction to support bitfields on top of the struct module.Daffodil
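The mask-based idea from the comments (read at byte level with struct, then expose each bitfield through __getattr__) might look something like the sketch below. It is not a tested design: it assumes Python 3, assumes the whole header fits in one 32-bit big-endian word with the first field in the top bits, and skips the part about parsing C-style definitions. make_masks and MyHeader are hypothetical names.

```python
import struct


def make_masks(layout, word_bits=32):
    """Turn [(name, width), ...] into {name: (shift, mask)}, MSB first."""
    masks = {}
    shift = word_bits
    for name, width in layout:
        shift -= width
        if shift < 0:
            raise ValueError("layout is wider than %d bits" % word_bits)
        masks[name] = (shift, (1 << width) - 1)
    return masks


class MyHeader:
    # struct handles the byte-level read; the masks handle the bit level.
    _word = struct.Struct(">I")
    _masks = make_masks([("fieldA", 3), ("fieldB", 2), ("fieldC", 3),
                         ("fieldD", 14), ("fieldE", 4)])

    def __init__(self, data, offset=0):
        (self._raw,) = self._word.unpack_from(data, offset)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so _raw, _word
        # and _masks never pass through here once they are set.
        try:
            shift, mask = self._masks[name]
        except KeyError:
            raise AttributeError(name) from None
        return (self._raw >> shift) & mask


hdr = MyHeader(b'%\x0f\xa0\x80')
values = (hdr.fieldA, hdr.fieldB, hdr.fieldC, hdr.fieldD, hdr.fieldE)
```

The masks are computed once per class rather than once per record, which is where most of the saving over ad hoc shifting would come from when parsing many records.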
I
7

Using bitstring (which you mention you're looking at) it should be easy enough to implement. First to create some data to decode:

>>> myheader = "3, 2, 3, 14, 4"
>>> a = bitstring.pack(myheader, 1, 0, 5, 1000, 2)
>>> a.bin
'00100101000011111010000010'
>>> a.tobytes()
'%\x0f\xa0\x80'

And then decoding it again is just

>>> a.readlist(myheader)
[1, 0, 5, 1000, 2]

Your main concern might well be the speed. The library is well optimised Python, but that's not nearly as fast as a C library would be.

Ichthyo answered 26/8, 2011 at 10:2 Comment(3)
Thanks Scott - yes I've checked your bitstring library and it comes very close to my requirements indeed. In fact I posted the question in the mailing list here. I can understand it can be read as a list - but I'd like to preferably use a dictionary just for the convenience of code readability since the structs I'll be dealing with would have more than 20 or 30 fields easily. I know it is supported in pack, but would like to know how to use it with unpack since that will be the primary functionality.Daffodil
@Ash: You can't unpack to a dictionary just yet. I think you need something like the decode method proposed here, which hasn't been done partly because what I'd really like to return is an ordered dictionary - I'm not sure that an unordered dictionary would be that useful. I'll think about it some more though...Ichthyo
Yes, it makes sense to return an ordered dictionary, but I guess its support is present directly only in Python 3.3a0 (or at least that is what the page here says: PEP 372).Daffodil
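Until something like a decode method exists, one workaround is to pair the list that readlist returns with a list of field names. This is only a sketch: FIELD_NAMES and to_ordered_dict are hypothetical names, and the value list is hard-coded here to the result shown in the answer so the snippet runs without bitstring installed. As far as I know, collections.OrderedDict is part of the standard library from Python 2.7 and 3.1 onward.

```python
from collections import OrderedDict

# Hypothetical: field names listed in the same order as the format string.
FIELD_NAMES = ["fieldA", "fieldB", "fieldC", "fieldD", "fieldE"]


def to_ordered_dict(names, values):
    """Pair decoded values with their field names, preserving field order."""
    if len(names) != len(values):
        raise ValueError("expected %d values, got %d" % (len(names), len(values)))
    return OrderedDict(zip(names, values))


# In real use the second argument would be the result of a.readlist(myheader).
header = to_ordered_dict(FIELD_NAMES, [1, 0, 5, 1000, 2])
```

Fields can then be read as header["fieldD"], and iterating over header.items() still yields them in structure order.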
T
6

I haven't rigorously tested this, but it seems to work with unsigned types (edit: it works with signed byte/short types, too).

Edit 2: This is really hit or miss. It depends on the way the library's compiler packed the bits into the struct, which is not standardized. For example, with gcc 4.5.3 it works as long as I don't use the attribute to pack the struct, i.e. __attribute__ ((__packed__)) (so instead of 6 bytes it gets packed into 4 bytes, which you can check with __alignof__ and sizeof). I can make it almost work by adding _pack_ = True to the ctypes Structure definition, but it fails for fieldE. gcc notes: "Offset of packed bit-field ‘fieldE’ has changed in GCC 4.4".

import ctypes

class MyHeader(ctypes.Structure):
    _fields_ = [
        ('fieldA', ctypes.c_ubyte, 3),
        ('fieldB', ctypes.c_ubyte, 2),
        ('fieldC', ctypes.c_ubyte, 3),
        ('fieldD', ctypes.c_ushort, 14),
        ('fieldE', ctypes.c_ubyte, 4),
    ]

lib = ctypes.cdll.LoadLibrary('C/bitfield.dll')

hdr = MyHeader()
lib.set_header(ctypes.byref(hdr))

for x in hdr._fields_:
    print("%s: %d" % (x[0], getattr(hdr, x[0])))

Output:

fieldA: 3
fieldB: 1
fieldC: 5
fieldD: 12345
fieldE: 9

C:

typedef struct _MyHeader {
    unsigned char  fieldA  :  3;
    unsigned char  fieldB  :  2;
    unsigned char  fieldC  :  3;
    unsigned short fieldD  : 14;
    unsigned char  fieldE  :  4;
} MyHeader, *pMyHeader; 

int set_header(pMyHeader hdr) {

    hdr->fieldA = 3;
    hdr->fieldB = 1;
    hdr->fieldC = 5;
    hdr->fieldD = 12345;
    hdr->fieldE = 9;

    return(0);
}
Triplet answered 26/8, 2011 at 3:22 Comment(3)
See a tested example without the need for any C code or dlls at all at Does Python have a bitfield type?Insoluble
@Insoluble - Your example represents a way to store such data within Python itself. But how do you import or export such data from/to a stream of bytes that can be read/written to disk or may be recvd/sent over network ?Daffodil
@ash That is what the union is for, and the flags.asbyte field in that example. Thanks for pointing out that it wasn't so clear. I've polished the text there to make it a bit more clear. Heh :)Insoluble
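To make the union idea concrete, here is a minimal sketch of round-tripping a one-byte bitfield group through raw bytes with ctypes alone, no C code or DLL. The class names FlagBits and Flags are hypothetical, and only the first three fields of the header are used so that everything fits in a single byte. Which bits each field occupies is still decided by ctypes following the platform's C conventions, so the exact byte value is not something to rely on across platforms; the sketch only shows that what is written can be read back.

```python
import ctypes


class FlagBits(ctypes.Structure):
    # Three bitfields sharing one unsigned byte (3 + 2 + 3 = 8 bits).
    _fields_ = [
        ("fieldA", ctypes.c_uint8, 3),
        ("fieldB", ctypes.c_uint8, 2),
        ("fieldC", ctypes.c_uint8, 3),
    ]


class Flags(ctypes.Union):
    # The union overlays the bitfields with a plain byte view of the
    # same storage, exposed here as `asbyte`.
    _anonymous_ = ("bits",)
    _fields_ = [
        ("bits", FlagBits),
        ("asbyte", ctypes.c_uint8),
    ]


# Export: fill in the fields, then take the raw bytes of the object.
out = Flags()
out.fieldA = 3
out.fieldB = 1
out.fieldC = 5
raw = bytes(out)  # could be written to a file or a socket

# Import: rebuild the object from bytes read back from a stream.
back = Flags.from_buffer_copy(raw)
roundtrip = (back.fieldA, back.fieldB, back.fieldC)
```

For a file or network format with a fixed bit order, this is only safe if the layout ctypes produces on the machine in question happens to match the format, which is the same caveat raised in Edit 2 above.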
