How to pack arbitrary bit sequence in Python?
C

4

7

I want to encode/compress some binary image data as a sequence of bits. (This sequence will, in general, have a length that does not fit neatly in a whole number of standard integer types.)

How can I do this without wasting space? (I realize that, unless the sequence of bits has a "nice" length, there will always have to be a small amount [< 1 byte] of leftover space at the very end.)

FWIW, I estimate that, at most, 3 bits will be needed per symbol that I want to encode. Does Python have any built-in tools for this kind of work?

Cynthy answered 21/2, 2011 at 12:45 Comment(0)
V
8

There's nothing very convenient built in but there are third-party modules such as bitstring and bitarray which are designed for this.

from bitstring import BitArray
s = BitArray('0b11011')
s += '0b100'
s += 'uint:5=9'
s += [0, 1, 1, 0, 1]
...
s.tobytes()

To join together a sequence of 3-bit numbers (i.e. range 0->7) you could use

>>> symbols = [0, 4, 5, 3, 1, 1, 7, 6, 5, 2, 6, 2]
>>> BitArray().join(BitArray(uint=x, length=3) for x in symbols)
BitArray('0x12b27eab2')
>>> _.tobytes()
'\x12\xb2~\xab '
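
If a third-party dependency is not an option, roughly the same MSB-first packing can be sketched with plain integers and `int.to_bytes` from the standard library. The helper name `pack_symbols` and its parameters are made up for this illustration and are not part of bitstring:

```python
def pack_symbols(symbols, width=3):
    """Pack integers into bytes, `width` bits each, most significant bit first.

    Hypothetical helper for illustration; unused bits in the last byte are zero.
    """
    acc = 0
    for value in symbols:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        # Shift earlier symbols left and append the new one at the low end
        acc = (acc << width) | value
    total_bits = width * len(symbols)
    pad = -total_bits % 8  # zero bits needed to reach a whole number of bytes
    return (acc << pad).to_bytes((total_bits + pad) // 8, "big")


symbols = [0, 4, 5, 3, 1, 1, 7, 6, 5, 2, 6, 2]
packed = pack_symbols(symbols)
print(packed.hex())  # 12b27eab20
```

Building one large integer is fine for short inputs but gets slow for long ones; in that case flushing complete bytes as you go is likely the better approach.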

Vantassel answered 21/2, 2011 at 12:51 Comment(0)
T
3

Have you tried simply compressing the whole sequence with bz2? If the sequence is long, use bz2.BZ2Compressor to allow chunked processing; otherwise use bz2.compress on the whole thing. The compression will probably not be ideal, but it will typically get very close when dealing with sparse data.

Hope that helps.
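
A minimal sketch of both variants, using only the standard-library `bz2` module. The sample data and the 256-byte chunk size are arbitrary choices for the illustration:

```python
import bz2

# Made-up sparse sample: mostly zero bytes with a few non-zero values
data = bytes(1000) + b"\x01\x02\x03" + bytes(1000)

# One-shot compression of the whole sequence
one_shot = bz2.compress(data)

# Chunked compression, feeding the compressor 256 bytes at a time
compressor = bz2.BZ2Compressor()
parts = []
for start in range(0, len(data), 256):
    parts.append(compressor.compress(data[start:start + 256]))
parts.append(compressor.flush())  # flush() emits whatever is still buffered
chunked = b"".join(parts)

# Both results decompress back to the original bytes
assert bz2.decompress(one_shot) == data
assert bz2.decompress(chunked) == data
print(len(data), len(one_shot), len(chunked))
```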

Thomson answered 21/2, 2011 at 12:59 Comment(1)
I think he wants to write his own Huffman coding or something like that. - Borough
C
3

Since you have a mapping from symbols to 3-bit strings, bitarray does a nice job of encoding and decoding lists of symbols to and from arrays of bits:

from bitarray import bitarray
from random import choice

symbols = {
    '0' : bitarray('000'),
    'a' : bitarray('001'),
    'b' : bitarray('010'),
    'c' : bitarray('011'),
    'd' : bitarray('100'),
    'e' : bitarray('101'),
    'f' : bitarray('110'),
    'g' : bitarray('111'),
}

seedstring = ''.join(choice(list(symbols)) for _ in range(40))

# construct bitarray using symbol->bitarray mapping
ba = bitarray()
ba.encode(symbols, seedstring)

print(seedstring)
print(ba)

# what does bitarray look like internally?
ba_string = ba.tobytes()
print(repr(ba_string))
print(len(ba_string))

Prints (for one random run):

egb0dbebccde0gfdfbc0d0ccfcg0acgg0ccfga00
bitarray('10111101000010001010101001101110010100... etc.
b'\xbd\x08\xaanQ\xf4\xc9\x88\x1b\xcf\x82\xff\r\xee@'
15

You can see that this 40-symbol list (120 bits) gets encoded into a 15-byte bitarray.
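
Going the other way needs the symbol count, because padding bits in the last byte are otherwise indistinguishable from data (here 120 bits happen to fill 15 bytes exactly). As a rough standard-library sketch, assuming the same MSB-first layout and the 3-bit codes from the mapping above; `ALPHABET` and `decode_symbols` are hypothetical names, not bitarray API:

```python
ALPHABET = "0abcdefg"  # position in this string == the symbol's 3-bit code


def decode_symbols(data, count, alphabet=ALPHABET, width=3):
    """Read `count` fixed-width codes from `data`, most significant bit first."""
    total_bits = len(data) * 8
    if count * width > total_bits:
        raise ValueError("not enough data for the requested number of symbols")
    bits = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    out = []
    for i in range(count):
        # Shift so that the i-th code sits in the lowest `width` bits
        shift = total_bits - width * (i + 1)
        out.append(alphabet[(bits >> shift) & mask])
    return "".join(out)


packed = b"\xbd\x08\xaanQ\xf4\xc9\x88\x1b\xcf\x82\xff\r\xee@"
print(decode_symbols(packed, 40))
# egb0dbebccde0gfdfbc0d0ccfcg0acgg0ccfga00
```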

Cirilla answered 21/2, 2011 at 14:25 Comment(0)
F
0

Not sure if this solves your problem exactly, but this will convert a list of integers to a packed byte string. Each integer in the list should only consume the specified number of bits:

from typing import List

def pack_bytes(data: List[int], bit_width: int) -> bytes:
    """Convert a list of integers to a packed byte string.

    Each integer should only consume the specified number of bits.
    """

    packed_bytes = bytearray()
    buffer = 0
    bits_buffered = 0
    for sample in data:
        bit_mask = 0x01
        for i in range(bit_width):
            bit = (sample & bit_mask) >> i
            bit_mask <<= 1
            buffer |= (bit << bits_buffered)
            bits_buffered += 1

            if bits_buffered == 8:
                packed_bytes.append(buffer)
                buffer = 0
                bits_buffered = 0
    if bits_buffered != 0:
        packed_bytes.append(buffer)

    return bytes(packed_bytes)

Example:

data = [0, 1, 2, 3, 4, 5, 6]

packed_data = pack_bytes(data, bit_width=4)
print(' '.join(f'0x{x:02X}' for x in packed_data))
# 0x10 0x32 0x54 0x06
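
To read the values back, the bit order has to match the packer: least significant bit first within each value and within each byte. The following is a sketch of a possible inverse; `unpack_bytes` and its `count` parameter are assumptions added for illustration. The count is needed because the padding in the last byte would otherwise decode as an extra value:

```python
from typing import List


def unpack_bytes(packed: bytes, bit_width: int, count: int) -> List[int]:
    """Read `count` values of `bit_width` bits each from an LSB-first packed string."""
    if count * bit_width > len(packed) * 8:
        raise ValueError("not enough data for the requested number of values")
    values = []
    bit_pos = 0  # absolute position of the next bit to read
    for _ in range(count):
        value = 0
        for i in range(bit_width):
            byte_index, bit_index = divmod(bit_pos, 8)
            bit = (packed[byte_index] >> bit_index) & 1
            value |= bit << i  # bit i of the value, lowest bit first
            bit_pos += 1
        values.append(value)
    return values


packed = bytes([0x10, 0x32, 0x54, 0x06])
print(unpack_bytes(packed, bit_width=4, count=7))
# [0, 1, 2, 3, 4, 5, 6]
```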
Facesaving answered 8/8, 2023 at 22:14 Comment(0)
