Packing 4 Integers as ONE BYTE?

Asked 14/3, 2011 at 17:53 Answered 17/11, 2023 at 13:18

I have four integers {a, b, c, d} that can have the following range of values:

a - {0 or 1} (1 bit)

b - {0 or 1} (1 bit)

c - {0, 1, 2, ..., 7} (3 bits)

d - {0, 1, 2, ..., 7} (3 bits)

at first, I would like to pack them into a one byte that can be then written to a binary file. later, I would like to unpack that one byte and get from it to a tuple of the form (a, b, c, d).

I know how to read/write a byte to a binary file in python. But how do I do the bits packing/unpacking?

Derickderide answered 14/3, 2011 at 17:53 Comment(1)

If you need to do this a lot, check out construct.wikispaces.com and bitbucket.org/haypo/hachoir/wiki/Home which both pack and unpack according to a declarative format definition. – Bentlee 14/3, 2011 at 18:22

Use shift and bitwise OR, then convert to a character to get a "byte":

x = chr(a | (b << 1) | (c << 2) | (d << 5))

To unpack this byte again, first convert to an integer, then shift and use bitwise AND:

i = ord(x)
a = i & 1
b = (i >> 1) & 1
c = (i >> 2) & 7
d = (i >> 5) & 7

Explanation: Initially, you have

0000000a
0000000b
00000ccc
00000ddd

The left-shifts give you

0000000a
000000b0
000ccc00
ddd00000

The bitwise OR results in

dddcccba

Converting to a character will convert this to a single byte.

Unpacking: The four different right-shifts result in

dddcccba
0dddcccb
00dddccc
00000ddd

Masking (bitwise AND) with 1 (0b00000001) or 7 (0b00000111) results in

0000000a
0000000b
00000ccc
00000ddd

again.

Best answered 14/3, 2011 at 17:56 Comment(8)

Note that strings are only strings of bytes in Python 2; in Python 3 they're Unicode, so chr(x) will actually give you a 2- or 4-byte word. You'll want to construct an actual bytes object instead of a string. I'm not sure off-hand what the bytes equivalent of chr is, if there is one. – Longford 14/3, 2011 at 18:13

(I think the correct thing to do is generally as sjr does: simply return it as an int, and handle changing it to a single byte at the point you actually use it.) – Longford 14/3, 2011 at 18:14

I also favour ANDing before you do the bit shift in constructing the byte to ensure that the the data is of the expected width. Other wise you can get all sorts of nasty and subtle errors. – Mosley 14/3, 2011 at 19:48

@Glenn: regarding your first comment: Very good point. It would probably be best to use bytearray.append() for the conversion from integer to byte and and item access on a bytearray for the conversion back. This would work in Python 2.6 or newer. – Best 14/3, 2011 at 22:1

@Peter: You may as well just assert your requirements, eg. assert 0 <= a <= 1, a; if that's what you expect to receive, then it's better to throw an error if you get invalid data instead of silently returning something bogus. – Longford 15/3, 2011 at 2:44

@Glen Maynard .. point taken. I suppose its a question of philosophy. In general the stuff I work on has a greater penalty for stopping than producing erroneous data under a specific set of circumstances - so that has coloured my thinking towards not throwing an error. Although "correct data in" is still the better option! – Mosley 16/3, 2011 at 11:44

@GlennMaynard: Using assert for data validation is a bad idea -- if/when somebody runs your program with optimization turned on, the asserts (and your checks) are removed. – Conway 11/11, 2011 at 19:56

@Ethan: Checking for invalid parameters (programming errors) is exactly the sort of thing you want removed by that optimization. If you want a check that's intended to be thrown and caught in production, then you don't want AssertionError anyway; you probably want ValueError. – Longford 13/11, 2011 at 23:30

def encode(a, b, c, d):
  return a | b << 1 | c << 2 | d << 5

def decode(x):
  return x & 1, (x >> 1) & 1, (x >> 2) & 7, (x >> 5) & 7

Matless answered 14/3, 2011 at 18:2 Comment(2)

Note that the parentheses around multiple return values are typically omitted in Python: return x & 1, (x >> 1) & 1, (x >> 2) & 7, (x >> 5) & 7). – Longford 14/3, 2011 at 18:8

Note that you have a hanging closing paranthesis. ;-) – Theocritus 14/3, 2011 at 18:29

If you need to this kind of thing a lot then bit shifting can become tedious and error prone. There are third-party libraries that can help - I wrote one called bitstring:

To pack and convert to a byte:

x = bitstring.pack('2*uint:1, 2*uint:3', a, b, c, d).bytes

and to unpack:

a, b, c, d = bitstring.BitArray(bytes=x).unpack('2*uint:1, 2*uint:3')

This is probably overkill for your example, but it's helpful when things get more complicated.

Complexity answered 14/3, 2011 at 19:1 Comment(0)

Pretty simple. Mask (for range), shift them into place, and or them together.

packed = ((a & 1) << 7) | ((b & 1) << 6) | ((c & 7) << 3) | (d & 7)

a = (packed >> 7) & 1
b = (packed >> 6) & 1
c = (packed >> 3) & 7
d = packed & 7

Suit answered 14/3, 2011 at 17:59 Comment(0)

nested arithmetic form (NAF) :

p = 8 * (8 * (2 * a + b) + c) + d                  # "a" at MSB
p =               a + (b + (c + d * 8) * 2) * 2    # "a" at LSB

nested shifting form (NSF) :

p = ( ( ( ( (a << 1) + b) << 3) + c) << 3) + d     # "a" at MSB
p = ( ( ( ( (d << 3) + c) << 1) + b) << 1) + a     # "a" at LSB

Personally I think NAF is much cleaner (even if not the fastest), because it keeps all the digit groups on the same side and all the scaling factors on the other regardless of endian-ness, all while minimizing the layers of nesting required, and total arithmetic ops.

"NAF" and "NSF" aren't formal terms - it's just a colloquial way of describing them

So packing it would be chr(p) or just sprintf("%c", p)

as for decoding, using the "a" at MSB approach, it's basically splitting out a byte's octal codes back out to its components ("a" at LSB is the same decoding process in little-endian) :

     2a+b c  d   a  b
        \  \/     \/
======================
62      \0 76     00
63      \0 77     00
64      \1 00     01
65      \1 01     01

126     \1 76     01
127     \1 77     01
128     \2 00     10
129     \2 01     10

190     \2 76     10
191     \2 77     10
192     \3 00     11
193     \3 01     11

254     \3 76     11
255     \3 77     11
0       \0 00     00
1       \0 01     00

2a+b is really useful when it comes to encoding UTF-8 :

\0 ## - all the ASCII digits and arithmetic operators, most of the ASCII linguistic punctuation symbols, and nearly all the ASCII [[:cntrl:]] bytes (sans \177 DEL)

\1 ## - all the ASCII letters, plus misc [[:punct:]]

\2 ## - trailing/"continuation" bytes for Unicode code points U+0080 - U+10FFFFF

\3 ## - all the leading byte for Unicode code points U+0080 - U+10FFFFF

— sans \300 xC0 - \301 xC1 and \365 xF5 - \377 xFF

Bigeye answered 17/11, 2023 at 13:18 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags