How to create a byte out of 8 bool values (and vice versa)?

Asked 11/12, 2011 at 0:53 Answered 1/12, 2023 at 4:40

Solved c++boolean bit-manipulation bit-packing

I have 8 bool variables, and I want to "merge" them into a byte.

Is there an easy/preferred method to do this?

How about the other way around, decoding a byte into 8 separate boolean values?

I come in assuming it's not an unreasonable question, but since I couldn't find relevant documentation via Google, it's probably another one of those "nonono all your intuition is wrong" cases.

Rotl answered 11/12, 2011 at 0:53 Comment(6)

I am not sure what you mean exactly. A bool(ean) datatype in c++ is one byte, how do you want to convert a byte into byte ? – Plauen 11/12, 2011 at 0:56

There is no way to pack 8 bool variables into one byte. There is a way packing 8 logical true/false states in a single byte using Bitmasking. en.wikipedia.org/wiki/Bitwise_operation – Plauen 11/12, 2011 at 0:58

@Plauen That should be an answer. – Bujumbura 11/12, 2011 at 0:59

@Bujumbura I was not sure what exactly the question was. – Plauen 11/12, 2011 at 1:1

I just knew people were going to make this question harder than it is. Well, it's my fault for not knowing booleans are more than 1 bit in size. – Rotl 11/12, 2011 at 1:33

Related: How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD is nice for 16 bytes at once. – Butterbur 21/4, 2021 at 20:42

The hard way:

unsigned char ToByte(bool b[8])
{
    unsigned char c = 0;
    for (int i=0; i < 8; ++i)
        if (b[i])
            c |= 1 << i;
    return c;
}

And:

void FromByte(unsigned char c, bool b[8])
{
    for (int i=0; i < 8; ++i)
        b[i] = (c & (1<<i)) != 0;
}

Or the cool way:

struct Bits
{
    unsigned b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
union CBits
{
    Bits bits;
    unsigned char byte;
};

Then you can assign to one member of the union and read from another. But note that the order of the bits in Bits is implementation defined.

Note that reading one union member after writing another is well-defined in ISO C99, and as an extension in several major C++ implementations (including MSVC and GNU-compatible C++ compilers), but is Undefined Behaviour in ISO C++. memcpy or C++20 std::bit_cast are the safe ways to type-pun in portable C++.

(Also, the bit-order of bitfields within a char is implementation defined, as is possible padding between bitfield members.)

Despoliation answered 11/12, 2011 at 1:1 Comment(12)

Can you loop on the Bits though? – Camm 24/9, 2015 at 23:9

@Camm Not with the union directly, if you need to loop you use <<. But you can mix both solitions. – Despoliation 24/9, 2015 at 23:13

The Bits struct can use unsigned char (not unsigned) to better match the CBits union. – Variate 17/6, 2016 at 7:32

@sp2denny I believe no because this is unsigned char which means there's no trap representation. – Tails 1/1, 2018 at 1:32

@ibug I think type-punning through the union is undefined behavior in C++, regehr seems to say so here: blog.regehr.org/archives/959 I don't think the question of trap-representations is relevant. It's about the strict aliasing rule. The "cool" way shown above would be against coding standards at my company at least. – Arriaga 1/1, 2018 at 1:46

@Tails it's UB to access a member of a union that wasn't last assigned to. – Morie 25/6, 2018 at 14:41

Type punning through unions is UB; please remove that or explicitly state that this is an extension and which compilers provide it. – Polston 25/6, 2018 at 14:42

This is definitively UB. – Copperhead 25/6, 2018 at 14:42

@Morie How did this question get some attention again? It's been half a year since my comment! – Tails 25/6, 2018 at 15:16

@ibug - it was marked as the duplicate for another question- so naturally everyone jumped to take a look see what they thought of it – Morie 25/6, 2018 at 15:18

doesnt FromByte create the bit array in reverse order? – Tuchman 7/3, 2022 at 20:13

@BaummitAugen Though in plain C it's not. I am aware that the question is tagged C++ but still worth noting. – Stevestevedore 16/1 at 17:37

The cool way (using the multiplication technique)

inline uint8_t pack8bools(bool* a)
{
    uint64_t t;
    memcpy(&t, a, sizeof t);         //  strict-aliasing & alignment safe load
    return 0x8040201008040201ULL*t >> 56;
       // bit order: a[0]<<7 | a[1]<<6 | ... | a[7]<<0  on little-endian
       // for a[0] => LSB, use 0x0102040810204080ULL    on little-endian
}

void unpack8bools(uint8_t b, bool* a)
{
       // on little-endian,  a[0] = (b>>7) & 1  like printing order
    auto MAGIC = 0x8040201008040201ULL;  // for opposite order, byte-reverse this
    auto MASK  = 0x8080808080808080ULL;
    uint64_t t = ((MAGIC*b) & MASK) >> 7;
    memcpy(a, &t, sizeof t);    // store 8 bytes without UB
}

Assuming sizeof(bool) == 1

To portably do LSB <-> a[0] (like the pext/pdep version below) instead of using the opposite of host endianness, use htole64(0x0102040810204080ULL) as the magic multiplier in both versions. (htole64 is from BSD / GNU <endian.h>). That arranges the multiplier bytes to match little-endian order for the bool array. htobe64 with the same constant gives the other order, MSB-first like you'd use for printing a number in base 2.

You may want to make sure that the bool array is 8-byte aligned (alignas(8)) for performance, and that the compiler knows this. memcpy is always safe for any alignment, but on ISAs that require alignment, a compiler can only inline memcpy as a single load or store instruction if it knows the pointer is sufficiently aligned. *(uint64_t*)a would promise alignment, but also violate the strict-aliasing rule. Even on ISAs that allow unaligned loads, they can be faster when naturally aligned. But the compiler can still inline memcpy without seeing that guarantee at compile time.

How they work

Suppose we have 8 bools b[0] to b[7] whose least significant bits are named a-h respectively that we want to pack into a single byte. Treating those 8 consecutive bools as one 64-bit word and load them we'll get the bits in reversed order in a little-endian machine. Now we'll do a multiplication (here dots are zero bits)

  |  b7  ||  b6  ||  b4  ||  b4  ||  b3  ||  b2  ||  b1  ||  b0  |
  .......h.......g.......f.......e.......d.......c.......b.......a
× 1000000001000000001000000001000000001000000001000000001000000001
  ────────────────────────────────────────────────────────────────
  ↑......h.↑.....g..↑....f...↑...e....↑..d.....↑.c......↑b.......a
  ↑.....g..↑....f...↑...e....↑..d.....↑.c......↑b.......a
  ↑....f...↑...e....↑..d.....↑.c......↑b.......a
+ ↑...e....↑..d.....↑.c......↑b.......a
  ↑..d.....↑.c......↑b.......a
  ↑.c......↑b.......a
  ↑b.......a
  a       
  ────────────────────────────────────────────────────────────────
= abcdefghxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The arrows are added so it's easier to see the position of the set bits in the magic number. At this point 8 least significant bits has been put in the top byte, we'll just need to mask the remaining bits out

So the magic number for packing would be 0b1000000001000000001000000001000000001000000001000000001000000001 or 0x8040201008040201. If you're on a big endian machine you'll need to use the magic number 0x0102040810204080 which is calculated in a similar manner

For unpacking we can do a similar multiplication

  |  b7  ||  b6  ||  b4  ||  b4  ||  b3  ||  b2  ||  b1  ||  b0  |
                                                          abcdefgh
× 1000000001000000001000000001000000001000000001000000001000000001
  ────────────────────────────────────────────────────────────────
= h0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh
& 1000000010000000100000001000000010000000100000001000000010000000
  ────────────────────────────────────────────────────────────────
= h0000000g0000000f0000000e0000000d0000000c0000000b0000000a0000000

After multiplying we have the needed bits at the most significant positions, so we need to mask out irrelevant bits and shift the remaining ones to the least significant positions. The output will be the bytes contain a to h in little endian.

The efficient way

On newer x86 CPUs with BMI2 there are PEXT and PDEP instructions for this purpose. The pack8bools function above can be replaced with

_pext_u64(*((uint64_t*)a), 0x0101010101010101ULL);

And the unpack8bools function can be implemented as

_pdep_u64(b, 0x0101010101010101ULL);

(This maps LSB -> LSB, like a 0x0102040810204080ULL multiplier constant, opposite of 0x8040201008040201ULL. x86 is little-endian: a[0] = (b>>0) & 1; after memcpy.)

Unfortunately those instructions are very slow on AMD before Zen 3 so you may need to compare with the multiplication method above to see which is better

The other fast way is SSE2

x86 SIMD has an operation that takes the high bit of every byte (or float or double) in a vector register, and gives it to you as an integer. The instruction for bytes is pmovmskb. This can of course do 16 bytes at a time with the same number of instructions, so it gets better than the multiply trick if you have lots of this to do.

#include <immintrin.h>

inline uint8_t pack8bools_SSE2(const bool* a)
{
    __m128i v = _mm_loadl_epi64( (const __m128i*)a );  // 8-byte load, despite the pointer type.
    // __m128 v = _mm_cvtsi64_si128( uint64 );  // alternative if you already have an 8-byte integer
    v = _mm_slli_epi32(v, 7);   // low bit of each byte becomes the highest
    return _mm_movemask_epi8(v);
}

There isn't a single instruction to unpack until AVX-512, which has mask-to-vector instructions. It is doable with SIMD, but likely not as efficiently as the multiply trick. See Convert 16 bits mask to 16 bytes mask and more generally is there an inverse instruction to the movemask instruction in intel avx2? for unpacking bitmaps to other element sizes.

How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD has some answers specifically for 8-bits -> 8-bytes, but if you can't do 16 bits at a time for that direction, the multiply trick is probably better, and pext certainly is (except on CPUs where it's disastrously slow, like AMD before Zen 3).

Chavarria answered 8/8, 2018 at 15:53 Comment(21)

I upvoted this because it's cute and well-illustrated, but most people who come across this question would be better off with a std::bitset. They probably even get the optimization for free if they use the unsigned long long constructor. – Spurtle 11/6, 2019 at 8:23

@BaryonsforBreakfast no, you won't get the optimization for free because each bit in the bitset is still stored as one bit and not an 8-bit byte. You do have the ability to get each bit and do operations on it though, but not in parallel so it'll be slower. The 8 bytes -> 8 bits direction also doesn't work automatically with bitset – Chavarria 11/6, 2019 at 8:34

Sorry, I meant that the packing and unpacking CPU instructions under "the efficient way" are probably free when they're available if you use the unsigned long long constructor (and to_ullong to unpack). Going between the array of bools and the unsigned long longs would still not be free in that you'd still have to do the casts by hand. – Spurtle 11/6, 2019 at 9:9

to_ullong and the unsigned long long constructor can't use PEXT and PDEP because they just copy the unsigned long long value directly to/from the internal array. The bits in bitset are stored as bits in the array and not one bit per byte. So to_ullong gives you 64 bits and not 8 – Chavarria 11/6, 2019 at 10:31

Has anyone run into incorrect answers with gcc 8 and 9 when -O2 is enabled? I get wrong answers after a packing and unpacking when -O2 is used, while it gives correct answers without optimization. Note, using clang, it gives correct answers even when -O2 is on. – Revolver 11/3, 2020 at 6:7

Add another comment to address my previous comment regarding incorrect answers with gcc when -O2 is turned on: I found that using std::memcpy() instead of type punning would resolve the incorrect answer issue. – Revolver 11/3, 2020 at 6:19

@SamuelLi that's probably due to strict aliasing. Try -fno-strict-aliasing or a union. Or just change from bool to char because char* can alias other types – Chavarria 11/3, 2020 at 16:20

@Chavarria using char instead of bool would give me correct answers. But gcc still complains dereferencing type-punned pointer will break strict-aliasing rules. This is interesting behavior of gcc though. – Revolver 11/3, 2020 at 19:26

@SamuelLi: That's because using uint64_t* to read a char[] object violates strict-aliasing. ISO C only allows it the other way around: using char * to read an object that was originally a uint64_t. If the only access to the char object is through a char*, you're fine, but if there's any use of an automatic or static storage char[], you have UB. This answer should really use memcpy or C++20 std::bitcast, or GNU C typedef uint64_t unaligned_aliasing_u64 __attribute__((aligned(1), may_alias)); to do the type-punning; pointer-casting is never safe. – Butterbur 29/10, 2020 at 3:24

@PeterCordes Thanks for your explanation, but I don't quite get what you said about automatic or static storage. Could you elaborate what it means to be automatic or static storage? – Revolver 1/11, 2020 at 4:43

@SamuelLi: In C++, automatic storage = non-static locals; static storage = globals and static-anything. My point was that if there's some way to access the array object itself, especially as part of a struct, then it's possible to have a real problem with strict-aliasing. If the "array" is just how you're treating some memory you got from new or malloc, all the char accesses to it are going to be via char* anyway, and thus aliasing-safe. But if you have a real char array[8] variable declaration somewhere, access to it might not be through a char*. – Butterbur 1/11, 2020 at 4:50

@SamuelLi: Although I just realized you can have a problem even with dynamic storage. If you have struct foo{ int a; char b[8];}; then you can do struct assignment of the whole struct even via pointers to it, like foo *p=..., *q=...; and *p = *q. Access to char b[8] is not through a char* in that case, so a compiler could certainly assume that some uint64_t* load or store was unrelated to it. (You might actually need this struct thing for UB to actually happen, or at least to be a problem in practice, not just a char arr[8] local var: [] access works by decay to char*) – Butterbur 1/11, 2020 at 4:55

@PeterCordes I got it, thanks! I'm convinced that memcpy is the safest way to go! – Revolver 1/11, 2020 at 5:37

If you want the low bit of the mask to map to the least significant byte of the uint64_t (the low byte of the bool array on little-endian x86) you need MAGIC = 0x0102040810204080ULL, opposite what you're showing. See godbolt.org/z/j1nGM4TPY which compares test cases for that vs. pdep vs. How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD. (For my answer on Convert 16 bits mask to 16 bytes mask) – Butterbur 21/4, 2021 at 23:18

(The order you have here will give you printing order, MSB at lowest address, on a little-endian machine. Opposite of your pdep version. I found the easiest way to think through which order is which is: 0x...40201 * 0x80 has 0x80 in the low byte, which the mask+shift will turn into 0x...01. So high bit of the input maps to low byte of the output, and memcpy into bool[] uses native endianness.) – Butterbur 14/5, 2021 at 15:28

godbolt.org/z/ba8PK99d1 is a test-case for byte-order, giving it bool a[] = {[7] = 1;} highest bit set as a test input -> 1 or 0x80. I made an edit based on sorting all that out, hopefully making this more usable for future readers. Note that clang on Godbolt uses host headers, so big-endian targets like powerpc64 still use x86 bits/endianness.h and have backwards htole64 / be64. (github.com/compiler-explorer/compiler-explorer/issues/…). But GCC works, and does sometimes constant-propagate through a bool array init -> memcpy -> multiply. – Butterbur 14/5, 2021 at 21:0

The magic number for the LSB->LSB mapping doesn't seem to be correct. The mask 0x80402010 (or a mask 0x01010101...) can work to spread the original byte apart, because the separation of each bit in the mask is at least 8. But in the other multiplier 0x010204....80ull all the bits have a distance of 7 meaning that the input octets overlap. Out of the 256 combinations 64 fail. One cure is to (a & 0x7f) * magic | (a << 56). – Juglandaceous 27/10, 2021 at 10:10

@AkiSuihkonen the magic number is correct, you can check the math I posted above. I've also done a correctness check previously and it works for all 256 combinations. You can also see my edit to see an old implementation with overlap and I had to split out a bit. After I changed to the latest implementation there's zero overlap – Chavarria 28/10, 2021 at 2:18

@phuclv: See godbolt.org/z/eMfaso3n7, the source copied from the snipped by Peter Cordes. The problem appears in the LSB->LSB conversion, iff both the LSB and MSB of a byte are set. And thus the magic number can't be simply reversed for big endian architectures. I'm quite sure, the code works for LSB->MSB conversion. – Juglandaceous 28/10, 2021 at 7:16

@AkiSuihkonen the magic number 0x0102040810204080ULL is added by Peter Cordes. My number works for LSB->MSB conversion and you have to reverse endian to make it work for the same direction. I'll edit this when I have time – Chavarria 28/10, 2021 at 15:49

When byte-reversing the mask in unpack8bools I cannot get the expected output when b is 255, i.e. every bit is set. ((MAGIC*b) & MASK) >> 7 ends up producing 0x1 instead of 0x101010101010101 – Vinasse 28/11, 2023 at 4:4

You might want to look into std::bitset. It allows you to compactly store booleans as bits, with all of the operators you would expect.

No point fooling around with bit-flipping and whatnot when you can abstract away.

Kwang answered 11/12, 2011 at 1:0 Comment(1)

See also cppreference. – Mercurial 3/2, 2017 at 11:18

#include <stdint.h>   // to get the uint8_t type

uint8_t GetByteFromBools(const bool eightBools[8])
{
   uint8_t ret = 0;
   for (int i=0; i<8; i++) if (eightBools[i] == true) ret |= (1<<i);
   return ret;
}

void DecodeByteIntoEightBools(uint8_t theByte, bool eightBools[8])
{
   for (int i=0; i<8; i++) eightBools[i] = ((theByte & (1<<i)) != 0);
}

Faiyum answered 11/12, 2011 at 1:0 Comment(6)

Posting a code solution without any explanation might help OP, but does not provide good value to other users. You should consider adding comments and/or explain what you did. – Bujumbura 11/12, 2011 at 1:2

+1 for using uint8_t. Exactly what the type was meant for, when you need exactly 8 bits. – Kwang 11/12, 2011 at 1:2

I hope you realize that eightBools[i] is a bool and checking it with == true you can also just write (eightBools[i] == true) == true or ((eightBools[i] == true) == true) == true, but where to stop? And yes, that's worth not up-voting the answer. – Glissando 11/12, 2011 at 1:6

weltraumpirat I think the code is straightforward enough that a separate explanation would be redundant and only get in the way. Christian I accept your criticism -- I wouldn't write it like that in my own code -- but I wrote it that way here to make the intent of the code clearer. You may keep your up-vote, I don't want it ;) – Faiyum 11/12, 2011 at 1:16

@JeremyFriesner You can have it anyway, since you say it's not your practice. Sorry for the harsh words ;) – Glissando 11/12, 2011 at 1:19

@JeremyFriesner For someone who is new to bitmasks, your code isn't straightforward, not one bit (pun intended). I stand by my earlier comment. – Bujumbura 11/12, 2011 at 10:39

bool a,b,c,d,e,f,g,h;
//do stuff
char y= a<<7 | b<<6 | c<<5 | d<<4 | e <<3 | f<<2 | g<<1 | h;//merge

although you are probably better off using a bitset

http://www.cplusplus.com/reference/stl/bitset/bitset/

Villalpando answered 11/12, 2011 at 1:0 Comment(2)

Isn't that too much hardcoding? – Hawser 11/12, 2011 at 1:3

Depends, I wouldn't write it, but if you want to merge 8 separate non contiguous bools then it is the way to do it. – Villalpando 11/12, 2011 at 1:6

I'd like to note that type punning through unions is UB in C++ (as rodrigo does in his answer. The safest way to do that is memcpy()

struct Bits
{
    unsigned b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

unsigned char toByte(Bits b){
    unsigned char ret;
    memcpy(&ret, &b, 1);
    return ret;
}

As others have said, the compiler is smart enough to optimize out memcpy().

BTW, this is the way that Boost does type punning.

Tails answered 1/1, 2018 at 1:51 Comment(2)

@Michi Is this question C++? – Tails 7/1, 2019 at 16:34

Though I like the simplicity of this solution, but it looks like the memory layout of bit fields are compiler dependent link. That is undesired for some use cases... – Revolver 5/3, 2020 at 5:31

There is no way to pack 8 bool variables into one byte. There is a way packing 8 logical true/false states in a single byte using Bitmasking.

Plauen answered 11/12, 2011 at 1:0 Comment(2)

You can pack the values of 8 bools into one byte. Of course they're no longer separate C++ bool objects, but all 8 values are all there within the bits of one char or uint8_t. This answer seems unhelpful and pedantic when the desired behaviour was clear. – Butterbur 29/10, 2020 at 3:34

I couldn't read minds as to determine what the intent was, given my crystal ball was broken at the time of answering. I answered the question asked. – Plauen 4/11, 2020 at 0:23

You would use the bitwise shift operation and casting to archive it. a function could work like this:

unsigned char toByte(bool *bools)
{
    unsigned char byte = \0;
    for(int i = 0; i < 8; ++i) byte |= ((unsigned char) bools[i]) << i;
    return byte;
}

Thanks Christian Rau for the correction s!

Sleight answered 11/12, 2011 at 0:58 Comment(10)

I (pst) don't know any C++ ... so if someone could clarify why this question was downvoted, much appreciated! – Crannog 11/12, 2011 at 1:0

Can the people downvoting actually tell me what I am doing wrong? I am naive to programming. :/ – Hawser 11/12, 2011 at 1:0

And the reason you use a short (which may be 1 byte, but will most probably be 2) and not just a char (which is guaranteed to be 1 byte) is...? And also you should use unsigned types and initialize byte properly. Fix those and the answer is much more likely to be correct. But I'm not the down-voter, not yet. – Glissando 11/12, 2011 at 1:1

Oh true, I am making the edit now, thanks for pointing it out. – Hawser 11/12, 2011 at 1:3

Sorry, i down-voted for multiple misleading mistakes in your answer, which were legendarily explained by Christian Rau, I shall comment next time. – Plauen 11/12, 2011 at 1:7

@fabianhjr Not enough, where is the unsigned? And what is that \ for? – Glissando 11/12, 2011 at 1:8

@fabianhjr Also please do change the name of your function. – Plauen 11/12, 2011 at 1:9

@fabianhjr Is there any particular reason to use c cast (unsigned char) here and not cpp ? (just looking at tag) – Plauen 11/12, 2011 at 1:13

Well, it surely will work on both C and C++. Afterall C++ is an extension of C. – Hawser 11/12, 2011 at 1:16

unsigned char byte = \0; is invalid. Either use 0 or '\0' – Chavarria 20/8, 2018 at 9:40

@Fabian @111111 -

If you want 8 split out bools with less hard-coding, maybe...

bool a,b,c,d,e,f,g,h;      # assuming you want [a] at MSB and 
                           #                   [h] at LSB

char y = a << 7 | b<< 6 | c << 5 | d << 4 | e << 3 | f << 2 | g << 1 | h

char y = h|(g|(f|(e|(d|(c|(b|a << 1)<< 1)<< 1)<< 1)<< 1)<< 1)<< 1

... instead of 7 different shifting amounts, a nested form allow re-using the same shifting constant over and over again,

with the added benefit of placing all the bools/digits together on one side, and all the shifting amounts on the other side

The arithmetic equivalent expressions would be (they're all identical) -

char y = h + (g + (f + (e + (d + (c + (b + a * 2) * 2) * 2) * 2) * 2) * 2) * 2

char y = 2 * (2 * (2 * (2 * (2 * (2 * (2 * a + b) + c) + d) + e) + f) + g) + h

char y = 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * ( 2 * (\
            a ) + b ) + c ) + d ) + e ) + f ) + g ) + h

Personally I like the last form the most because it's a full-blown base-conversion routine without requiring the audience to even have any understanding what base and exponents are, or what shifting entails as long as they understand addition and multiplication.

Arctic answered 1/12, 2023 at 4:40 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

The cool way (using the multiplication technique)

How they work

The efficient way

The other fast way is SSE2

Recommended topics

Hot tags