In C++,
- Why is a boolean 1 byte and not 1 bit of size?
- Why aren't there types like a 4-bit or 2-bit integers?
I'm missing out the above things when writing an emulator for a CPU
Because the CPU can't address anything smaller than a byte.
bt, bts, btr and btc can address single bits! – Pernas
bt addresses a byte offset and then tests the bit at a given offset; regardless, when specifying an address you go in bytes... bit offset literals would get a bit wordy (excuse the pun). – Owens
bt with a register, only with an 8 bit immediate. – Owens
From Wikipedia:
Historically, a byte was the number of bits used to encode a single character of text in a computer and it is for this reason the basic addressable element in many computer architectures.
So the byte is the basic addressable unit, below which a computer architecture cannot address. And since there (probably) don't exist computers that support a 4-bit byte, you don't have a 4-bit bool, etc.
However, if you design an architecture whose basic addressable unit is 4 bits, then you will have a 4-bit bool — on that computer only!
int and char from my post. – Riverside
bool either, because the char is the smallest addressable unit in C++, regardless of what the architecture can address with its own opcodes. sizeof(bool) must have a value of at least 1, and adjacent bool objects must have their own addresses in C++, so the implementation just has to make them bigger and waste memory. That's why bit fields exist as a special case: the bitfield members of a struct aren't required to be separately addressable, so they can be smaller than a char (although the whole struct still can't be). – Melda
char is the smallest addressable unit in C++? – Riverside
sizeof(bool) can't be 0.5 :-) I suppose an implementation could legally provide sub-byte pointers as an extension, but "ordinary" objects like bool, allocated in ordinary ways, have to do what the standard says. – Melda
Melda Back in the old days when I had to walk to school in a raging blizzard, uphill both ways, and lunch was whatever animal we could track down in the woods behind the school and kill with our bare hands, computers had much less memory available than today. The first computer I ever used had 6K of RAM. Not 6 megabytes, not 6 gigabytes, 6 kilobytes. In that environment, it made a lot of sense to pack as many booleans into an int as you could, and so we would regularly use operations to take them out and put them in.
Today, when people will mock you for having only 1 GB of RAM, and the only place you could find a hard drive with less than 200 GB is at an antique shop, it's just not worth the trouble to pack bits.
RegexOptions.CultureInvariant + RegexOptions.IgnoreCase – Albumen
bool[] will fit in cache then it's typically better at small sizes. If you use SIMD to handle multiple elements at once, doing 128 bools per 16-byte SIMD vector instead of 16 is a nice speedup (for bitwise AND, or popcount). Or with AVX512, you can compare into a mask, or load a chunk of bitmap from memory and use it as a mask for SIMD vector elements. – Alarum
sched_setaffinity(2) which takes an opaque type that represents the set of all CPU numbers this thread can be scheduled on. The actual implementation is a bitmask = array of packed bits. – Alarum
vector<bool> talks about the fact that a bit-array is a potentially useful data structure; it's merely unfortunate that it's called vector<bool> in C++. – Alarum
Alarum The easiest answer is; it's because the CPU addresses memory in bytes and not in bits, and bitwise operations are very slow.
However it's possible to use bit-size allocation in C++. There's std::vector specialization for bit vectors, and also structs taking bit sized entries.
Because a byte is the smallest addressable unit in the language.
But you can make bool take 1 bit, for example, if you have a bunch of them, e.g. in a struct, like this:
struct A
{
bool a:1, b:1, c:1, d:1, e:1;
};
You can use bit fields to get sub-sized integers.
struct X
{
int val:4; // 4 bit int.
};
Though it is usually used to map structures to the exact bit patterns expected by hardware:
// 1 byte value (on a system where 8 bits is a byte)
struct SomeThing
{
int p1:4; // 4 bit field
int p2:3; // 3 bit field
int p3:1; // 1 bit
};
You could have 1-bit bools and 4- and 2-bit ints. But that would make for a weird instruction set for no performance gain, because it's an unnatural way to look at the architecture. It actually makes sense to "waste" the better part of a byte rather than trying to reclaim that unused data.
The only app that bothers to pack several bools into a single byte, in my experience, is SQL Server.
bool can be one byte — the smallest addressable size of the CPU — or it can be bigger. It's not unusual for bool to be the size of int for performance purposes. If for specific purposes (say hardware simulation) you need a type with N bits, you can find a library for that (e.g. the GBL library has a BitSet<N> class). If you are concerned with the size of bool (you probably have a big container), then you can pack bits yourself, or use std::vector<bool> that will do it for you (be careful with the latter, as it doesn't satisfy container requirements).
Because in general, the CPU allocates memory with 1 byte as the basic unit, although some CPUs like MIPS use a 4-byte word.
However, std::vector deals with bool in a special fashion: with vector<bool>, one bit is allocated for each bool.
lw/sw are much more widely used. – Intrepid
ld instructions supported an access size of 1, 2, 4, or 8 bytes. (For sizes less than 64-bit, the result is zero-extended into a 64-bit reg, like a normal RISC rather than x86 partial-registers.) Since IA-64 was designed by Intel with hopes of taking over from x86 (via emulation, or in early CPUs via hardware support for an IA-32 mode), unaligned word load/store is also optionally supported (even in IA-64 mode). – Alarum
Think about how you would implement this at your emulator level...
bool a[10] = {false};
bool &rbool = a[3];
bool *pbool = a + 3;
assert(pbool == &rbool);
rbool = true;
assert(*pbool);
*pbool = false;
assert(!rbool);
The byte is the smallest unit of digital data storage in a computer. In a computer, the RAM has millions of bytes, and any one of them has an address. If it had an address for every bit, a computer could manage 8 times less RAM than it can.
More info: Wikipedia
Even though the minimum possible size is 1 byte, you can store 8 bits of boolean information in 1 byte:
http://en.wikipedia.org/wiki/Bit_array
Julia language has BitArray for example, and I read about C++ implementations.
Bitwise operations are not 'slow'.
And/Or operations tend to be fast.
The problem is alignment, and the simple problem of solving it.
CPUs, as the other answers partially answered correctly, are generally aligned to read bytes, and RAM/memory is designed the same way.
So data compression to use less memory space would have to be explicitly ordered.
As one answer suggested, you could order a specific number of bits per value in a struct. However, what does the CPU/memory do afterward if it's not aligned? That would result in unaligned memory: instead of just +1, +2, or +4, there's no +1.5 if you wanted to use half the size in bits of one value. So the CPU must fill in or leave the remaining space blank anyway, then simply read the next aligned space; memory is aligned by 1 at minimum, and usually by default aligned by 4 (32-bit) or 8 (64-bit) overall. The CPU will generally grab the byte or int value that contains your flags, and then you check or set the needed ones.
So you must still define memory as int, short, byte, or the proper sizes, but when accessing and setting the value you can explicitly compress the data and store those flags in that value to save space. Many people are unaware of how this works, or skip the step whenever they have on/off values or flags, even though saving space in sent/received memory is quite useful in mobile and other constrained environments.
In the case of splitting an int into bytes it has little value, as you can just define the bytes individually (e.g. int 4Bytes; vs. byte Byte1; byte Byte2; byte Byte3; byte Byte4;) — in that case using int is redundant. However, in virtual environments like Java, most types may be defined as int (numbers, booleans, etc.), so there you could take advantage of dividing an int up and using bytes/bits for an ultra-efficient app that has to send fewer integers of data (aligned by 4). Managing bits could be called redundant, but it is one of many optimizations where bitwise operations are superior though not always needed; many times people take advantage of generous memory limits by just storing booleans as integers, wasting 500%-1000% of the memory space anyway.
It still easily has its uses: if you apply this among other optimizations, then for on-the-go and other data streams that have only bytes or a few KB flowing in, it can make the difference between an app loading fast, or at all. Reducing the bytes sent can ultimately benefit you a lot, even if you could get away with over-sending tons of unneeded data over an everyday internet connection. It is definitely something you should do when designing an app for mobile users, and something even big corporation apps fail at nowadays: using too much space, with loading constraints that could be half or less. The difference between doing nothing and piling on unknown packages/plugins that require at minimum many hundreds of KB or 1 MB before loading, versus an app designed for speed that requires only a few KB, is going to make it load and act faster, because you will encounter users and people who have data constraints, even if for you loading a wasteful MB of unneeded data is fast.
struct Packed { unsigned int flag1 : 1; unsigned int flag2 : 1; }. Most compilers will allocate a full unsigned int; however, they deal with the bit-twiddling by themselves when you read/write. They also deal by themselves with the modulo operations: that is, an unsigned small : 4 attribute has a value between 0 and 15, and when it should get to 16, it won't overwrite the preceding bit :) – Dimitri
bool members of a struct/class. This means compilers are allowed to implement bitfield writes by loading the containing word, doing some bit-manipulation, then just storing the whole word (not doing an atomic CAS). Related: C++ memory model and race conditions on char arrays - that's why word-addressable machines can't use 1-byte char in a C11 or C++11 implementation. – Alarum
© 2022 - 2024 — McMap. All rights reserved.