In C++,
- Why is a boolean 1 byte and not 1 bit of size?
- Why aren't there types like a 4-bit or 2-bit integers?
I'm missing out the above things when writing an emulator for a CPU
Because the CPU can't address anything smaller than a byte.
bt, bts, btr and btc can address single bits! – Pernas
bt addresses a byte offset and then tests the bit at a given offset; regardless, when specifying an address you go in bytes... bit offset literals would get a bit wordy (excuse the pun). – Owens
bt with a register, only with an 8 bit immediate. – Owens
From Wikipedia:
Historically, a byte was the number of bits used to encode a single character of text in a computer and it is for this reason the basic addressable element in many computer architectures.
So the byte is the basic addressable unit, below which a computer architecture cannot address. And since there (probably) don't exist computers that support a 4-bit byte, you don't have a 4-bit bool, etc.
However, if you design an architecture whose basic addressable unit is 4 bits, then you will have a 4-bit bool — on that computer only!
int and char from my post. – Riverside
bool either, because the char is the smallest addressable unit in C++, regardless of what the architecture can address with its own opcodes. sizeof(bool) must have a value of at least 1, and adjacent bool objects must have their own addresses in C++, so the implementation just has to make them bigger and waste memory. That's why bit fields exist as a special case: the bitfield members of a struct aren't required to be separately addressable, so they can be smaller than a char (although the whole struct still can't be). – Melda
char is the smallest addressable unit in C++? – Riverside
sizeof(bool) can't be 0.5 :-) I suppose an implementation could legally provide sub-byte pointers as an extension, but "ordinary" objects like bool, allocated in ordinary ways, have to do what the standard says. – Melda
Melda Back in the old days when I had to walk to school in a raging blizzard, uphill both ways, and lunch was whatever animal we could track down in the woods behind the school and kill with our bare hands, computers had much less memory available than today. The first computer I ever used had 6K of RAM. Not 6 megabytes, not 6 gigabytes, 6 kilobytes. In that environment, it made a lot of sense to pack as many booleans into an int as you could, and so we would regularly use operations to take them out and put them in.
Today, when people will mock you for having only 1 GB of RAM, and the only place you could find a hard drive with less than 200 GB is at an antique shop, it's just not worth the trouble to pack bits.
RegexOptions.CultureInvariant + RegexOptions.IgnoreCase – Albumen
bool[] will fit in cache then it's typically better at small sizes. If you use SIMD to handle multiple elements at once, doing 128 bools per 16-byte SIMD vector instead of 16 is a nice speedup (for bitwise AND, or popcount). Or with AVX512, you can compare into a mask, or load a chunk of bitmap from memory and use it as a mask for SIMD vector elements. – Alarum
sched_setaffinity(2) which takes an opaque type that represents the set of all CPU numbers this thread can be scheduled on. The actual implementation is a bitmask = array of packed bits. – Alarum
vector<bool> talks about the fact that a bit-array is a potentially useful data structure; it's merely unfortunate that it's called vector<bool> in C++. – Alarum
Alarum The easiest answer is; it's because the CPU addresses memory in bytes and not in bits, and bitwise operations are very slow.
However it's possible to use bit-size allocation in C++. There's std::vector specialization for bit vectors, and also structs taking bit sized entries.
Because a byte is the smallest addressable unit in the language.
But you can make bool take 1 bit, for example, if you have a bunch of them, e.g. in a struct, like this:
struct A
{
bool a:1, b:1, c:1, d:1, e:1;
};
You can use bit fields to get sub-sized integers.
struct X
{
int val:4; // 4 bit int.
};
Though it is usually used to map structures to the exact bit patterns expected by hardware:
// 1 byte value (on a system where 8 bits is a byte)
struct SomeThing
{
int p1:4; // 4 bit field
int p2:3; // 3 bit field
int p3:1; // 1 bit
};
You could have 1-bit bools and 4- and 2-bit ints. But that would make for a weird instruction set for no performance gain, because it's an unnatural way to look at the architecture. It actually makes sense to "waste" the better part of a byte rather than trying to reclaim that unused data.
The only app that bothers to pack several bools into a single byte, in my experience, is SQL Server.
bool can be one byte — the smallest addressable size of the CPU — or it can be bigger. It's not unusual for bool to be the size of int for performance purposes. If for specific purposes (say hardware simulation) you need a type with N bits, you can find a library for that (e.g. the GBL library has a BitSet<N> class). If you are concerned with the size of bool (you probably have a big container), then you can pack bits yourself, or use std::vector<bool> that will do it for you (be careful with the latter, as it doesn't satisfy container requirements).
Because in general, the CPU allocates memory with 1 byte as the basic unit, although some CPUs like MIPS use a 4-byte word.
However, std::vector deals with bool in a special fashion: with vector<bool>, one bit is allocated for each bool.
lw/sw are much more widely used. – Intrepid
ld instructions supported an access size of 1, 2, 4, or 8 bytes. (For sizes less than 64-bit, the result is zero-extended into a 64-bit reg, like a normal RISC rather than x86 partial-registers.) Since IA-64 was designed by Intel with hopes of taking over from x86 (via emulation, or in early CPUs via hardware support for an IA-32 mode), unaligned word load/store is also optionally supported (even in IA-64 mode). – Alarum
Think about how you would implement this at your emulator level...
bool a[10] = {false};
bool &rbool = a[3];
bool *pbool = a + 3;
assert(pbool == &rbool);
rbool = true;
assert(*pbool);
*pbool = false;
assert(!rbool);
The byte is the smallest unit of digital data storage in a computer. In a computer, the RAM has millions of bytes, and any one of them has an address. If it had an address for every bit, a computer could manage 8 times less RAM than it can.
More info: Wikipedia
Even though the minimum possible size is 1 byte, you can store 8 bits of boolean information in 1 byte:
http://en.wikipedia.org/wiki/Bit_array
Julia language has BitArray for example, and I read about C++ implementations.
Bitwise operations are not 'slow'.
And/Or operations tend to be fast.
The problem is alignment, and the simple problem of solving it.
CPUs, as the other answers partially answered correctly, are generally aligned to read bytes, and RAM/memory is designed the same way.
So data compression to use less memory space would have to be explicitly ordered.
As one answer suggested, you could order a specific number of bits per value in a struct. However, what does the CPU/memory do afterward if it's not aligned? That would result in unaligned memory: instead of just +1, +2, or +4, there's no +1.5 if you wanted to use half the size in bits of one value. So the CPU must fill in or leave the remaining space blank anyway, then simply read the next aligned space; memory is aligned by 1 at minimum, and usually by default aligned by 4 (32-bit) or 8 (64-bit) overall. The CPU will generally grab the byte or int value that contains your flags, and then you check or set the needed ones.
So you must still define memory as int, short, byte, or the proper sizes, but when accessing and setting the value you can explicitly compress the data and store those flags in that value to save space. Many people are unaware of how this works, or skip the step whenever they have on/off values or flags, even though saving space in sent/received memory is quite useful in mobile and other constrained environments.
In the case of splitting an int into bytes it has little value, as you can just define the bytes individually (e.g. int 4Bytes; vs. byte Byte1; byte Byte2; byte Byte3; byte Byte4;) — in that case using int is redundant. However, in virtual environments like Java, most types may be defined as int (numbers, booleans, etc.), so there you could take advantage of dividing an int up and using bytes/bits for an ultra-efficient app that has to send fewer integers of data (aligned by 4). Managing bits could be called redundant, but it is one of many optimizations where bitwise operations are superior though not always needed; many times people take advantage of generous memory limits by just storing booleans as integers, wasting 500%-1000% of the memory space anyway.
It still easily has its uses: if you apply this among other optimizations, then for on-the-go and other data streams that have only bytes or a few KB flowing in, it can make the difference between an app loading fast, or at all. Reducing the bytes sent can ultimately benefit you a lot, even if you could get away with over-sending tons of unneeded data over an everyday internet connection. It is definitely something you should do when designing an app for mobile users, and something even big corporation apps fail at nowadays: using too much space, with loading constraints that could be half or less. The difference between doing nothing and piling on unknown packages/plugins that require at minimum many hundreds of KB or 1 MB before loading, versus an app designed for speed that requires only a few KB, is going to make it load and act faster, because you will encounter users and people who have data constraints, even if for you loading a wasteful MB of unneeded data is fast.
struct Packed { unsigned int flag1 : 1; unsigned int flag2 : 1; }. Most compilers will allocate a full unsigned int; however, they deal with the bit-twiddling by themselves when you read/write. They also deal by themselves with the modulo operations: that is, an unsigned small : 4 attribute has a value between 0 and 15, and when it should get to 16, it won't overwrite the preceding bit :) – Dimitri
bool members of a struct/class. This means compilers are allowed to implement bitfield writes by loading the containing word, doing some bit-manipulation, then just storing the whole word (not doing an atomic CAS). Related: C++ memory model and race conditions on char arrays - that's why word-addressable machines can't use 1-byte char in a C11 or C++11 implementation. – Alarum
© 2022 - 2024 — McMap. All rights reserved.