I am dealing with very large list of booleans in C++, around 2^N items of N booleans each. Because memory is critical in such situation, i.e. an exponential growth, I would like to build a N-bits long variable to store each element.
For small N, for example 24, I am just using unsigned long int
. It takes 64MB ((2^24)*32/8/1024/1024). But I need to go up to 36. The only option with build-in variable is unsigned long long int
, but it takes 512GB ((2^36)*64/8/1024/1024/1024), which is a bit too much.
With a 36-bits variable, it would work for me because the size drops to 288GB ((2^36)*36/8/1024/1024/1024), which fits on a node of my supercomputer.
I tried std::bitset
, but std::bitset< N >
creates a element of at least 8B.
So a list of std::bitset< 1 >
is much greater than a list of unsigned long int
.
It is because the std::bitset
just change the representation, not the container.
I also tried boost::dynamic_bitset<>
from Boost, but the result is even worst (at least 32B!), for the same reason.
I know an option is to write all elements as one chain of booleans, 2473901162496 (2^36*36), then to store then in 38654705664 (2473901162496/64) unsigned long long int
, which gives 288GB (38654705664*64/8/1024/1024/1024). Then to access an element is just a game of finding in which elements the 36 bits are stored (can be either one or two). But it is a lot of rewriting of the existing code (3000 lines) because mapping becomes impossible and because adding and deleting items during the execution in some functions will be surely complicated, confusing, challenging, and the result will be most likely not efficient.
How to build a N-bits variable in C++?
::std::vector<bool>
? It is a good choice if you need to store large amount of bits. – Ilariostd::array<uint8_t>
orstd::vector<uint8_t>
of size number_of_bits_needed/sizeof(uint8_t) ? – Kovacsunsigned long long
, why not a singledynamic_bitset
? Then finding element X becomes as simple as going N*X bits in. That simplifies the logic around using it (and you can abstract on top of that) while still being minimal space. The main thing missing is insertion/deletion that isn't at the back. – Mendiolastd::vector<bool>
takes 40o, so it is not helping (and not recommended in any case). – Inspirituint8_t
orunsigned long long int
is formally the same thing, however,uint8_t
have more change that 36 bits are scatered between elements of the container. – Inspiritsize_type
can be anunsigned long long int
, which does not seems to be the case with my preliminary tests. – Inspiritsizeof
. The best you can do is to round up to sizeof(char). So, for the 36-bit example, on a 8-bit char machine, you can have a type which has storage for 40-bits (5 bytes). On x86 architectures, it won't be that slow, as unaligned access is supported. So you can convert from/to 5-byte variable intounsigned long long
pretty fast. – Insultvector<bool>
implies that you will store all2^N
bits in a single container. If you need to reference a pack of N items you can just use a range in vector. – Ilariounsigned long long int
to the name of mystruct
of 5unsinged char
and a function to convert from one to another). – Inspirit38654705664*64/8/1024/1024/1024
gives 288GB, not MB – Assimilatestd::bitset
. You can specify the underlying type to beuint8_t
. Thensizeof( bitset2<36,uint8_t> )= 5
whilesizeof( bitset2<36> )= 8
. The latter on a 64-bit machine. – Desalinate