Structure for an array of bits in C
Asked Answered
G

4

6

It has come to my attention that there is no builtin structure for a single bit in C. There is (unsigned) char and int, which are 8 bits (one byte), and long which is 64+ bits, and so on (uint64_t, bool...)

I came across this while coding up a huffman tree, and the encodings for certain characters were not necessarily exactly 8 bits long (like 00101), so there was no efficient way to store the encodings. I had to find makeshift solutions such as strings or boolean arrays, but this takes far more memory.

But anyways, my question is more general: is there a good way to store an array of bits, or some sort of user-defined struct? I scoured the web for one but the smallest structure seems to be 8 bits (one byte). I tried things such as int a : 1 but it didn't work. I read about bit fields but they do not simply achieve exactly what I want to do. I know questions have already been asked about this in C++ and if there is a struct for a single bit, but mostly I want to know specifically what would be the most memory-efficient way to store an encoding such as 00101 in C.

Guy answered 7/7, 2017 at 19:12 Comment(24)
Why not store it as a char/int only, and do bit manipulation and use it?Cyrenaica
No, there is no way to have an array of bits (which will actually have elements of a single bit size).Bulger
Best I can see is using masks on a variable, whether int or octet or whatever. But that's one opinion, as this question will likely be closed as.Deviation
You can't really have anything smaller that a byte because memory allocation in the os works with bytes. So you can't even ask the OS for something else that n bytes.And even if the OS would let you do it, the hardware adress memory per-byte.Overindulge
The smallest unit of addressable memory is one byte, so no.Heriot
@NeilLocketz Well, for completeness it worth noting that there are some architectures having bit-addressable memory areas. In such a cases the corresponding compiler might have some extensions supporting it.Bulger
Ok, so if you can't have an array of just bits, what is the most memory efficient way to store, say, 00101?Guy
@CaryShindell Can't you guess by now?Bulger
A boolean array?Guy
One byte, of course...Bulger
Is memory efficiency really the important criterion? If you're making something like a Huffman tree, its size will be inconsequential even if it takes up 20-30 bytes per node.Flashcube
@EugeneSh. the problem is, a single bit will have 8 bits, and so if 00101 would go to 00000101 as would 000101.Guy
@LeeDanielCrocker My huffman code works; the question is more general, I'm just wondering the most memory-efficient way to do thisGuy
So..? Save the length elsewhere. Well, ok some more memory, but O(1).Bulger
@CaryShindell your boolean array still takes up at minimum 1 byte per boolean.Heriot
@NeilLocketz exactly, that is the problem. it takes 8 times as much memory as a theoretical bit array would.Guy
But do you really care about memory? It is cheap these days.Bulger
Yes, there's no way around needing some extra bits to encode actual length of the value. You could have N bits for length and M-N bits for value squeezed into an M-bit integer type: or you could sacrifice two bits to use "01" as the starting sequence (so, in a 16-bit int, for example, you would encode 00101 as 000000000100101, and 000101 as 0000000001000101).Flashcube
In this context I have enough memory, but as I said it's more of a general question; I'm curious to know the answer regardless :)Guy
I'm surprised nobody mentioned bit fields..Bouillabaisse
@Bouillabaisse they were mentioned in the question and answer below; if you have more info on them please share!Guy
I wonder what happens if you declare a typedef struct with a bitfield length 1 and then define a packed array of them?Bouillabaisse
I think I tried that but it didn't work. Feel free to test/provide code and prove me wrong thoughGuy
@CaryShindell no thanks:) I was just wondering.. :)Bouillabaisse
C
9

If you're mainly interested in accessing a single bit at a time, you can take an array of unsigned char and treat it as a bit array. For example:

unsigned char array[125];

Assuming 8 bits per byte, this can be treated as an array of 1000 bits. The first 16 logically look like this:

     ---------------------------------------------------------------------------------
byte |                   0                   |              1                        |
     ---------------------------------------------------------------------------------
bit  |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 |
     ---------------------------------------------------------------------------------

Let's say you want to work with bit b. You can then do the following:

Read bit b:

value = (array[b/8] & (1 << (b%8)) != 0;

Set bit b:

array[b/8] |= (1 << (b%8));

Clear bit b:

array[b/8] &= ~(1 << (b%8));

Dividing the bit number by 8 gets you the relevant byte. Similarly, mod'ing the bit number by 8 gives you the relevant bit inside of that byte. You then left shift the value 1 by the bit number to give you the necessary bit mask.

While there is integer division and modulus at work here, the dividend is a power of 2 so any decent compiler should replace them with bit shifting/masking.

Cissie answered 7/7, 2017 at 19:52 Comment(3)
Interesting. Would this save memory though (and how much)?Guy
@CaryShindell This is the most memory efficient way to store individual bits. You'll waste at most 7 bits in the last byte over the entire array.Cissie
Note also that your remark on division and remainder by a power of 2 is not completely true: if b has a signed type and the compiler cannot determine that its value is positive, the division and remainder operations cannot be simplified as a shift and mask alone, an adjustment must be made for negative values because division rounds toward zero. If these expressions are macros, use shift ans mask, if they are inline functions, use unsigned types.Deodand
P
4

It has come to my attention that there is no builtin structure for a single bit in C.

That is true, and it makes sense because substantially no machines have bit-addressible memory.

But anyways, my question is more general: is there a good way to store an array of bits, or some sort of user-defined struct?

One generally uses an unsigned char or another unsigned integer type, or an array of such. Along with that you need some masking and shifting to set or read the values of individual bits.

I scoured the web for one but the smallest structure seems to be 8 bits (one byte).

Technically, the smallest addressible storage unit ([[un]signed] char) could be larger than 8 bits, though you're unlikely ever to see that.

I tried things such as int a : 1 but it didn't work. I read about bit fields but they do not simply achieve exactly what I want to do.

Bit fields can appear only as structure members. A structure object containing such a bitfield will still have a size that is a multiple of the size of a char, so that doesn't map very well onto a bit array or any part of one.

I know questions have already been asked about this in C++ and if there is a struct for a single bit, but mostly I want to know specifically what would be the most memory-efficient way to store an encoding such as 00101 in C.

If you need a bit pattern and a separate bit count -- such as if some of the bits available in the bit-storage object are not actually significant -- then you need a separate datum for the significant-bit count. If you want a data structure for a small but variable number of bits, then you might go with something along these lines:

struct bit_array_small {
    unsigned char bits;
    unsigned char num_bits;
};

Of course, you can make that larger by choosing a different data type for the bits member and, maybe, the num_bits member. I'm sure you can see how you might extend the concept to handling arbitrary-length bit arrays if you should happen to need that.

Potherb answered 7/7, 2017 at 19:27 Comment(1)
Thanks, that's not bad, it's only up to 16-17 extra bits.Guy
F
3

If you really want the most memory efficiency, you can encode the Huffman tree itself as a stream of bits. See, for example:

https://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html

Then just encode those bits as an array of bytes, with a possible waste of 7 bits.

But that would be a horrible idea. For the structure in memory to be useful, it must be easy to access. You can still do that very efficiently. Let's say you want to encode up to 12-bit codes. Use a 16-bit integer and bitfields:

struct huffcode {
    uint16_t length: 4,
             value: 12;
}

C will store this as a single 16-bit value, and allow you to access the length and value fields separately. The complete Huffman node would also contain the input code value, and tree pointers (which, if you want further compactness, can be integer indices into an array).

Flashcube answered 7/7, 2017 at 19:38 Comment(4)
Thanks! So this way we only use no more than double the memory (if we had just bits), and we can do up to 12 bits at a time for the actual encoding.Guy
...and don't forget to tell your compiler to not pad-out the structure; different compilers use different methods for that.Flashcube
Oh yeah, I actually had to go to lengthy ends padding it myself and then taking into account the pad when reading the file... it was messy but worked in the endGuy
Careful. C implementations are at liberty to use separate addressible storage units, and / or storage units of any sufficient size for the two bitfields in your structure. They are also free to include trailing padding in the structure (which also is a possibility for my variation). The C language affords no certainty that an instance of the structure will occupy exactly 16 bits of memory.Potherb
D
1

You can make you own bit array in no time.

#define ba_set(ptr, bit)   { (ptr)[(bit) >> 3] |= (char)(1 << ((bit) & 7)); }
#define ba_clear(ptr, bit) { (ptr)[(bit) >> 3] &= (char)(~(1 << ((bit) & 7))); }
#define ba_get(ptr, bit)   ( ((ptr)[(bit) >> 3] & (char)(1 << ((bit) & 7)) ?  1 : 0 )
#define ba_setbit(ptr, bit, value) { if (value) { ba_set((ptr), (bit)) } else { ba_clear((ptr), (bit)); } }


#define BITARRAY_BITS  (120)

int main()
{
    char mybits[(BITARRAY_BITS + 7) / 8];
    memset(mybits, 0, sizeof(mybits));

    ba_setbit(mybits, 33, 1);
    if (!ba_get(33))
        return 1;
    return 0;
};
Decoy answered 8/7, 2017 at 0:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.