How to define and work with an array of bits in C?
Asked Answered
T

5

54

I want to create a very large array on which I write '0's and '1's. I'm trying to simulate a physical process called random sequential adsorption, where units of length 2, dimers, are deposited onto an n-dimensional lattice at a random location, without overlapping each other. The process stops when there is no more room left on the lattice for depositing more dimers (lattice is jammed).

Initially I start with a lattice of zeroes, and the dimers are represented by a pair of '1's. As each dimer is deposited, the site on the left of the dimer is blocked, due to the fact that the dimers cannot overlap. So I simulate this process by depositing a triple of '1's on the lattice. I need to repeat the entire simulation a large number of times and then work out the average coverage %.

I've already done this using an array of chars for 1D and 2D lattices. At the moment I'm trying to make the code as efficient as possible, before working on the 3D problem and more complicated generalisations.

This is basically what the code looks like in 1D, simplified:

int main()
{
    /* Define lattice */
    array = (char*)malloc(N * sizeof(char));

    total_c = 0;

    /* Carry out RSA multiple times */
    for (i = 0; i < 1000; i++)
        rand_seq_ads();

    /* Calculate average coverage efficiency at jamming */
    printf("coverage efficiency = %lf", total_c/1000);

    return 0;
}

void rand_seq_ads()
{
    /* Initialise array, initial conditions */
    memset(a, 0, N * sizeof(char));
    available_sites = N;
    count = 0;

    /* While the lattice still has enough room... */
    while(available_sites != 0)
    {
        /* Generate random site location */
        x = rand();

        /* Deposit dimer (if site is available) */
        if(array[x] == 0)
        {
            array[x] = 1;
            array[x+1] = 1;
            count += 1;
            available_sites += -2;
        }

        /* Mark site left of dimer as unavailable (if its empty) */
        if(array[x-1] == 0)
        {
            array[x-1] = 1;
            available_sites += -1;
        }
    }

    /* Calculate coverage %, and add to total */
    c = count/N
    total_c += c;
}

For the actual project I'm doing, it involves not just dimers but trimers, quadrimers, and all sorts of shapes and sizes (for 2D and 3D).

I was hoping that I would be able to work with individual bits instead of bytes, but I've been reading around and as far as I can tell you can only change 1 byte at a time, so either I need to do some complicated indexing or there is a simpler way to do it?

Thanks for your answers

Tutorial answered 26/3, 2010 at 17:26 Comment(1)
Note for once you are working on individual bits: If efficiency is vital, you'll probably want to, where possible, apply your operations on at least a byte at a time (i.e. look at multiple coordinates at the same time), since doing so, if done right, doesn't cost anything extra. It's probably not worth the hassle to do this except in the bottleneck portions of the code.Alkylation
P
66

If I am not too late, this page gives awesome explanation with examples.

An array of int can be used to deal with array of bits. Assuming size of int to be 4 bytes, when we talk about an int, we are dealing with 32 bits. Say we have int A[10], means we are working on 10*4*8 = 320 bits and following figure shows it: (each element of array has 4 big blocks, each of which represent a byte and each of the smaller blocks represent a bit)

enter image description here

So, to set the kth bit in array A:

// NOTE: if using "uint8_t A[]" instead of "int A[]" then divide by 8, not 32
void  SetBit( int A[],  int k )
{
    int i = k/32;        //gives the corresponding index in the array A
    int pos = k%32;      //gives the corresponding bit position in A[i]

    unsigned int flag = 1;   // flag = 0000.....00001

    flag = flag << pos;      // flag = 0000...010...000   (shifted k positions)

    A[i] = A[i] | flag;      // Set the bit at the k-th position in A[i]
}

or in the shortened version

void  SetBit( int A[],  int k )
{
    A[k/32] |= 1 << (k%32);  // Set the bit at the k-th position in A[i]
}

similarly to clear kth bit:

void  ClearBit( int A[],  int k )                
{
    A[k/32] &= ~(1 << (k%32));
}

and to test if the kth bit:

int TestBit( int A[],  int k )
{
    return ( (A[k/32] & (1 << (k%32) )) != 0 ) ;     
}

As said above, these manipulations can be written as macros too:

// Due order of operation wrap 'k' in parentheses in case it
// is passed as an equation, e.g. i + 1, otherwise the first
// part evaluates to "A[i + (1/32)]" not "A[(i + 1)/32]"
#define SetBit(A,k)     ( A[(k)/32] |= (1 << ((k)%32)) )
#define ClearBit(A,k)   ( A[(k)/32] &= ~(1 << ((k)%32)) )
#define TestBit(A,k)    ( A[(k)/32] & (1 << ((k)%32)) )
Proof answered 2/6, 2015 at 8:11 Comment(7)
When deciding whether to go with the functions or macros for efficiency, its worth comparing the generated machine code for your compiler to see if there is a difference (e.g. "gcc -O2 -S". If calling these from other modules see #5987520). If the compiler is good enough, at top optimization levels the generated code for the functions should be equivalent to the macros. The advantage of sticking with the functions is that they are easier for editors, debuggers (at lower optimization levels) and humans to understand.Gothicism
The size of an int depends on your compiler. Don't assume an int is 4 bytes. Check. On small micros, an int might be 16 bits.Sensibility
It would make more sense to 1) use unsigned int instead of int when dealing with bits, 2) use sizeof(unsigned)*CHAR_BIT instead of 32, or 3) simply use uint32_t. unsigned int/sizeof(unsigned) would perhaps be a better idea if you want to support architectures with different int sizes where accessing a 32-bit int would need more than 1 instruction.Vivia
yes, I agree, but I just reprsensted the content given on the link, in case that link is no longer accessible (it was, in fact, requested by someone and that comment is deleted nowProof
Is x != 0 in TestBit necessarily or it take any advantage?Tracy
Here is a archive of the page mentioned above, the link isn't working now: web.archive.org/web/20220706030302/http://www.mathcs.emory.edu/…Maple
It seems the cited page has been moved to: cs.emory.edu/~cheung/Courses/255/Syllabus/1-C-intro/…Holoenzyme
I
11
typedef unsigned long bfield_t[ size_needed/sizeof(long) ];
// long because that's probably what your cpu is best at
// The size_needed should be evenly divisable by sizeof(long) or
// you could (sizeof(long)-1+size_needed)/sizeof(long) to force it to round up

Now, each long in a bfield_t can hold sizeof(long)*8 bits.

You can calculate the index of a needed big by:

bindex = index / (8 * sizeof(long) );

and your bit number by

b = index % (8 * sizeof(long) );

You can then look up the long you need and then mask out the bit you need from it.

result = my_field[bindex] & (1<<b);

or

result = 1 & (my_field[bindex]>>b); // if you prefer them to be in bit0

The first one may be faster on some cpus or may save you shifting back up of you need to perform operations between the same bit in multiple bit arrays. It also mirrors the setting and clearing of a bit in the field more closely than the second implemention. set:

my_field[bindex] |= 1<<b;

clear:

my_field[bindex] &= ~(1<<b);

You should remember that you can use bitwise operations on the longs that hold the fields and that's the same as the operations on the individual bits.

You'll probably also want to look into the ffs, fls, ffc, and flc functions if available. ffs should always be avaiable in strings.h. It's there just for this purpose -- a string of bits. Anyway, it is find first set and essentially:

int ffs(int x) {
    int c = 0;
    while (!(x&1) ) {
        c++;
        x>>=1;
    }
    return c; // except that it handles x = 0 differently
}

This is a common operation for processors to have an instruction for and your compiler will probably generate that instruction rather than calling a function like the one I wrote. x86 has an instruction for this, by the way. Oh, and ffsl and ffsll are the same function except take long and long long, respectively.

Incorruption answered 26/3, 2010 at 19:52 Comment(1)
A byte is not necessarily 8 bits long! Technically, each long in your bfield_t can hold CHAR_BIT * sizeof (long) bits, not 8 * sizeof (long) bits, it's just that on many architectures CHAR_BIT equals 8.Olivero
T
7

You can use & (bitwise and) and << (left shift).

For example, (1 << 3) results in "00001000" in binary. So your code could look like:

char eightBits = 0;

//Set the 5th and 6th bits from the right to 1
eightBits &= (1 << 4);
eightBits &= (1 << 5);
//eightBits now looks like "00110000". 

Then just scale it up with an array of chars and figure out the appropriate byte to modify first.

For more efficiency, you could define a list of bitfields in advance and put them in an array:

#define BIT8 0x01
#define BIT7 0x02
#define BIT6 0x04
#define BIT5 0x08
#define BIT4 0x10
#define BIT3 0x20
#define BIT2 0x40
#define BIT1 0x80

char bits[8] = {BIT1, BIT2, BIT3, BIT4, BIT5, BIT6, BIT7, BIT8};

Then you avoid the overhead of the bit shifting and you can index your bits, turning the previous code into:

eightBits &= (bits[3] & bits[4]);

Alternatively, if you can use C++, you could just use an std::vector<bool> which is internally defined as a vector of bits, complete with direct indexing.

Telemachus answered 26/3, 2010 at 17:34 Comment(6)
Using std::vector<bool> won't get him optimal performance, since he'll end up having two lookups to get one pair of bits. Whether this penalty is sufficient to justify creating his own variation of std::vector<bool> is dependent upon whether the lookups (and assignments) themselves are a bottleneck.Alkylation
Assuming C++ were an option (the OP only mentioned C) I'd not hesitate to start off with an std::vector<bool>, simply for conciseness and readability. If I then needed better performance, I'd profile to find out where the bottleneck was. (It could very well be in rand() and not the vector lookup).Telemachus
Instead of char bits[8] = { ... }; you could do #define bits(x) BIT##x.Liberec
I think you mean |= and | instead of &= and &.Tutorial
I need to create a very large array, with more than 'max_size of int' boolean values/bits. Is this possible with vector<bool> or bitset?Tutorial
"For more efficiency" ⇒ it depends on the architecture. Maybe sometimes shift could be cheaper than array access. In other words, this is a very small "improvement", if any. Don't worry about it unless you really really need to. Premature optimization is the root of all evil.Sooty
P
6

bitarray.h:

#include <inttypes.h> // defines uint32_t

//typedef unsigned int bitarray_t; // if you know that int is 32 bits
typedef uint32_t bitarray_t;

#define RESERVE_BITS(n) (((n)+0x1f)>>5)
#define DW_INDEX(x) ((x)>>5)
#define BIT_INDEX(x) ((x)&0x1f)
#define getbit(array,index) (((array)[DW_INDEX(index)]>>BIT_INDEX(index))&1)
#define putbit(array, index, bit) \
    ((bit)&1 ?  ((array)[DW_INDEX(index)] |= 1<<BIT_INDEX(index)) \
             :  ((array)[DW_INDEX(index)] &= ~(1<<BIT_INDEX(index))) \
             , 0 \
    )

Use:

bitarray_t arr[RESERVE_BITS(130)] = {0, 0x12345678,0xabcdef0,0xffff0000,0};
int i = getbit(arr,5);
putbit(arr,6,1);
int x=2;            // the least significant bit is 0
putbit(arr,6,x);    // sets bit 6 to 0 because 2&1 is 0
putbit(arr,6,!!x);  // sets bit 6 to 1 because !!2 is 1

EDIT the docs:

"dword" = "double word" = 32-bit value (unsigned, but that's not really important)

RESERVE_BITS: number_of_bits --> number_of_dwords
    RESERVE_BITS(n) is the number of 32-bit integers enough to store n bits
DW_INDEX: bit_index_in_array --> dword_index_in_array
    DW_INDEX(i) is the index of dword where the i-th bit is stored.
    Both bit and dword indexes start from 0.
BIT_INDEX: bit_index_in_array --> bit_index_in_dword
    If i is the number of some bit in the array, BIT_INDEX(i) is the number
    of that bit in the dword where the bit is stored.
    And the dword is known via DW_INDEX().
getbit: bit_array, bit_index_in_array --> bit_value
putbit: bit_array, bit_index_in_array, bit_value --> 0

getbit(array,i) fetches the dword containing the bit i and shifts the dword right, so that the bit i becomes the least significant bit. Then, a bitwise and with 1 clears all other bits.

putbit(array, i, v) first of all checks the least significant bit of v; if it is 0, we have to clear the bit, and if it is 1, we have to set it.
To set the bit, we do a bitwise or of the dword that contains the bit and the value of 1 shifted left by bit_index_in_dword: that bit is set, and other bits do not change.
To clear the bit, we do a bitwise and of the dword that contains the bit and the bitwise complement of 1 shifted left by bit_index_in_dword: that value has all bits set to one except the only zero bit in the position that we want to clear.
The macro ends with , 0 because otherwise it would return the value of dword where the bit i is stored, and that value is not meaningful. One could also use ((void)0).

Phiona answered 15/7, 2014 at 8:20 Comment(4)
Why just uint32_t, not uint64_t in example?Harmonyharmotome
@DennisV.R. The code is old, it was written for use on a 32-bit embedded system doing a real-time task. Since most variables are int-sized (including pointers), when you allocate 1 byte, 3 bytes usually are wasted, so it is not meaningful to allocate bytes. OTOH, with a 32-bit CPU, a 64-bit AND is implemented as two 32-bit ANDs, so it is not meaningful to use 64-bit ints. In addition, uint64 may require 8-byte alignment (which depends on architecture and the compiler). So 32-bit ints were chosen.Phiona
@DennisV.R. BTW, today I would probably use uint_fast32_t for bitarray_t and would replace 5 with DW_INDEX_BITS defined as #define DW_INDEX_BITS (3+__builtin_ctz(sizeof(bitarray_t))). 0x1f would become ((1<<DW_INDEX_BITS)-1). And the code must be tested after such changes. __builtin_ctz() is a constant expression(!) but it is gcc-specific. And boards for embedded systems may be accompanied by rather old compilers (probably the compiler was reasonably new when the board was new).Phiona
@Phiona shouldn't those left-shifted ones in putbit be cast to uint32_t on the off-chance index is 31?Clod
K
2

It's a trade-off:

(1) use 1 byte for each 2 bit value - simple, fast, but uses 4x memory

(2) pack bits into bytes - more complex, some performance overhead, uses minimum memory

If you have enough memory available then go for (1), otherwise consider (2).

Kriss answered 26/3, 2010 at 17:35 Comment(2)
@Paul: No, it uses 4x as much memory, since he would be storing 2bit numbers in 1 byte. However, I think from the OP's question that he has already made a decision to go with (2).Alkylation
@Brian: Thanks - I missed that part - I'll update my answer accordingly.Kriss

© 2022 - 2024 — McMap. All rights reserved.