When should I use a BitVector32?

BitVector32 is a wrapper (or you can call it abstraction) around c#'s bit operations. For example, the following two statements return the same result:

1 << 1
BitVector32.CreateMask(1)

Let's say there is an integer array containing some duplicate numbers. We want to find all duplicates. Of course, You can simply use the GroupBy function in Linq but let's pretend we don't have Linq.

The first option is brute force approach where each element will be compared against every element in the given array:

foreach(int i in list) 
{
    foreach(int j in list)
    {
        if (i == j) 
        {
            // print this or store it in the result list
        }
    }
}

Since the brute force approach will result in N square running time, which is pretty inefficient, we can think of utilizing HashSet which will provide a constant lookup time or O(1)

HashSet<int> hashSet = new HashSet<int>();

foreach(int i in list)
{    
    if (hashSet.Contains(i))
    {
        // print the duplicate or add it to the result list
    }
    else
    {
        hashSet.Add(i);
    }
}

This approach will result in the linear running time or O(n). However, it requires extra memory of n * 4 bytes (assuming we're talking about the 32 bit integer)

The third approach is similar to using a hashset except it requires less memory by using a boolean array

bool[] masks = new bool[list.Length];

for (int i = 0; i < list.length; i++) 
{
    if (masks[list[i]])
    {
        // print or add to the result list
    }
    else
    {
        masks[list[i]] = true;
    }
}

it uses an boolean array instead of an HashSet. It has the same run time which is O(n) but requires 1/4th amount of the memory since the bool type takes up 1 byte (8bits) while integer takes up 4 bytes (32 bits)

Finally, we can solve this problem using the BitVector32 class or the native bit shifting operations.

int check = 0;
for (int i=0; i < list.Length; i++)
{
    int mask = 1 << list[i];
    if (check & mask == mask) 
    {
        // print or add list[i] to the result list
    }
    else
    {
        check = check | mask;
    }
}

It will also result in a linear run time with only 32 bits of memory in total. So the memory usage is n/32. Of course, this is not going to work if the max value in the array is greater than 32. We can use 64 bit unsigned integer to increase the number of slots in the mask but it still has a very short limit. In that case, if you create an array of BitVectory32 and you can shift the bit to the BitVector32 object in the next index of the array. For instance, the code will go something like below

BitVector32[] bitVectorArray = new BitVector32[maxValue / 32];
bitVectorArray[list[i] / 32] = 1 << list[i] % 32;

This way, you don't have to be limited to the 32 bit size limit. You can grow the size of the big mask indefinitely as long as the memory capacity allows. So, put everything together:

// This code assumes you know the range of the number in the array
BitVector32[] bitVectorArray = new BitVector32[maxValue / 32];

for (int i=0; i < list.Length; i++)
{
    int mask = 1 << list[i] % 32;

    if (bitVectorArray[(list[i] - 1)/32][i] & mask == mask) 
    {
        // print or add list[i] to the result list
    }
    else
    {
        bitVectorArray[(list[i] - 1)/32] = bitVectorArray[list[i] / 32] | mask;
    }
}

Recommended topics

Hot tags