When should I use a BitVector32?
Asked Answered
M

2

8

I am working on a project where at a certain moment I need to show for one month which days are still available. There is a function that calculates which days are available. My colleagues said:"Oh we know, you should return a BitVector32. That's the most efficient when working with a list of booleans." I would have used a List<bool> or something like that. A BitVector32 seems to me to be something for low level stuff when you are actually working with bits.

So, the question is. Should you use the BitVector32 whenever you need some list of booleans with less than 32 items or should you only use it for low level stuff?

Marcie answered 23/2, 2011 at 17:16 Comment(0)
G
8

Using a List is easily extensible to other time periods. Say you want to show two month at once. Oh that's bigger than 32. I need to change the return type and everywhere it's used. Great! And BitVector32 doesn't even implement IEnumerable<T>.

And unless it's in a tight loop readability and maintainability top efficiency. And the overhead of a list allocation isn't that big, unless you do it a million times per second.

So I agree with you that you should only use BitVector32 for low level code.

Gipsy answered 23/2, 2011 at 17:18 Comment(2)
At first I accepted to use the BitVector32, but the moment I tried to use a foreach I thought to myself:"This can't be right...".Marcie
There is also a BitArray, which you can use for a fixed number of boolean values.Skier
I
3

BitVector32 is a wrapper (or you can call it abstraction) around c#'s bit operations. For example, the following two statements return the same result:

  • 1 << 1
  • BitVector32.CreateMask(1)

Let's say there is an integer array containing some duplicate numbers. We want to find all duplicates. Of course, You can simply use the GroupBy function in Linq but let's pretend we don't have Linq.

  1. The first option is brute force approach where each element will be compared against every element in the given array:

    foreach(int i in list) 
    {
        foreach(int j in list)
        {
            if (i == j) 
            {
                // print this or store it in the result list
            }
        }
    }
    
  2. Since the brute force approach will result in N square running time, which is pretty inefficient, we can think of utilizing HashSet which will provide a constant lookup time or O(1)

    HashSet<int> hashSet = new HashSet<int>();
    
    foreach(int i in list)
    {    
        if (hashSet.Contains(i))
        {
            // print the duplicate or add it to the result list
        }
        else
        {
            hashSet.Add(i);
        }
    }
    

This approach will result in the linear running time or O(n). However, it requires extra memory of n * 4 bytes (assuming we're talking about the 32 bit integer)

  1. The third approach is similar to using a hashset except it requires less memory by using a boolean array

    bool[] masks = new bool[list.Length];
    
    for (int i = 0; i < list.length; i++) 
    {
        if (masks[list[i]])
        {
            // print or add to the result list
        }
        else
        {
            masks[list[i]] = true;
        }
    }
    

it uses an boolean array instead of an HashSet. It has the same run time which is O(n) but requires 1/4th amount of the memory since the bool type takes up 1 byte (8bits) while integer takes up 4 bytes (32 bits)

  1. Finally, we can solve this problem using the BitVector32 class or the native bit shifting operations.

    int check = 0;
    for (int i=0; i < list.Length; i++)
    {
        int mask = 1 << list[i];
        if (check & mask == mask) 
        {
            // print or add list[i] to the result list
        }
        else
        {
            check = check | mask;
        }
    }
    

It will also result in a linear run time with only 32 bits of memory in total. So the memory usage is n/32. Of course, this is not going to work if the max value in the array is greater than 32. We can use 64 bit unsigned integer to increase the number of slots in the mask but it still has a very short limit. In that case, if you create an array of BitVectory32 and you can shift the bit to the BitVector32 object in the next index of the array. For instance, the code will go something like below

BitVector32[] bitVectorArray = new BitVector32[maxValue / 32];
bitVectorArray[list[i] / 32] = 1 << list[i] % 32;

This way, you don't have to be limited to the 32 bit size limit. You can grow the size of the big mask indefinitely as long as the memory capacity allows. So, put everything together:

// This code assumes you know the range of the number in the array
BitVector32[] bitVectorArray = new BitVector32[maxValue / 32];

for (int i=0; i < list.Length; i++)
{
    int mask = 1 << list[i] % 32;

    if (bitVectorArray[(list[i] - 1)/32][i] & mask == mask) 
    {
        // print or add list[i] to the result list
    }
    else
    {
        bitVectorArray[(list[i] - 1)/32] = bitVectorArray[list[i] / 32] | mask;
    }
}
Illusion answered 8/7, 2012 at 21:58 Comment(11)
Thanks for the answer. One remark, in the initialization of the BitVector32 array (and also the byte array), you use the size of the input. In fact you should use the size of the domain over which you want to run your algorithm. Which in case of Int32 is 2^32. This results in a considerably larger memory usage.Marcie
Furthermore, I would initialize the HashSet with an initial capacity. Otherwise you will end up with O(n^2) because it will have to increase its capacity every so many items.Marcie
I think you can fix it by using a Dictionary<int, BitVector32> instead of an BitVector32[] (and Dictionary<int, bool> instead of byte[]). Not sure what this will do to your memory usage, but I don't think it is better than your HashSet<int> approach.Marcie
In this particular example, the dictionary will not give you any advantage either on runtime or memory since the key is an incremental integer starting from 0 which can simply substituted by the array index. The dictionary adds more overhead. And what's better than the hashset approach is the amount of memory is 1/32 because it uses each bit as a maskIllusion
I think you missed my first comment. i is not the key in the dictionary (or the index of the array), list[i] is. list[i] can be any integer value. I.e. your algorithm will throw an IndexOutOfRangeException on the following input: list = new[] { 32 }, even if you correct the initialisation to new BitVector32[((list.length - 1) / 32) + 1].Marcie
You're right on the array index. 0 - 31 goes into bitVectorArray[0], 32-63 goes into bitVectorArray[1] and so on. That's why you don't need Dictionary. The array gives you random access as long as you know the index and therefore the lookup time is O(1) while the dictionary requires extra set of array (internally) and it uses a hash function to map the key to the value which requires more memory and the overhead of hashing.Illusion
Yes, but you need a hash function in order to fit 2^32 (2^27 in case when you use BitVector32) possibilities in O(n) memory. Just paste your algorithm in a console app and define list = new[] { 32 }.Marcie
Matthijs, I think I have confused you with my mistake in the code. I updated the code, the if statement should use bitVectorArray[list[i]/32][i] instead of bitVectorArray[list[i]/32]. Each BitVector32 let you access every 32 bit mask. For instance, BitVector32[3] = true. In other words, if you have numbers between 0 and 31, you only use one single BitVector32 object instead of 32 elements in a dictionaryIllusion
I think in a case that we don't know the range of the number and they're sporadically spread out, we can use a dictionary, however, not in the same way as HashSet, each BitVector32 will give you 32 masks instead of 1Illusion
What confused me is that you claim the memory usage is n/32, while it is maxValue/32, which is completely different (btw, your bool[] example still has the same problem). Furthermore, vector[5] does not give you whether the fifth bit is set, vector[16] will. vector[5] will give you whether the first AND third bit are set. I.e. you should enter the mask variable from your algorithm. I.e. bitVectorArray[(list[i] - 1)/32][mask] instead of bitVectorArray[(list[i] - 1)/32][i] & mask == maskMarcie
You're right on the mistakes. I will update the code so it will actually complie and run when i have a chance. But i think at least it looks like we are talking about the same thing and we're on the same page now, right?Illusion

© 2022 - 2024 — McMap. All rights reserved.