Counting bits set in a .Net BitArray Class
Asked Answered
D

11

29

I am implementing a library where I am extensively using the .Net BitArray class and need an equivalent to the Java BitSet.Cardinality() method, i.e. a method which returns the number of bits set. I was thinking of implementing it as an extension method for the BitArray class. The trivial implementation is to iterate and count the bits set (like below), but I wanted a faster implementation as I would be performing thousands of set operations and counting the answer. Is there a faster way than the example below?

count = 0;

for (int i = 0; i < mybitarray.Length; i++)
{

  if (mybitarray [i])
    count++;
}
Diorite answered 21/2, 2011 at 6:59 Comment(4)
As a sidenote I'll add that taking the BitArray code from Mono and adding a Cardinality that is O(1) is novice level programming. (the class libraries are under X11 license, that is a very very permissive license)Crag
Interesting suggestion. Won't the source be in C? In which case, I would need to make my library unmamaged? Also can you pls point me to the correct path on github?Diorite
No no... 95% (it's a random number) of the framework library (and of the mono library) are written in C# (pure C#, not C# + managed C++). Only the lowest level things are written in C (or something else) (I hadn't noticed you had asked me... You (and I, because 50% of times I forget) should remember to @name the person you want to write to :-) )Crag
Related posts - How to count the number of set bits in a 32-bit integer? & What is the fastest way to count set bits in UInt32Calondra
G
39

This is my solution based on the "best bit counting method" from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel

public static Int32 GetCardinality(BitArray bitArray)
{

    Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];

    bitArray.CopyTo(ints, 0);

    Int32 count = 0;

    // fix for not truncated bits in last integer that may have been set to true with SetAll()
    ints[ints.Length - 1] &= ~(-1 << (bitArray.Count % 32));

    for (Int32 i = 0; i < ints.Length; i++)
    {

        Int32 c = ints[i];

        // magic (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
        unchecked
        {
        c = c - ((c >> 1) & 0x55555555);
        c = (c & 0x33333333) + ((c >> 2) & 0x33333333);
        c = ((c + (c >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
        }

        count += c;

    }

    return count;

}

According to my tests, this is around 60 times faster than the simple foreach loop and still 30 times faster than the Kernighan approach with around 50% bits set to true in a BitArray with 1000 bits. I also have a VB version of this if needed.

Godbey answered 16/1, 2013 at 8:46 Comment(0)
W
4

you can accomplish this pretty easily with Linq

BitArray ba = new BitArray(new[] { true, false, true, false, false });
var numOnes = (from bool m in ba
           where m
           select m).Count();
Watchdog answered 21/2, 2011 at 7:8 Comment(1)
If using LINQ, a one liner variant of the above: ba.Cast<bool>().Count(l => l). In the end, this is just a foreach loop in disguise.Hallock
T
2
BitArray myBitArray = new BitArray(...

int
    bits = myBitArray.Count,
    size = ((bits - 1) >> 3) + 1,
    counter = 0,
    x,
    c;

    byte[] buffer = new byte[size];
    myBitArray.CopyTo(buffer, 0);

    for (x = 0; x < size; x++)
        for (c = 0; buffer[x] > 0; buffer[x] >>= 1)
            counter += buffer[x] & 1;

Taken from "Counting bits set, Brian Kernighan's way" and adapted for bytes. I'm using it for bit arrays of 1 000 000+ bits and it's superb.
If your bits are not n*8 then you can count the mod byte manually.

Thermolysis answered 15/12, 2011 at 14:14 Comment(0)
F
2

I wrote my version of after not finding one that uses a look-up table:

private int[] _bitCountLookup;
private void InitLookupTable()
{
    _bitCountLookup = new int[256];

    for (var byteValue = 0; byteValue < 256; byteValue++)
    {
        var count = 0;
        for (var bitIndex = 0; bitIndex < 8; bitIndex++)
        {
            count += (byteValue >> bitIndex) & 1;
        }
        _bitCountLookup[byteValue] = count;
    }
}

private int CountSetBits(BitArray bitArray)
{
    var result = 0;
    var numberOfFullBytes = bitArray.Length / 8;
    var numberOfTailBits = bitArray.Length % 8;
    var tailByte = numberOfTailBits > 0 ? 1 : 0;
    var bitArrayInBytes = new byte[numberOfFullBytes + tailByte];
    bitArray.CopyTo(bitArrayInBytes, 0);

    for (var i = 0; i < numberOfFullBytes; i++)
    {
        result += _bitCountLookup[bitArrayInBytes[i]];
    }

    for (var i = (numberOfFullBytes * 8); i < bitArray.Length; i++)
    {
        if (bitArray[i])
        {
            result++;
        }
    }
    return result;
}
Fazio answered 16/9, 2014 at 13:41 Comment(0)
W
2

I had the same issue, but had more than just the one Cardinality method to convert. So, I opted to port the entire BitSet class. Fortunately it was self-contained.

Here is the Gist of the C# port.

I would appreciate if people would report any bugs that are found - I am not a Java developer, and have limited experience with bit logic, so I might have translated some of it incorrectly.

Winker answered 3/12, 2014 at 5:7 Comment(0)
T
2

Faster and simpler version than the accepted answer thanks to the use of System.Numerics.BitOperations.PopCount

C#

Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
bitArray.CopyTo(ints, 0);
Int32 count = 0;
for (Int32 i = 0; i < ints.Length; i++) {
    count += BitOperations.PopCount(ints[i]);
}
Console.WriteLine(count);

F#

let ints = Array.create ((bitArray.Count >>> 5) + 1) 0u
bitArray.CopyTo(ints, 0)
ints
|> Array.sumBy BitOperations.PopCount
|> printfn "%d"

See more details in Is BitOperations.PopCount the best way to compute the BitArray cardinality in .NET?

Thaumaturgy answered 24/4, 2021 at 23:34 Comment(1)
This answer works well, except that BitOperations.PopCount requires a UInt32, not Int32. Just change the first line to be a UInt32, and it works great.Ritter
C
1

You could use Linq, but it would be useless and slower:

var sum = mybitarray.OfType<bool>().Count(p => p);
Crag answered 21/2, 2011 at 7:8 Comment(3)
That is just the long way of writing what I wrote. They translate into the exact same thing. The runtime is identical, so where is your argument against Linq?Watchdog
You are counting on the fact that everything will be optimized... You can't count on it. In older versions of .net there were different speeds for foreach and for (for arrays). I haven't benchmarked what it's faster between the IEnumerable interface and the [] accessor, but "normally" linq is slower (because some methods aren't always inlined, while the OP code will always be "inlined" because it's already inline). You are right, it isn't useless, it is only "not really usefull". It seems an exercise in linq (like an exercise in elegance).Crag
Yes, I can use linq (either of the methods) but both are slower than my For loop (in the case of a bitarray) and will be an O(n) operation anyway.Diorite
S
1

There is no faster way with using BitArray - What it comes down to is you will have to count them - you could use LINQ to do that or do your own loop, but there is no method offered by BitArray and the underlying data structure is an int[] array (as seen with Reflector) - so this will always be O(n), n being the number of bits in the array.

The only way I could think of making it faster is using reflection to get a hold of the underlying m_array field, then you can get around the boundary checks that Get() uses on every call (see below) - but this is kinda dirty, and might only be worth it on very large arrays since reflection is expensive.

public bool Get(int index)
{
    if ((index < 0) || (index >= this.Length))
    {
        throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
    }
    return ((this.m_array[index / 0x20] & (((int) 1) << (index % 0x20))) != 0);
}

If this optimization is really important to you, you should create your own class for bit manipulation, that internally could use BitArray, but keeps track of the number of bits set and offers the appropriate methods (mostly delegate to BitArray but add methods to get number of bits currently set) - then of course this would be O(1).

Subtend answered 21/2, 2011 at 7:14 Comment(3)
If optimization is really important to you, I'd say you should be taking an int and twiddling it yourself rather than messing with a BitArray at all ;)Termagant
My own wrapper class would work as you suggest if I wanted to count the bits set after creating the class instance. But I am using it for intersection and then counting the bits in the result (bresult = b1.And(b2)). But your reflection concept gave me an idea. I looked deeper and saw that the class has a private property _version which seems to have the count. The only way I can think of getting it is using reflection. So let me check if that is faster than my direct loop.Diorite
@Sam: I think _version just is the number of changes performed on this BitArray instance.Subtend
T
1

If you really want to maximize the speed, you could pre-compute a lookup table where given a byte-value you have the cardinality, but BitArray is not the most ideal structure for this, since you'd need to use reflection to pull the underlying storage out of it and operate on the integral types - see this question for a better explanation of that technique.

Another, perhaps more useful technique, is to use something like the Kernighan trick, which is O(m) for an n-bit value of cardinality m.

static readonly ZERO = new BitArray (0);
static readonly NOT_ONE = new BitArray (1).Not ();

public static int GetCardinality (this BitArray bits)
{
    int c = 0;
    var tmp = new BitArray (myBitArray);

    for (c; tmp != ZERO; c++)
        tmp = tmp.And (tmp.And (NOT_ONE));

    return c;
}

This too is a bit more cumbersome than it would be in say C, because there are no operations defined between integer types and BitArrays, (tmp &= tmp - 1, for example, to clear the least significant set bit, has been translated to tmp &= (tmp & ~0x1).

I have no idea if this ends up being any faster than naively iterating for the case of the BCL BitArray, but algorithmically speaking it should be superior.


EDIT: cited where I discovered the Kernighan trick, with a more in-depth explanation

Termagant answered 21/2, 2011 at 7:37 Comment(1)
Your code tmp = tmp.And (tmp.And (NOT_ONE)); does not seem to work. Performing an And between tmp and NOT_ONE would result in the least significant bit of tmp being set to 0, all others would stay the same. Performing an and between tmp and tmp0 (where tmp0 has least bit set to 0) would result in tmp0, since 1 and 1 is 1 and 0 and anything is 0. This will result in the first iteration setting the least significant bit to 0, but all other iterations would not do anything (unless I am miss understanding something).Collogue
B
1

If you don't mind to copy the code of System.Collections.BitArray to your project and Edit it,you can write as fellow: (I think it's the fastest. And I've tried use BitVector32[] to implement my BitArray, but it's still so slow.)

    public void Set(int index, bool value)
    {
        if ((index < 0) || (index >= this.m_length))
        {
            throw new ArgumentOutOfRangeException("index", "Index Out Of Range");
        }
        SetWithOutAuth(index,value);
    }
    //When in batch  setting values,we need one method that won't auth the index range
    private void SetWithOutAuth(int index, bool value) 
    {
        int v = ((int)1) << (index % 0x20);
        index = index / 0x20;
        bool NotSet = (this.m_array[index] & v) == 0;
        if (value && NotSet)
        {
            CountOfTrue++;//Count the True values
            this.m_array[index] |= v;
        }
        else if (!value && !NotSet)
        {
            CountOfTrue--;//Count the True values
            this.m_array[index] &= ~v;
        }
        else 
            return;
        this._version++;
    }

    public int CountOfTrue { get; internal set; }

    public void BatchSet(int start, int length, bool value)
    {
        if (start < 0 || start >= this.m_length || length <= 0)
            return;
        for (int i = start; i < length && i < this.m_length; i++)
        {
            SetWithOutAuth(i,value);
        }
    }
Becalmed answered 4/11, 2011 at 3:10 Comment(0)
C
0

The problem is naturally O(n), as a result your solution is probably the most efficient.

Since you are trying to count an arbitrary subset of bits you cannot count the bits when they are set (would would provide a speed boost if you are not setting the bits too often).

You could check to see if the processor you are using has a command which will return the number of set bits. For example a processor with SSE4 could use the POPCNT according to this post. This would probably not work for you since .Net does not allow assembly (because it is platform independent). Also, ARM processors probably do not have an equivalent.

Probably the best solution would be a look up table (or switch if you could guarantee the switch will compiled to a single jump to currentLocation + byteValue). This would give you the count for the whole byte. Of course BitArray does not give access to the underlying data type so you would have to make your own BitArray. You would also have to guarantee that all the bits in the byte will always be part of the intersection which does not sound likely.

Another option would be to use an array of booleans instead of a BitArray. This has the advantage not needing to extract the bit from the others in the byte. The disadvantage is the array will take up 8x as much space in memory meaning not only wasted space, but also more data push as you iterate through the array to perform your count.

The difference between a standard array look up and a BitArray look up is as follows:
Array:

  1. offset = index * indexSize
  2. Get memory at location + offset and save to value

BitArray:

  1. index = index/indexSize
  2. offset = index * indexSize
  3. Get memory at location + offset and save to value
  4. position = index%indexSize
  5. Shift value position bits
  6. value = value and 1

With the exception of #2 for Arrays and #3 most of these commands take 1 processor cycle to complete. Some of the commands can be combined into 1 command using x86/x64 processors, though probably not with ARM since it uses a reduced set of instructions.
Which of the two (array or BitArray) perform better will be specific to your platform (processor speed, processor instructions, processor cache sizes, processor cache speed, amount of system memory (Ram), speed of system memory (CAS), speed of connection between processor and RAM) as well as the spread of indexes you want to count (are the intersections most often clustered or are they randomly distributed).

To summarize: you could probably find a way to make it faster, but your solution is the fastest you will get for your data set using a bit per boolean model in .NET.

Edit: make sure you are accessing the indexes you want to count in order. If you access indexes 200, 5, 150, 151, 311, 6 in that order then you will increase the amount of cache misses resulting in more time spent waiting for values to be retrieved from RAM.

Collogue answered 11/4, 2012 at 22:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.