Bit Array Equality
Asked Answered
S

4

6

I need something a little more than the System.Collections.BitArray class in my application. Specifically, I need the bit array:

  • To be immutable
  • To implement equality using value semantics

I created my own struct, largely copying the internals of the BitArray implementation. (Thanks, .Net Reflector!)

I don't deal everyday with bitwise operations, so I don't have the highest degree of confidence in my equality implementation. (It's passing the unit tests I am throwing at it, but I may be missing edge cases.) I have my proposed solutions as answers below. I would appreciate others' feedback and answers for something that may be more correct or efficient.

Just like the CLR BitArray, the length field refers to the number of bits in the struct and the array field (or Array property) refers to the 32-bit integer array that represents the bits.

[CLARIFICATION] I have chosen to take the easy route in my constructors and other methods so that I cannot rely on the unnecessary bits being zeros. For example,

  • Not() is implemented by bitwise negation (~) on the integer array elements.
  • A constructor is available that takes a length and boolean to initialize all bits. If the initialization value is true, I set all elements of the int array to -1 (in two's complement, represented by all 1's)
  • Etc.

Thus, I need to handle (or, rather, ignore) them in the comparison. A fine solution would also be to keep those bits zeroed at all times, but in my situation that will result in more work (both for the computer and me!)

Shiver answered 23/2, 2010 at 18:35 Comment(1)
What is the type of your Array member?Lynnalynne
P
2

Update: my original analysis below was incorrect...

Unfortunately, I was incorrect about the behavior of << 32 - C# enforces that the left-shift operator will restrict the number of shift to the lower 5 bits of the right operand (6 bits for a shift involving a 64-bit left operand). So your original code was both well-defined and correct in C# (it is undefined behavior in C/C++). Essentially, this shift expression:

(this.Array[i] << shift)

is equivalent to:

(this.Array[i] << (shift & 0x1f))

I'd probably still change the shift to make that explicit (if for no other reason that when I looked at that code 6 months later I wouldn't stumble through the same mistaken analysis) using the above instead of the if (shift == 32) check.

The original analysis:


OK, so here's a second answer. Most importantly, I think that you original solution has a bug in the case where the bit-length of your ImmutableBitArray is a multiple of 32 bits you'll return true for 2 arrays that differ in the last Int32[] array element.

For example, consider ImmutableBitArrays with a bit length of 32 bits that are different. The original Equals() method will perform the shift operation on the one and only Int32 in the array - but it'll shift the values 32 bits, since

int shift = 0x20 - (this.length % 0x20);

will evaluate to 32.

That means the next test:

if (this.Array[i] << shift != other.Array[i] << shift)

Will test for (0 != 0) and therefore the return false won't be executed.

I'd change your Equals() method to something like the following, which isn't a major change - I think it takes care of the above mentioned bug and changes a couple other things that are strictly style-related so may not have any interest to you. Also note that I haven't actually compiled and test my Equals() method, so there's a nearly 100% chance that there's a bug (or at least a syntax error):

public bool Equals(ImmutableBitArray other)
{
    if (this.length != other.length)
    {
        return false;
    }

    int finalIndex = this.Array.Length - 1;

    for (int i = 0; i < finalIndex; i++)
    {
        if (this.Array[i] != other.Array[i])
        {
            return false;
        }
    }

    // check the last array element, making sure to ignore padding bits

    int shift = 32 - (this.length % 32);
    if (shift == 32) {
        // the last array element has no padding bits - don't shift
        shift = 0;
    }

    if (this.Array[finalIndex] << shift != other.Array[finalIndex] << shift)
    {
        return false;
    }

    return true;
}

Note that strictly speaking, the original GetHashCode() method isn't bugged even though it has the same flaw, because even if you don't properly mix in the last element when the bit-length is a multiple of 32, equal object would still return the same hashcode. But I'd still probably decide to address the flaw in the same way in GetHashCode().

Plumage answered 24/2, 2010 at 7:31 Comment(3)
+1 Excellent catch! I was wondering why my unit test on this scenario passed and discovered that (in C# at least), n << 32 == n. It seems that the compiler realizes you are doing something invalid (shifting larger than the size of the struct) and corrects your mistake. But this is probably undefined behavior and just a fortuitous reality. I corrected my shift function.Shiver
In my case, I am allowing zero-length bitarrays, so some of the stylistic recommendations will fail. But obviously I didn't mention that in my question, so you had no way of knowing that. Thanks for the advice!Shiver
@Dave: my description of what I thought was a bug was incorrect - C# defines the left shift operator to do what your code needed. I've added an errata to the beginning of the answer...Plumage
P
2

If in the constructor of the ImmutableBitArray the unused 'padding bits' on the last element are forced to zero you don't need to jump through hoops to only check the valid bits in the last element, as the padding will be the same in equal instances.

That'll simplify the Equals() and GetHashCode() methods nicely:

public bool Equals(ImmutableBitArray other)
{
    if (this.length != other.length)
    {
        return false;
    }

    for (int i = 0; i < this.Array.Length; i++)
    {
        if (this.Array[i] != other.Array[i])
        {
            // since padding bits are forced to zero in the constructor,
            //  we can test those for equality just as well and the valid
            //  bits
            return false;
        }
    }

    return true;
}


public override int GetHashCode()
{
    int hc = this.length;

    for (int i = 0; i < this.Array.Length; i++)
    {
        // since padding bits are forced to zero in the constructor,
        //  we can mix those into the hashcode no problem

        hc ^= this.Array[i];
    }

    return hc;
}
Plumage answered 23/2, 2010 at 18:46 Comment(2)
+1 Excellent idea. I started down that road, actually. But it also complicates some other functions. For example, the easiest way to do Not() is to just do result.Array[i] = ~this.Array[i] for each element in the int array, which would leave 1's in the zeroed parts. I had to deal with the excess somewhere and in my case it is simplest to do it at the point of comparison.Shiver
Yup - I guess you gotta pay somewhere.Plumage
P
2

Update: my original analysis below was incorrect...

Unfortunately, I was incorrect about the behavior of << 32 - C# enforces that the left-shift operator will restrict the number of shift to the lower 5 bits of the right operand (6 bits for a shift involving a 64-bit left operand). So your original code was both well-defined and correct in C# (it is undefined behavior in C/C++). Essentially, this shift expression:

(this.Array[i] << shift)

is equivalent to:

(this.Array[i] << (shift & 0x1f))

I'd probably still change the shift to make that explicit (if for no other reason that when I looked at that code 6 months later I wouldn't stumble through the same mistaken analysis) using the above instead of the if (shift == 32) check.

The original analysis:


OK, so here's a second answer. Most importantly, I think that you original solution has a bug in the case where the bit-length of your ImmutableBitArray is a multiple of 32 bits you'll return true for 2 arrays that differ in the last Int32[] array element.

For example, consider ImmutableBitArrays with a bit length of 32 bits that are different. The original Equals() method will perform the shift operation on the one and only Int32 in the array - but it'll shift the values 32 bits, since

int shift = 0x20 - (this.length % 0x20);

will evaluate to 32.

That means the next test:

if (this.Array[i] << shift != other.Array[i] << shift)

Will test for (0 != 0) and therefore the return false won't be executed.

I'd change your Equals() method to something like the following, which isn't a major change - I think it takes care of the above mentioned bug and changes a couple other things that are strictly style-related so may not have any interest to you. Also note that I haven't actually compiled and test my Equals() method, so there's a nearly 100% chance that there's a bug (or at least a syntax error):

public bool Equals(ImmutableBitArray other)
{
    if (this.length != other.length)
    {
        return false;
    }

    int finalIndex = this.Array.Length - 1;

    for (int i = 0; i < finalIndex; i++)
    {
        if (this.Array[i] != other.Array[i])
        {
            return false;
        }
    }

    // check the last array element, making sure to ignore padding bits

    int shift = 32 - (this.length % 32);
    if (shift == 32) {
        // the last array element has no padding bits - don't shift
        shift = 0;
    }

    if (this.Array[finalIndex] << shift != other.Array[finalIndex] << shift)
    {
        return false;
    }

    return true;
}

Note that strictly speaking, the original GetHashCode() method isn't bugged even though it has the same flaw, because even if you don't properly mix in the last element when the bit-length is a multiple of 32, equal object would still return the same hashcode. But I'd still probably decide to address the flaw in the same way in GetHashCode().

Plumage answered 24/2, 2010 at 7:31 Comment(3)
+1 Excellent catch! I was wondering why my unit test on this scenario passed and discovered that (in C# at least), n << 32 == n. It seems that the compiler realizes you are doing something invalid (shifting larger than the size of the struct) and corrects your mistake. But this is probably undefined behavior and just a fortuitous reality. I corrected my shift function.Shiver
In my case, I am allowing zero-length bitarrays, so some of the stylistic recommendations will fail. But obviously I didn't mention that in my question, so you had no way of knowing that. Thanks for the advice!Shiver
@Dave: my description of what I thought was a bug was incorrect - C# defines the left shift operator to do what your code needed. I've added an errata to the beginning of the answer...Plumage
I
2

After several hours of searching and study, I finally got my answer and would like to share. I have not reviewed the performance as I only care readability.

if (input1.length != input2.length)
{
    return false;
}

var result = new BitArray(input1);
result = result.Xor(input2);

if (result.Cast<bool>().Contains(true))
    return false;
return true;
Incommunicative answered 31/7, 2014 at 2:3 Comment(1)
I was also thinking about the xor operator. But this would be useful only if we could somehow get the underlying bytes and compare to 0 and not iterate through each bit.Zusman
S
1

The equality method:

public bool Equals(ImmutableBitArray other)
{
    if (this.length != other.length)
    {
        return false;
    }

    for (int i = 0; i < this.Array.Length; i++)
    {
        if (this.Array[i] != other.Array[i])
        {
            // This does not necessarily mean that the relevant bits of the integer arrays are different.
            // Is this before the last element in the integer arrays?
            if (i < this.Array.Length - 1)
            {
                // If so, then the objects are not equal.
                return false;
            }

            // If this is the last element in the array we only need to be concerned about the bits
            // up to the length of the bit array.
            int shift = 0x20 - (this.length % 0x20);
            if (this.Array[i] << shift != other.Array[i] << shift)
            {
                return false;
            }
        }
    }

    return true;
}

And the necessary GetHashCode override:

public override int GetHashCode()
{
    int hc = this.length;

    for (int i = 0; i < this.Array.Length; i++)
    {
        if (i < this.Array.Length - 1)
        {
            hc ^= this.Array[i];
        }
        else
        {
            int shift = 0x20 - (this.length % 0x20);
            hc ^= this.Array[this.Array.Length - 1] << shift;
        }
    }

    return hc;
}
Shiver answered 23/2, 2010 at 18:37 Comment(3)
You don't have to treat the last byte specially, the high bits will be zero.Lasagne
@nobugz: The way I am creating the instances, that will not always be correct. (See my comment to Michael Burr below; also see the ctor(int, bool) for BitArray; I am doing the same thing using -1.) But maybe that is the easier way to go.Shiver
I think there's actually a subtle bug here when the bit-length is a multiple of 32. See my 2nd answer (not an edit of the first) for details: https://mcmap.net/q/1732274/-bit-array-equality/…Plumage

© 2022 - 2024 — McMap. All rights reserved.