What is an appropriate `GetHashCode()` algorithm for a 2D point struct (avoiding clashes)
Asked Answered
A

5

18

Consider the following code:

struct Vec2 : IEquatable<Vec2>
{
    double X,Y;

    public bool Equals(Vec2 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        if (obj is Vec2)
        {
            return Equals((Vec2)obj);
        }
        return false;
    }

    // this will return the same value when X, Y are swapped
    public override int GetHashCode()
    {
        return X.GetHashCode() ^ Y.GetHashCode();
    }

}

Beyond the conversation of comparing doubles for equality (this is just demo code), what I am concerned with is that there is a hash clash when X, Y values are swapped. For example:

Vec2 A = new Vec2() { X=1, Y=5 };
Vec2 B = new Vec2() { X=5, Y=1 };

bool test1 = A.Equals(B);  // returns false;
bool test2 = A.GetHashCode() == B.GetHashCode() // returns true !!!!!

which should wreck havoc in a dictionary collection. So the question is how to property form the GetHashCode() function for 2,3 or even 4 floating point values such that the results are not symmetric and the hashes don't clash.

Edit 1:

Point implements the inappropriate x ^ y solution, and PointF wraps ValueType.GetHashCode().

Rectangle has a very peculiar (((X ^ ((Y << 13) | (Y >> 19))) ^ ((Width << 26) | (Width >> 6))) ^ ((Height << 7) | (Height >> 25))) expression for the hash code, which seems to perform as expected.

Edit 2:

'System.Double' has a nice implementation as it does not consider each bit equally important

public override unsafe int GetHashCode() //from System.Double
{
    double num = this;
    if (num == 0.0)
    {
        return 0;
    }
    long num2 = *((long*) &num);
    return (((int) num2) ^ ((int) (num2 >> 32)));
}
Accad answered 7/3, 2011 at 15:21 Comment(4)
It is fine to aim to minimise collisions, but your code must expect them; they will always happenKorman
Hashes will clash - int has a smaller range of possible values than double, and it's even smaller when compared to doublexdouble. Also, you might want to consider a looser equality comparison - due to rounding, two float values might be very close to one another (what anyone doing a comparison "by eye" would consider equal), but still won't be exactly equal.Coad
It's not just that (A,B) collides with (B,A). Thanks to the fact that X^X --> 0 all (C,C) collides with all (D,D), and that's a much larger space of collisions.Allomorphism
possible duplicate of Create a hashcode of two numbersSubkingdom
C
22

Jon skeet has this covered:

What is the best algorithm for an overridden System.Object.GetHashCode?

   public override int GetHashCode()
   {
       unchecked // Overflow is fine, just wrap
       {
           int hash = 17;
           // Suitable nullity checks etc, of course :)
           hash = hash * 23 + X.GetHashCode();
           hash = hash * 23 + Y.GetHashCode();
           return hash;
       }
   }

Also, change your Equals(object) implementation to:

return Equals(obj as FVector2);

Note however that this could perceive a derived type to be equal. If you don't want that, you'd have to compare the runtime type other.GetType() with typeof(FVector2) (and don't forget nullity checks) Thanks for pointing out it's a struct, LukH

Resharper has nice code generation for equality and hash code, so if you have resharper you can let it do its thing

Cordell answered 7/3, 2011 at 15:22 Comment(4)
The type in question is a struct so the Equals(object) implementation can't be changed to use as. There's nothing wrong with the OP's current Equals(object) method.Hotchkiss
This may work well, but if you look into the calculation for Double.GetHashCode() you will notice that a more careful consideration of important bits is used, and I would like to follow suit.Accad
Thanks for the reply, as I have implemented a version of this for many structures since.Accad
Glad to be of assistance !Cordell
A
7

Hash collisions don't wreak havoc in a dictionary collection. They'll reduce the efficiency if you're unlucky enough to get them, but dictionaries have to cope with them.

Collisions should be rare if at all possible, but they're don't mean the implementation is incorrect. XORs are often bad for the reasons you've given (high collisions) - ohadsc has posted a sample I gave before for an alternative, which should be fine.

Note that it would be impossible to implement Vec2 with no collisions - there are only 232 possible return values from GetHashCode, but there are rather more possible X and Y values, even after you've removed NaN and infinite values...

Eric Lippert has a recent blog post on GetHashCode which you may find useful.

Agamogenesis answered 7/3, 2011 at 15:27 Comment(2)
Mind you that when the fields are floating point numbers the bits are not dense, but follow certain patterns given reasonable numbers for the structure to hold. Hence the xor operation should suffice if it avoids symmetry in most significant digits. Does this make sense?Accad
@ja72: Possibly. It depends on the real world data. If they tend to actually be integers, it will still be nasty.Agamogenesis
M
1

What are reasonable bounds for the coordinates?

Unless it can be all possible integer values you could simply:

const SOME_LARGE_NUMBER=100000; return SOME_LARGE_NUMBER * x + y;

Millicent answered 7/3, 2011 at 15:33 Comment(1)
This is really the best if you know the bounds. You can use it for ints too with int.MaxValue with unchecked clauseSubkingdom
C
0

If size of your hash code is lesser than size of your struct, then clashes are inevitable anyways.

Cretonne answered 7/3, 2011 at 15:27 Comment(1)
But clashing with (X=1, Y=5) and (X=5, Y=1) for example is unacceptable. A clash with (3.287e308, -7,.228e175) and its symmetric pair is far more acceptable.Accad
T
0

The hash codes approach works for interger coordinates but is not recommended for floating point values. With floating point coordinates one can create a point-set/pool by using a sorted sequence structure.

A sorted sequence is a leaf version balanced binary tree.

Here the keys would be the point coordinates.

Toddle answered 8/1, 2015 at 2:46 Comment(8)
Can you show some code explaining what you mentioned. I am having a hard time visualizing it. I understand that floating point precision would make GetHashCode() non deterministic, but I do not understand how you propose to get around this.Accad
Forget about GetHashCode(), you don't use hashing. Just used a balanced binary tree with an xy-comparator.Toddle
I often need to implement IEquatable<> and thus hashing is required to override. Can you please show some code so we can understand what you are talking about.Accad
If you have to use GetHashCode then i cannot help you. In my understanding, one has two choices with dictionaries. Either binary search trees as a navigation structure to a list or hashing to entries in a list. Which one chooses is based on the type of keys you are working with. As you said, floating point does not work well when using hashing. Hashing is a less general approach than the one that uses a comparator with a search tree. Search for data structures called "sorted sequences" and look up leaf or exogenous versions of balanced binary trees.Toddle
Btw you can convert to integers using a snap grid then get hashing without collisions by using one of these functions. dmauro.com/post/77011214305/…Toddle
Here is a thread on which hash function is best for uniqueness and speed but it looks like they tested only strings. Maybe worth looking at. programmers.stackexchange.com/questions/49550/…. HoToddle
Since keys maybe any set of fixed length bit-strings, they maybe floating point numbers as well. With the minor modifications of complimenting the most significant bit for non-negative floats, complimenting all bits for negative floats, then these numbers can be simply dealt with since their order is preserved when their bit representation is interpreted lexicographically [IEEE 2008]Toddle
Mapping two integers to one, in a unique and deterministic way. #920112Toddle

© 2022 - 2024 — McMap. All rights reserved.