Create a hashcode of two numbers
Asked Answered
P

7

70

I am trying to create a quick hashcode function for a complex number class (a + b) in C#.

I have seen repeatedly the a.GetHashcode()^b.GetHashCode() method. But this will give the same hashcode for (a,b) and (b,a).

Are there any standard algorithm to do this and are there any functions in the .Net framework to help?

Poetaster answered 21/5, 2009 at 12:6 Comment(1)
#682938Indubitable
E
94

My normal way of creating a hashcode for an arbitrary set of hashable items:

int hash = 23;
hash = hash * 31 + item1Hash;
hash = hash * 31 + item2Hash;
hash = hash * 31 + item3Hash;
hash = hash * 31 + item4Hash;
hash = hash * 31 + item5Hash;
// etc

In your case item1Hash could just be a, and item2Hash could just be b.

The values of 23 and 31 are relatively unimportant, so long as they're primes (or at least coprime).

Obviously there will still be collisions, but you don't run into the normal nasty problems of:

hash(a, a) == hash(b, b)
hash(a, b) == hash(b, a)

If you know more about what the real values of a and b are likely to be you can probably do better, but this is a good initial implementation which is easy to remember and implement. Note that if there's any chance that you'll build the assembly with "check for arithmetic overflow/underflow" ticked, you should put it all in an unchecked block. (Overflow is fine for this algorithm.)

Eberly answered 21/5, 2009 at 12:12 Comment(9)
"so long as they're primes (or at least coprime)" - I'd have thought that the initial state can be anything (if multiples of 31 are bad, then what happens when you hit such a value by chance part way through the calculation?). Then the multiplier just needs to be odd (to avoid early values having no effect on the low bits of the result). And not 1, to avoid commutativity. Am I completely missing the point?Cappuccino
I wouldn't like to say I've followed the logic for why the numbers should be co-prime - but that's the advice I've always seen given for this pattern by people wiser than myself (such as Josh Bloch).Eberly
Can't blame you for doing what your boss's boss's boss tells you ;-) Here's him writing a hashCode method with initial state 0, although only as part of an example illustrating something completely different: 209.85.229.132/search?q=cache:3H-Bb8E4sDEJ:developers.sun.com/…. Maybe I should just read Effective Java...Cappuccino
For the OP's specific case of 2 integer values, this is only an optimal solution if all values are equally probable. If the values are skewed toward, say, smaller numbers (e.g. autogenerated primary key values), this solution is more likely to produce collisions than the answer from @Noldorin.Thach
Why not starting with int hash = 713 + item1Hash; instead since you will always be multiplying 23 with 31 ?Insure
@Winter: For simplicity of reading - it keeps everything consistent.Eberly
I think it shouldn't be the best answer because when hashing 3 ints vector (x,y,z) it will collide very soon (0, 1, 0) and (0, 0, 31) have the same hash 685224Dunaville
@Herrgott: Note:L "If you know more about what the real values of a and b are likely to be you can probably do better". Your example falls into that situation IMO.Eberly
I stumbled upon this while looking up how to do this very thing. I then saw one of those VS Quick Tips that was actually helpful and pointed me to the fact that System.HashCode.Combine now exists for .Net Core 2.1 and up. It looks to be overloaded to take up to 7 generic parameters and combines them for you.Sewellel
S
19

Here's a possible approach that takes into account order. (The second method is defined as an extension method.)

public int GetHashCode()
{
    return a.GetHashcode() ^ b.GetHashcode().RotateLeft(16);
}

public static uint RotateLeft(this uint value, int count)
{
    return (value << count) | (value >> (32 - count))
}

It would certainly be interesting to see how the Complex class of .NET 4.0 does it.

Servo answered 21/5, 2009 at 12:11 Comment(4)
This is the best answer if the integer values are skewed, e.g. if they tend to be on the small side because they are autogenerated primary keys in a database. The call to a.GetHashCoce() and b.GetHashCode() is not necessary as it will just return the value of a and b respectively (I believe this is a current implementation detail rather than documented behavior).Thach
The calls to GetHashCode() certainly are needed if a and b are anything other than int (such as uint) because of the return type of GetHashCode() on the containing class.Singlehandedly
Nice solution; I like the fact that this keeps everything ordered and predictable. This seems like a 'correct and complete' solution. One comment though: the extension method didn't work, in the sense that the compiler wasn't smart enough to coerce the int value of GetHashCode() to a uint. I didn't want to do the edits to your code, cuz it seemed like it would just add noise.Stroll
in case you are interested, here is the source code of Complex.GetHashCode()Geisha
P
12

One standard way is this:

hashcode = 23
hashcode = (hashcode * 37) + v1
hashcode = (hashcode * 37) + v2

23 and 37 are coprime, but you can use other numbers as well.

Pleuropneumonia answered 21/5, 2009 at 12:12 Comment(3)
This algorithm is simple to implement, but beware that you can easily have collisions with it: (v1=2, v2=1) will collide with (v1=1, v2=38) for instanceThorough
Not going to work if v1 and v2 and not integers such as a Vector2 position type. Hashcode must be int unless you make do with truncation?Halsy
@WDUK, if v1 and v2 are complext types, simply substitute v1 and v2 in those expressions with v1.GetHashCode() and v2.GetHashCode(). GetHashCode is already a truncated result for most non-trivial types.Pleuropneumonia
Z
5

What about this:

(a.GetHashcode() + b).GetHashcode()

Gives you a different code for (a,b) and (b,a) plus it's not really that fancy.

Zeringue answered 21/5, 2009 at 12:9 Comment(2)
It's not always correct. For Int32s, x.GetHashCode() just returns x. So (a.GetHasCode() + b).GetHashCode() is just a + b.Selfsustaining
In the case when a and b are Int32.Selfsustaining
D
5

@JonSkeet gives a fair, general-purpose algorithm for computing a hash code from n hash codes but assumes you already know which members of an object need to be hash, know what to do about null members, and ommits an implementation for n arbitrary items. So we expand upon his answer:

  1. Only public, immutable properties and fields should contribute to an objects hash code. They should be public (or isomorphic to the public) since we should be able to count on two objects with the same visible surface having the same hash code (hinting towards relationship between object equality and hash code equality), and they should be immutable since an object's hash code should never change in its life time (since then you might end up with an object in the wrong slot of a hash table!).
  2. null members should hash as a constant, such as 0
  3. @JonSkeet's algorithm is a text-book example for applying the functional programming higher-order function usually called fold (Aggregate in C# LINQ), where 23 is our seed and <hash accumulator> * 31 + <current item hash> is our folding function:

In F#

let computeHashCode items =
    items
    |> Seq.map (fun item -> if item = null then 0 else item.GetHashCode())
    |> Seq.fold (fun hash itemHash -> hash * 31 + itemHash) 23

In C#

Func<IEnumerable<Object>, int> computeHashCode = items =>
    items
    .Select(item => item == null ? 0 : item.GetHashCode())
    .Aggregate(23, (hash, itemHash) => hash * 31 + itemHash);
Decastyle answered 20/5, 2013 at 14:6 Comment(0)
D
1

All that depends on what you're trying to achieve. If hashes are meant for hash structures like Dictionary, then you have to balance collision rate and speed of hashing. To have a perfect hash without collision at all it will be more time consuming. Similarly the fastest hashing algorithm will have more collisions relatively. Finding the perfect balance is the key here. Also you should take into consideration how large your effective hash can be, and if hashing should be reversible! Noldorin's approach gives you perfect hash (read no collision) if your real and imaginary parts of your complex number are always positive. This will do even for negative numbers if you're ok with the rare collisions. But I'm concerned over the range of values it can yield, quite big for my taste.

If you're after perfect hashes (out of some academic/research interests) that should work even for negative numbers, you can see this solution (and an array of other solutions in the same thread). In my tests, it is faster and utilizes space better than any other I have seen.

Dormer answered 15/12, 2012 at 8:31 Comment(0)
F
0

A stupid simple solution: This has worked for me in the past, but am not sure how GetHashCode() works.

return (a.ToString("0") + b.ToString("0")).GetHashCode();

In my case I wanted to implement an override of equals for custom class that was a member's first name and date of birth. I am pretty sure I could get away with not overriding GetHashCode, but I find the warning to be annoying. Here is my code (class: Member):

public override bool Equals(object obj)
    {
        Member that = (Member)obj;
        return this.memberFirstName == that.memberFirstName 
            && this.MemberDob.Date.ToString("yyyyMMdd") == that.MemberDob.Date.ToString("yyyyMMdd");
    }
    public override int GetHashCode()
    {
        return (this.memberFirstName + MemberDob.Date.ToString("yyyyMMdd")).GetHashCode();
    }
Franctireur answered 7/7, 2022 at 15:57 Comment(1)
this will give a LOT of collisions if a and b give same concatenated results... for example [1 and 12], [11 and 2]. [37 and 55], [3 and 755]. On top of that this solution is eons slower compared to bit manipulation because of stringsGilliangilliard

© 2022 - 2024 — McMap. All rights reserved.