GetHashCode Extension Method
Asked Answered
A

9

12

After reading all the questions and answers on StackOverflow concerning overriding GetHashCode() I wrote the following extension method for easy and convenient overriding of GetHashCode():

public static class ObjectExtensions
{
    private const int _seedPrimeNumber = 691;
    private const int _fieldPrimeNumber = 397;
    public static int GetHashCodeFromFields(this object obj, params object[] fields) {
        unchecked { //unchecked to prevent throwing overflow exception
            int hashCode = _seedPrimeNumber;
            for (int i = 0; i < fields.Length; i++)
                if (fields[i] != null)
                    hashCode *= _fieldPrimeNumber + fields[i].GetHashCode();
            return hashCode;
        }
    }
}

(I basically only refactored the code that someone posted there, because I really like that it can be used generally)

which I use like this:

    public override int GetHashCode() {
        return this.GetHashCodeFromFields(field1, field2, field3);
    }

Do you see any problems with this code?

Adrianneadriano answered 18/4, 2009 at 16:34 Comment(4)
The code looks fine. As an improvement you could check if fields[i] is not null.Elodia
Just one addition: it's highly unrecommended to add extension methods to Object type.Sultana
I know, but why? We haven't actually changed object's functionality in any way. Your intelli-sense will only be cluttered. And there is also no need to make this an extension method, it just makes it more convienient. You could still just call the static method passing in the object.Adrianneadriano
I think that the problem is boxing/unboxing for example, if using structures.Ineffectual
C
3

That looks like a solid way to do it.

My only suggestion is that if you're really concerned about performance with it, you may want to add generic versions for several common cases (ie. probably 1-4 args). That way, for those objects (which are most likely to be small, key-style composite objects), you won't have the overhead of building the array to pass to the method, the loop, any boxing of generic values, etc. The call syntax will be exactly the same, but you'll run slightly more optimized code for that case. Of course, I'd run some perf tests over this before you decide whether it's worth the maintenance trade-off.

Something like this:

public static int GetHashCodeFromFields<T1,T2,T3,T4>(this object obj, T1 obj1, T2 obj2, T3 obj3, T4 obj4) {
    int hashCode = _seedPrimeNumber;
    if(obj1 != null)
        hashCode *= _fieldPrimeNumber + obj1.GetHashCode();
    if(obj2 != null)
        hashCode *= _fieldPrimeNumber + obj2.GetHashCode();
    if(obj3 != null)
        hashCode *= _fieldPrimeNumber + obj3.GetHashCode();
    if(obj4 != null)
        hashCode *= _fieldPrimeNumber + obj4.GetHashCode();
    return hashCode;
}
Curtain answered 18/4, 2009 at 19:3 Comment(1)
Thanks Jonathan, I think this would solve all the performance problems that you and others have pointed out. I also think the code generation solution was great but I like this more.Adrianneadriano
M
3

I wrote some stuff a little while back that you might solve your problem... (And actually, it could probably be improved to include the seed that you have...)

Anyway, the project is called Essence ( http://essence.codeplex.com/ ), and it uses the System.Linq.Expression libraries to generate (based on attributes) standard representations of Equals/GetHashCode/CompareTo/ToString, as well as being able to create IEqualityComparer and IComparer classes based on an argument list. (I also have some further ideas, but would like to get some community feedback before continuing too much further.)

(What this means is that it's almost as fast as being handwritten - the main one where it isn't is the CompareTo(); cause the Linq.Expressions doesn't have the concept of a variable in the 3.5 release - so you have to call CompareTo() on the underlying object twice when you don't get a match. Using the DLR extensions to Linq.Expressions solves this. I suppose I could have used the emit il, but I wasn't that inspired at the time.)

It's quite a simple idea, but I haven't seen it done before.

Now the thing is, I kind of lost interest in polishing it (which would have included writing an article for codeproject, documenting some of the code, or the like), but I might be persuaded to do so if you feel it would be something of interest.

(The codeplex site doesn't have a downloadable package; just go to the source and grab that - oh, it's written in f# (although all the test code is in c#) as that was the thing I was interested in learning.)

Anyway, here is are c# example from the test in the project:

    // --------------------------------------------------------------------
    // USING THE ESSENCE LIBRARY:
    // --------------------------------------------------------------------
    [EssenceClass(UseIn = EssenceFunctions.All)]
    public class TestEssence : IEquatable<TestEssence>, IComparable<TestEssence>
    {
        [Essence(Order=0] public int MyInt           { get; set; }
        [Essence(Order=1] public string MyString     { get; set; }
        [Essence(Order=2] public DateTime MyDateTime { get; set; }

        public override int GetHashCode()                                { return Essence<TestEssence>.GetHashCodeStatic(this); }
    ...
    }

    // --------------------------------------------------------------------
    // EQUIVALENT HAND WRITTEN CODE:
    // --------------------------------------------------------------------
    public class TestManual
    {
        public int MyInt;
        public string MyString;
        public DateTime MyDateTime;

        public override int GetHashCode()
        {
            var x = MyInt.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= (MyString == null) ? 0 : MyString.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= MyDateTime.GetHashCode();
            return x;
        }
    ...
    }

Anyway, the project, if anyone thinks is worthwhile, needs polishing, but the ideas are there...

Mou answered 11/7, 2009 at 9:48 Comment(1)
Nice, but definitely much slower than hand-written impl.Sultana
L
1

I looks pretty good to me, I only have one issue: It is a shame that you have to use an object[] to pass in the values as this will box any value types you send to the function. I don't think you have much of a choice though, unless you go the route of creating some generic overloads like others have suggested.

Leap answered 18/4, 2009 at 16:37 Comment(2)
IIRC a virtual function call on a valuetype will box the value.Garratt
only if the type does NOT supply an implementation for the methodColombo
G
0

On general principle you should scope your unchecked as narrowly as you reasonably can, though it doesn't matter much here. Other than that, looks fine.

Googolplex answered 18/4, 2009 at 16:40 Comment(0)
R
0
public override int GetHashCode() {
    return this.GetHashCodeFromFields(field1, field2, field3, this);
}

(yes, I'm very pedantic but this is the only problem that I see)

Reimer answered 18/4, 2009 at 16:47 Comment(3)
Wouldn't this cause an infinite loop?Adrianneadriano
sure, I don't even know how to fix thatReimer
if you're trying to protect your generic function against a stupid developer, I guess you could check that fields[i] != obj, but this is one case where I'd just let it crash and burn if the user is that stupid....Curtain
F
0

More optimal:

  1. Create a code generator that uses reflection to look through your business object fields and creates a new partial class which overrides GetHashCode() (and Equals()).
  2. Run the code generator when your program starts up in debug mode, and if the code has changed, exit with a message to the developer to recompile.

The advantages of this are:

  • Using reflection you know which fields are value types or not, and hence whether they need null checks.
  • There are no overheads - no extra function calls, no list construction, etc. This is important if you are doing lots of dictionary lookups.
  • Long implementations (in classes with lots of fields) are hidden in partial classes, away from your important business code.

Disadvantages:

  • Overkill if you don't do lots of dictionary lookups/calls to GetHashCode().
Florenceflorencia answered 19/4, 2009 at 1:33 Comment(0)
C
0

I should point out that you should almost never do allocation while implementing GetHashCode (here's some useful blog posts about it).

The way that params works (generating a new array on the fly) means this is really not a good general solution. You would be better using a method call per field and maintaiing the hash state as a variable passed to them (this makes it easy to use better hashing functions and avalanching too).

Colombo answered 19/4, 2009 at 9:22 Comment(3)
Thank you for the links Shuggy. Very interesting. I didn't know that the GetHashCode implementations of the WPF Pen, Font, Color and Brush were so 'heavy'. I am using these a lot but fortunately not as keys in my dictionaries.Adrianneadriano
They may have fixed this since that post... I have a few nifty functions for chaining hashes together (using Jenkins one at a time hash) I'll try to pop it in laterColombo
There's nothing wrong with a GetHashCode() function that allocates memory the first time it's called, nor is there anything wrong with one that takes some time to compute. Objects whose hash functions which allocate memory or take time to compute, however, should cache their hash values so that repeated requests will complete quickly.Sinuation
S
0

Apart from the problems arising from using params object[] fields, I think not using the type information may be a performance issue in some situations too. Suppose two classes A, B have the same type and number of fields and implement the same interface I. Now if you put A and B objects to a Dictionary<I, anything> objects with equal fields and different types will end up in the same bucket. I'd probably insert some statement like hashCode ^= GetType().GetHashCode();

Jonathan Rupp's accepted answer deals with params array but do not deal with boxing of value types. So, if performance is very important I'd probably declare GetHashCodeFromFields having not object but int parameters, and send not the fields themselves but the hash codes of the fields. i.e.

public override int GetHashCode() 
{
    return this.GetHashCodeFromFields(field1.GetHashCode(), field2.GetHashCode());
}
Sadi answered 25/12, 2011 at 10:49 Comment(0)
R
0

One problem that could arise is when multiplication hits 0, final hashCode is always 0, as I just experienced with an object with a lot of properties, in the following code :

hashCode *= _fieldPrimeNumber + fields[i].GetHashCode();

I'd suggest :

hashCode = hashCode * _fieldPrimeNumber + fields[i].GetHashCode();

Or something similar with xor like this :

hashCode = hashCode * _fieldPrimeNumber ^ fields[i].GetHashCode();
Roger answered 8/1, 2015 at 0:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.