Why is it important to override GetHashCode when Equals method is overridden?
Asked Answered
K

16

1660

Given the following class

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null) 
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Which is preferred?

        return base.GetHashCode();

        //return this.FooId.GetHashCode();
    }
}

I have overridden the Equals method because Foo represent a row for the Foos table. Which is the preferred method for overriding the GetHashCode?

Why is it important to override GetHashCode?

Kosaka answered 16/12, 2008 at 13:41 Comment(3)
It s important to implement both equals and gethashcode, due to collisions, in particular while using dictionaries. if two object returns same hashcode, they are inserted in the dictionary with chaining. While accessing the item equals method is used.Chromonema
Read this one loganfranken.com/blog/692/overriding-equals-in-c-part-2Sherellsherer
Using visual studio we can generate Equals() and GetHashCode() based on our class props. see this link. learn.microsoft.com/en-us/visualstudio/ide/reference/…Otalgia
L
1487

Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>, etc - since this is used (in the absence of a custom IEqualityComparer<T>) to group items into buckets. If the hash-code for two items does not match, they may never be considered equal (Equals will simply never be called).

The GetHashCode() method should reflect the Equals logic; the rules are:

  • if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
  • if the GetHashCode() is equal, it is not necessary for them to be the same; this is a collision, and Equals will be called to see if it is a real equality or not.

In this case, it looks like "return FooId;" is a suitable GetHashCode() implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5) has a different hash-code to new Foo(5,3)):

In modern frameworks, the HashCode type has methods to help you create a hashcode from multiple values; on older frameworks, you'd need to go without, so something like:

unchecked // only needed if you're compiling with arithmetic checks enabled
{ // (the default compiler behaviour is *disabled*, so most folks won't need this)
    int hash = 13;
    hash = (hash * 7) + field1.GetHashCode();
    hash = (hash * 7) + field2.GetHashCode();
    ...
    return hash;
}

Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GetHashCode.


A demonstration of what happens when you get this wrong is here.

Leman answered 16/12, 2008 at 13:47 Comment(25)
Can I ask ahy are you multiplying with such factors?Wifehood
Actually, I could probably lose one of them; the point is to try to minimise the number of collisions - so that an object {1,0,0} has a different hash to {0,1,0} and {0,0,1} (if you see what I mean),Leman
I tweaked the numbers to make it clearer (and added a seed). Some code uses different numbers - for example the C# compiler (for anonymous types) uses a seed of 0x51ed270b and a factor of -1521134295.Leman
@Leandro López: Usually the factors are chosen to be prime numbers because it makes the number of collisions smaller.Demonism
"Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GethashCode.": Microsoft discourages implementing operator== for objects that are not immutable - msdn.microsoft.com/en-us/library/ms173147.aspx - " It is not a good idea to override operator == in non-immutable types."Scoria
@Scoria by that token, it is also not a good idea to mess with GetHashCode for immutable types, which by implication means not to mess with Equals! So: if it isn't value-esque (regardless of class vs class), don't mess with equality - a reasonable rule of thumb.Leman
So does it make sense to intentionally make GetHashCode less precise than equals, to speed up dictionary lookups? So far i've always mirrored the logic I use in .Equals and GetHashCode. EDIT: Answer to my question: #6305824Littoral
Microsoft says - The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposesErosive
@RGI more generally, GetHashCode() is never guaranteed between AppDomain, so that is not a significant statement; the rest is just a re-statement of the second bullet in the answer. The default implementation is find for hashing, but cannot be used as a unique identifier - everything we already said.Leman
Don't forget to check whether field1, field2, etc... are null if they can possible be null.Bose
Minor note - This particular implementation of computing the hash is vulnerable to generating an integer overflow exception if the /checked project compiler setting is enabled. Consider wrapping computations with unchecked{}Isabea
@Neil fair observation; Unchecked is the compiler default, but you are right: it shouldn't be assumedLeman
Shouldn't we check whether field1 and field2 are null while calculating hash, if it is nullable?Chthonian
@Lijo yes, you shouldLeman
@Richard no, it should notLeman
Can somebody comment on the following GetHashCode() rules outlined in Effective c# 4.0 by Bill Wagner? Rule 1) suggests that objA.Equals(objB) == true does not imply that both objects must return same hashcode. This is simply not true as some LINQ methods (ie. Distinct()) start to misbehave. I was very surprised to find such error in a handbook of such caliber: "If two objects are equal (as defined by operator==), they must generate the same hash value. Otherwise, hash codes can’t be used to find objects in containers. "Cryptography
@Cryptography that is backwards; two things having the same hash-code does not imply that they are equal (Equals == true), however, two things that report as equal (Equals == true) must return the same hashcode, otherwise they've broken the API. Are you sure you aren't simply misreading the paragraph? To quote from MSDN: "If your overridden Equals method returns true when two objects are tested for equality, your overridden GetHashCode method must return the same value for the two objects."Leman
Fun fact: Microsoft got it wrong too.Rosemaryrosemond
After reading all of these different answers, I don't think there is a valid reason to actually override these methods. It could get very complicated with nested objects/cycles. Instead I have opted to implement an extension method to check equality in my way for my specific case.Kansu
In addition to providing an informative answer, the link in "A demonstration of what happens when you get this wrong is here." is very helpful and provides additional clarity. Thank you!Pesthole
May I ask why you're not using the built-in class System.HashCode for this calculation? I mean, something like: var hc = new System.HashCode();- hc.Add(field1.GetHashCode()); hc.Add(field2.GetHashCode()); return hc.ToHashCode();Catfish
@Catfish sure! simple answer: Dec 16 '08Leman
@MarcGravell - You mean, System.HashCode did not exist when you wrote this answer? I see, thanks for the hint. :-) Maybe you can mention it in your answer (a one-liner "Update: ..." would be great)Catfish
I know this is quite old, but i have a doubt. Why you multiply the hash by 7? I suppose 7 is a random number you have chosen, but then why multipling the hash by some random number? Thx in advance to you all! Nice answer!Hayden
thanks, this helped a lot with list's SequenceEqual, had to implement my own equality comparerSurbeck
D
156

It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.

I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set though the constructor so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.

Dg answered 21/12, 2008 at 12:39 Comment(11)
i dont think it s very hard to implement has code. given these rules and better explained in effective c# book, i think overrident GetHashCode is rather easy.Chromonema
Can you elaborate on "the hash code should not change during the lifetime of an object"? Is this NHibernate specific?Howrah
@Howrah I believe it has to do with: if you add the object to a dictionary and then change the object's id, when fetching later you will be using a different hash to retrieve it so you will never get it from the dictionary.Touristy
@DarthVader, it is not hard to override GetHashCode(). And, it's not hard to override Equals() using reference equivalence (or Albic's GUID proposition above). It is hard to override Equals() using value equivalence while still abiding by the rule that two objects that are Equals()-equivalent must return the same hash code: only object state that is immutable should be used in your hash code algorithm.Mclendon
Microsoft's documentation of the GetHashCode() function neither states nor implies that the object hash must remain consistent over it's lifetime. In fact, it specifically explains one permissible case in which it might not: "The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method."Satterwhite
@PeterAllenWebb: The design of .net incorporated many aspects from Java, including some mistakes. Because Java collections were not designed to allow users to specify alternate comparison methods and hash functions, the only way for collections to regard two objects as matching each other was for them to report themselves as "equal", whether or not they were actually equivalent. This led to different types defining "equals" to mean vastly different things, which makes it very difficult for objects which encapsulate or aggregate other objects of generic types to know whether they're "equal".Monocle
"the hash code should not change during the lifetime of an object" - that is not true.Mythicize
A better way to say it is "the hash code (nor the evaulation of equals) should change during the period the object is used as a key for a collection" So if you add the object to a dictionary as a key you must ensure that GetHashCode and Equals will not change their output for a given input until you remove the object from the dictionary.Nicotiana
@ScottChamberlain I think you forgot NOT in your comment, it should be: "the hash code (nor the evaulation of equals) should NOT change during the period the object is used as a key for a collection". Right?Ita
Except this has a flaw: if ON UPDATE CASCADE is specified, I can change the primary key, which means the primary key is not immutable.Baghdad
@StanProkop I think ScottChamberlain's comment was correct, however to avoid confusion, the sentence should read "the hash code (and the evaulation of equals), should NOT change during the period the object is used as a key for a collection"Aquitaine
D
73

By overriding Equals you're basically stating that you know better how to compare two instances of a given type.

Below you can see an example of how ReSharper writes a GetHashCode() function for you. Note that this snippet is meant to be tweaked by the programmer:

public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        result = (result * 397) ^ m_someVar1;
        result = (result * 397) ^ m_someVar2;
        result = (result * 397) ^ m_someVar3;
        result = (result * 397) ^ m_someVar4;
        return result;
    }
}

As you can see it just tries to guess a good hash code based on all the fields in the class, but if you know your object's domain or value ranges you could still provide a better one.

Diuretic answered 16/12, 2008 at 13:48 Comment(4)
Won't this always return zero? Probably should initialise result to 1! Also needs a few more semi-colons.Eshelman
You are aware of what the XOR operator (^) does?Burial
@SamMackrill I've added in the missing semi-colons.Tedmann
@SamMackrill No, it won't always return 0. 0 ^ a = a, so 0 ^ m_someVar1 = m_someVar1. He might as well set the initial value of result to m_someVar1.Syl
S
51

As of .NET 4.7 the preferred method of overriding GetHashCode() is shown below. If targeting older .NET versions, include the System.ValueTuple nuget package.

// C# 7.0+
public override int GetHashCode() => (FooId, FooName).GetHashCode();

In terms of performance, this method will outperform most composite hash code implementations. The ValueTuple is a struct so there won't be any garbage, and the underlying algorithm is as fast as it gets.

Stopped answered 11/5, 2020 at 12:52 Comment(1)
Alternatively, HashCode.Combine(FooId, FooName) can be used. ValueTuple<...>.GetHashCode() uses HashCode.Combine under the hood.Pictograph
D
44

How about:

public override int GetHashCode()
{
    return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}

Assuming performance is not an issue :)

Despinadespise answered 25/11, 2010 at 0:48 Comment(7)
erm - but you're returning a string for an int based method ;_0Obannon
No, he does call GetHashCode() from the String object, which returns an int.Sensate
I dont expect this to be as fast as I would like to be, not just for the boxing involved for value types, but also for the performance of string.Format. Another geeky one I have seen is new { prop1, prop2, prop3 }.GetHashCode(). Cant comment though which one would be slower between these two. Do not abuse tools.Apparitor
This will return true for { prop1="_X", prop2="Y", prop3="Z" } and { prop1="", prop2="X_Y", prop3="Z_" }. You probably don't want that.Trailer
Yep, you can always replace the underscore symbol with something not so common (e.g. •, ▲, ►, ◄, ☺, ☻) and hope your users won't use these symbols... :)Despinadespise
@Trailer You have a typo: "_X_Y_Z" != "_X_Y_Z_" (extra trailing underscore for second prop3), but point taken :)Layla
@LudmilTinkov safer change would be to call prop1/2/3.GetHashCode(), and then underscores wouldn't make a differenceLayla
P
44

Please don´t forget to check the obj parameter against null when overriding Equals(). And also compare the type.

public override bool Equals(object obj)
{
    Foo fooItem = obj as Foo;

    if (fooItem == null)
    {
       return false;
    }

    return fooItem.FooId == this.FooId;
}

The reason for this is: Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx

Pyrogen answered 17/11, 2011 at 7:46 Comment(6)
This check for type will fail in the situation where a subclass refers to the superclass Equals method as part of it's own comparison (i.e. base.Equals(obj)) - should use as insteadInextricable
@sweetfa: It depends on how the Equals method of the subclass is implemented. It could also call base.Equals((BaseType)obj)) which would be working fine.Pyrogen
No it won't: msdn.microsoft.com/en-us/library/system.object.gettype.aspx. And besides, the implementation of a method should not fail or succeed depending on the way it is called. If the runtime-type of an object is a subclass of some baseclass then the Equals() of the baseclass should return true if obj indeed is equal to this no matter how Equals() of the baseclass was called.Refractometer
Moving the fooItem to the top and then checking it for null will perform better in the case of null or a wrong type.Twoedged
@IllidanS4 unless you're working with a value type. But then why would you be overriding a value type's Equals.Divisionism
@40Alpha Well, yeah, then obj as Foo would be invalid.Twoedged
T
21

Just to add on above answers:

If you don't override Equals then the default behavior is that references of the objects are compared. The same applies to hashcode - the default implmentation is typically based on a memory address of the reference. Because you did override Equals it means the correct behavior is to compare whatever you implemented on Equals and not the references, so you should do the same for the hashcode.

Clients of your class will expect the hashcode to have similar logic to the equals method, for example linq methods which use a IEqualityComparer first compare the hashcodes and only if they're equal they'll compare the Equals() method which might be more expensive to run, if we didn't implement hashcode, equal object will probably have different hashcodes (because they have different memory address) and will be determined wrongly as not equal (Equals() won't even hit).

In addition, except the problem that you might not be able to find your object if you used it in a dictionary (because it was inserted by one hashcode and when you look for it the default hashcode will probably be different and again the Equals() won't even be called, like Marc Gravell explains in his answer, you also introduce a violation of the dictionary or hashset concept which should not allow identical keys - you already declared that those objects are essentially the same when you overrode Equals so you don't want both of them as different keys on a data structure which suppose to have a unique key. But because they have a different hashcode the "same" key will be inserted as different one.

Tenon answered 12/11, 2014 at 13:48 Comment(0)
M
19

It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).

Mudcat answered 16/12, 2008 at 13:48 Comment(1)
OK, that makes sense now. So if the hashcodes of two distinct instances are different but they should be treated as logically equal, you can't use them reliably in a Dictionary or Hashtable. That's an abstract requirement, but now I get it.Goggles
F
15

We have two problems to cope with.

  1. You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also often a object will NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.

  2. If someone puts your object in a collection that calls GetHashCode() and you have overrided Equals() without also making GetHashCode() behave in a correct way, that person may spend days tracking down the problem.

Therefore by default I do.

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null)
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Some comment to explain if there is a real problem with providing GetHashCode() 
        // or if I just don't see a need for it for the given class
        throw new Exception("Sorry I don't know what GetHashCode should do for this class");
    }
}
Fumarole answered 19/11, 2013 at 10:17 Comment(12)
Throwing an exception from GetHashCode is a violation of the Object contract. There is no difficulty defining a GetHashCode function such that any two objects which are equal return the same hash code; return 24601; and return 8675309; would both be valid implementations of GetHashCode. Performance of Dictionary will only be decent when the number of items is small, and will get very bad if the number of items gets large, but it will work correctly in any case.Monocle
@supercat, It is not possible to implement GetHashCode in a sensible way if the identify fields in the object can change, as the hash code must never change. Doing what you say could lead someone having to spend many days tracking down the performance problem, then many weeks on a large system redesigning to remove the use of the dictionaries.Fumarole
I used to do something like this for all classes I defined that needed Equals(), and where I was completely sure I'd never use that object as a key in a collection. Then one day a program where I'd used an object like that as input to a DevExpress XtraGrid control crashed. It turns out XtraGrid, behind my back, was creating a HashTable or something based on my objects. I got into a minor argument with the DevExpress support people about this. I said it was not smart that they based their component's functionality and reliability on an unknown customer implementation of an obscure method.Multiplex
The DevExpress people were rather snarky, basically saying I must be an idiot to throw an exception in a GetHashCode() method. I still think they should find an alternative method of doing what they're doiing - I recall Marc Gravell on a different thread describing how he builds a dictionary of arbitrary objects without being dependent on GetHashCode() - can't recall how he did it though.Multiplex
@RenniePet, must better having a crush due to throwing an exception, then having a very hard to find bug due to an invalid implementation.Fumarole
Yeah, that was my attitude too until the DevExpress XtraGrid problem. Now I just deliberately neglect to write GetHashCode() methods, and suppress the compile-time warnings from Code Analysis and ReSharper.Multiplex
Lesson here: Don't use DevExpress, ever, for anything.Arsenopyrite
@IanRingrose: It is common for immutable classes to construct mutable class objects store information in them, and then refrain from ever modifying them or making references to those objects available to code that might modify them. The fact that a class is mutable does not imply that all instances of that class will be subject to modification while stored as a key in a dictionary.Monocle
could you compute the value once the first time and then not change it later? there's no reason that the hashcode must be dependent on the fields, that's just a nice way to evenly distribute hashcodes.Antoniaantonie
@DaveCousineau there is the requirement that two equal objects must return the same hashcode and equal is only overridden when the result depends on values of fields.Fumarole
@IanRingrose right k. I guess what you say makes sense, throwing in GetHashCode for a mutable object, since it means that the object just should not be used that way. equality in C# is sure messy.Antoniaantonie
@DaveCousineau GetHashCode and IsEqual should never have been members of Object as they are not senible for many classes.Fumarole
A
11

Hash code is used for hash-based collections like Dictionary, Hashtable, HashSet etc. The purpose of this code is to very quickly pre-sort specific object by putting it into specific group (bucket). This pre-sorting helps tremendously in finding this object when you need to retrieve it back from hash-collection because code has to search for your object in just one bucket instead of in all objects it contains. The better distribution of hash codes (better uniqueness) the faster retrieval. In ideal situation where each object has a unique hash code, finding it is an O(1) operation. In most cases it approaches O(1).

Armipotent answered 21/2, 2012 at 11:36 Comment(0)
C
8

It's not necessarily important; it depends on the size of your collections and your performance requirements and whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large and my time is more valuable than a few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying warning by the compiler) I simply use:

   public override int GetHashCode()
   {
      return base.GetHashCode();
   }

(Of course I could use a #pragma to turn off the warning as well but I prefer this way.)

When you are in the position that you do need the performance than all of the issues mentioned by others here apply, of course. Most important - otherwise you will get wrong results when retrieving items from a hash set or dictionary: the hash code must not vary with the life time of an object (more accurately, during the time whenever the hash code is needed, such as while being a key in a dictionary): for example, the following is wrong as Value is public and so can be changed externally to the class during the life time of the instance, so you must not use it as the basis for the hash code:


   class A
   {
      public int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //WRONG! Value is not constant during the instance's life time
      }
   }    

On the other hand, if Value can't be changed it's ok to use:


   class A
   {
      public readonly int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //OK  Value is read-only and can't be changed during the instance's life time
      }
   }

Continue answered 26/6, 2011 at 23:21 Comment(7)
Downvoted. This is plain wrong. Even Microsoft states in MSDN (msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx) that the value of GetHashCode MUST change when the object state changes in a way that may affect the return value of a call to Equals(), and even in its examples it also shows GetHashCode implementations that fully depend on publicly changeable values.Anabantid
Sebastian, I disagree: If you add an object to a collection that uses hash codes it will be put in a bin dependent on the hash code. If you now change the hash code you will not find the object again in the collection as the wrong bin will be searched. This is, in fact, something that has happened in our code and that's why I found it necessary to point that out.Continue
Sebastian, In addition, I cannot see a statement in the link (msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx) that GetHashCode() must change. In contrary - it must NOT change as long as Equals returns the same value for the same argument: "The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. " This statement does not imply the opposite, that it must change if the return value for Equals changes.Continue
@ILoveFortran, I don't think what you're saying is correct. The MSDN article to clearly states: "A hash code is not a permanent value. For this reason: 1)Do not serialize hash code values or store them in databases. 2) Do not use the hash code as the key to retrieve an object from a keyed collection. 3... 4...", ie, the HasCode changes during the lifetime of the object and your code needs to be aware of that. If you want to make each object unique with an identifier that never changes in its lifetime, you should use something else.Mendicity
@Joao, you are confusing the client/consumer side of the contract with the producer/implementer. I am talking about the responsibility of the implementer, who overrides GetHashCode(). You are talking about the consumer, the one who is using the value.Continue
Complete misunderstanding... :) The truth is that the hash code must change when the state of the object changes unless the state is irrelevant to the identity of the object. Also, you should never use a MUTABLE object as a key in your collections. Use read-only objects for this purpose. GetHashCode, Equals... and some other methods whose names I don't remember at this very moment should NEVER throw.Kermie
@darlove: One should never use an object which is going to be mutated while it is in a hashed collection. That does not imply that there is anything wrong with using a mutable object in a hashed collection if all references to the object, the collection are held by entities that will refrain from using any of them to modify the object while it is in the collection.Monocle
L
6

As of C# 9(.net 5 or .net core 3.1), you may want to use records as it does Value Based Equality by default.

Lyric answered 16/12, 2008 at 13:41 Comment(0)
M
5

You should always guarantee that if two objects are equal, as defined by Equals(), they should return the same hash code. As some of the other comments state, in theory this is not mandatory if the object will never be used in a hash based container like HashSet or Dictionary. I would advice you to always follow this rule though. The reason is simply because it is way too easy for someone to change a collection from one type to another with the good intention of actually improving the performance or just conveying the code semantics in a better way.

For example, suppose we keep some objects in a List. Sometime later someone actually realizes that a HashSet is a much better alternative because of the better search characteristics for example. This is when we can get into trouble. List would internally use the default equality comparer for the type which means Equals in your case while HashSet makes use of GetHashCode(). If the two behave differently, so will your program. And bear in mind that such issues are not the easiest to troubleshoot.

I've summarized this behavior with some other GetHashCode() pitfalls in a blog post where you can find further examples and explanations.

Mould answered 12/2, 2020 at 20:0 Comment(0)
S
1

It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects.

EDITED: That was incorrect, the original GetHashCode() method cannot assure the equality of 2 values. Though objects that are equal return the same hash code.

Sydel answered 7/10, 2013 at 17:6 Comment(0)
C
1

In .NET, when you override the Equals() method, it's recommended to also override GetHashCode(). The reason is related to how .NET uses GetHashCode in its built-in data structures.

When you store an object in a hash-based collection like Dictionary or HashSet, .NET uses the value returned by GetHashCode() to organize its data. Objects that are considered equal should return the same hash code, providing optimal performance when retrieving objects from such a collection.

If you override Equals(), you're changing the definition of what makes two objects equal. So, if you don't also override GetHashCode(), objects that you consider "equal" may return different hash codes. This can lead to inconsistent behavior when objects are used in a hash-based collection. They might not be found in the collection, even though you know they're there, because the collection is looking in the wrong hash bucket.

Let's see an example. Suppose, you have a Person class and you have overridden Equals() to say that two Person objects are equal if their Name property matches. But you forgot to override GetHashCode(). Now, if you add a Person object with Name="John" to a HashSet, and later try to check if the Person object with Name="John" exists in the HashSet, it might return false, which is incorrect, because the GetHashCode() might be returning the hash code of the object reference, not the Name string which you're using for equality comparison.

To avoid this issue, anytime you override Equals(), you should also override GetHashCode() to ensure it uses the same properties that Equals() does. This will help maintain consistency when using hash-based collections.


Overriding GetHashCode() requires producing a hash code that considers the same properties used in Equals(), and is also evenly distributed to prevent hash collisions.

Here is one example of how you might achieve this:

public override int GetHashCode()
{
   int hash = 17;

   // Suitable nullity checks etc, of course :)
   hash = (hash * 23) + field1.GetHashCode();
   hash = (hash * 23) + field2.GetHashCode();
   return hash;
}

In this example, field1 and field2 are the fields that the Equals() method checks. The constants 17 and 23 are just arbitrarily chosen 'magic' numbers that often give good results.

You can also use HashCode.Combine() in C# 8.0 and later:

public override int GetHashCode()
{
    return HashCode.Combine(field1, field2);
}

Remember, the goal of GetHashCode() is not to avoid collisions entirely, but to distribute them evenly. Collisions are inevitable because the number of possible hash codes (2^32 for int) is smaller than the number of possible string values, for example. But a good hash function will help ensure a more even distribution of hash code values and reduce the probability of collision, resulting in better performance when using hash-based collections.

Corr answered 29/12, 2023 at 2:11 Comment(0)
F
-8

Below using reflection seems to me a better option considering public properties as with this you don't have have to worry about addition / removal of properties (although not so common scenario). This I found to be performing better also.(Compared time using Diagonistics stop watch).

    public int getHashCode()
    {
        PropertyInfo[] theProperties = this.GetType().GetProperties();
        int hash = 31;
        foreach (PropertyInfo info in theProperties)
        {
            if (info != null)
            {
                var value = info.GetValue(this,null);
                if(value != null)
                unchecked
                {
                    hash = 29 * hash ^ value.GetHashCode();
                }
            }
        }
        return hash;  
    }
Fiddlestick answered 14/3, 2014 at 18:8 Comment(1)
The implementation of GetHashCode() is expected to be very lightweight. I'm not sure is using reflection is noticeable with StopWatch on thousands of calls, but it surely is on millions (think of populating a dictionary out of a list).Cosmorama

© 2022 - 2025 — McMap. All rights reserved.