implement GetHashCode() for objects that contain collections
Asked Answered
S

4

30

Consider the following objects:

class Route
{
   public int Origin { get; set; }
   public int Destination { get; set; }
}

Route implements equality operators.

class Routing
{
   public List<Route> Paths { get; set; }
}

I used the code below to implement GetHashCode method for the Routing object and it seems to work but I wonder if that's the right way to do it? I rely on equality checks and as I'm uncertain I thought I'll ask you guys. Can I just sum the hash codes or do I need to do more magic in order to guarantee the desired effect?

public override int GetHashCode() =>
{
    return (Paths != null 
                ? (Paths.Select(p => p.GetHashCode())
                        .Sum()) 
                : 0);
}

I checked several GetHashCode() questions here as well as MSDN and Eric Lippert's article on this topic but couldn't find what I'm looking for.

Stimulus answered 12/5, 2012 at 21:13 Comment(9)
That way you may have two different collections with same hash code.Melodie
Why not use the GetHashCode of the Collection itself?Zoophilous
@ValBakhtin There are only 2 ** 32 different hash codes, so not all collections can have their own.Diapophysis
@Zoophilous If you mean use GetHashCode on the List<Route>, it will not consider the contents of the list. It only guarantees to give the same hash code if the two List<> references are really pointing two one single list object (only one instance).Diapophysis
@JeppeStigNielsen Just to clarify, does the OP want to return the same HashCode to the same content, regardless if actually same object?Zoophilous
@Zoophilous I think the Original Poster wants that. If you just want "same instance equality" semantics, you would never override GetHashCode. The implementation from the base class System.Object does that already! But we will have to ask Joanna.Diapophysis
@JeppeStigNielsen You're probably right. Although you might want to uniquely identify an object only by it's collection member - there's a chance.Zoophilous
@JeppeStigNielsen - you're right - the default was to GetHashCode of the collection but it didn't work as these were two different objects.Stimulus
possible duplicate of Good GetHashCode() override for List of Foo objectsInwardness
D
18

I think your solution is fine. (Much later remark: LINQ's Sum method will act in checked context, so you can very easily get an OverflowException which means it is not so fine, after all.) But it is more usual to do XOR (addition without carry). So it could be something like

public override int GetHashCode()
{
  int hc = 0;
  if (Paths != null)
    foreach (var p in Paths)
      hc ^= p.GetHashCode();
  return hc;
}

Addendum (after answer was accepted):

Remember that if you ever use this type Routing in a Dictionary<Routing, Whatever>, a HashSet<Routing> or another situation where a hash table is used, then your instance will be lost if someone alters (mutates) the Routing after it has been added to the collection.

If you're sure that will never happen, use my code above. Dictionary<,> and so on will still work if you make sure no-one alters the Routing that is referenced.

Another choice is to just write

public override int GetHashCode()
{
  return 0;
}

if you believe the hash code will never be used. If every instace returns 0 for hash code, you will get very bad performance with hash tables, but your object will not be lost. A third option is to throw a NotSupportedException.

Diapophysis answered 12/5, 2012 at 21:24 Comment(10)
I'm not sure how returning a Hash Code easily changes within the life tie of the object makes any sense. It breaks the concept of hash codes.Sauer
@AndrewFinnell HashCode can very well be used to uniquely represent the inner state of an object, and not only to identify the single instance throughout it's lifetime.Zoophilous
@Zoophilous Of course it CAN be, but that doesn't mean it should be. If you put an instance of this object into a hashed collection, then add a Path, it won't find it's bucket. The hash code will have changed. This defeats the whole purpose of GetHashCode().Sauer
@AndrewFinnell I agree that there is a problem with that, but it is very COMMON to do it, although it might not be good when using a HashTable (which isn't common at all, by the way). Still, it is a good way to do it when you don't have the hashed collections issue. It is a very common solution for uid-ing inner state.Zoophilous
It's true that it can be "dangerous" to have GetHashCode() return a new value each time the object is changed (mutated). But it all comes down to what the Equals method is implemented like. From the context, I supposed that the Equals method ran through the List<>. If so, of course GetHashCode will have to do the same. It is essential that two instances that are equal according to Equals return the same hash code!Diapophysis
Of course the paths property cannot be changed now. But if the OP indended to do that why did he ask us how to hash Paths? See the comments under my answer for further discussion. This answer is correct.Eleph
@JeppeStigNielsen - Basically, I rather need to ensure that GetHashCode doesn't go in the way of Equals (as it did previously) than make it completely fool proof. I don't intend to use HashTables (btw: I suppose same goes for the HashSet?) but I'll definitely keep in mind the warnings about mutating objects in this case. I'll take your version - it seems to be more common in the GetHashCode world. Thanks!Stimulus
@Joanna Great. I felt I had to write something about mutating an object's hash code, now when my answer is "accepted", so I added some more information, as you see.Diapophysis
Also consider this variation of GetHashCode implementation (by @JonSkeet) based on multiple fields: #263900Structuralism
If you want a one-liner: return Paths?.Aggregate(0, (hashCode, p) => hashCode ^= p.GetHashCode()) ?? 0;Darrin
E
11

The code from Jeppe Stig Nielsen's answer works but it could lead to a lot of repeating hash code values. Let's say you are hashing a list of ints in the range of 0-100, then your hash code would be guarnteed to be between 0 and 255. This makes for a lot of collisions when used in a Dictionary. Here is an improved version:

public override int GetHashCode()
{
  int hc = 0;
  if (Paths != null)
    foreach (var p in Paths) {
        hc ^= p.GetHashCode();
        hc = (hc << 7) | (hc >> (32 - 7)); //rotale hc to the left to swipe over all bits
    }
  return hc;
}

This code will at least involve all bits over time as more and more items are hashed in.

Eleph answered 12/5, 2012 at 21:30 Comment(6)
I don't agree. If the GetHashCode() on the class Route is well written (which we will have to assume), then "my" GetHashCode() implementation on the Routing class will be great.Diapophysis
This seems like an invalid GetHashCode. If I put an object in a hashed collection, then add an object to Paths there is a high chance the Contains will never be able to find it as the bucket will have changed. The hash should be constant for the life of the object.Sauer
Huh? GetHashCode should cover all state you want to use to find this object in a hash table. If the op did not want to search on the path, why did he ask?? Additional point: Of course you can't change the object once it is inserted, but that is always true (even for a gethashcode like return 0, because equality cannot change either once the object is inserted). Next point: Your point holds for all answers in this question which is unlikely. I think you misunderstood the question.Eleph
@Eleph Another user brought up a great point. We don't know how Equals() is implemented. If Equals goes through the Paths to check for equality, then yes the HashCode should consist of whats in the Paths. If Equality checks something else like an Unique ID for a Paths object, then no it shouldn't. He says he relies on Equality checks but doesn't show what that check is.Sauer
@AndrewFinnell - yes, Equals checks all the Paths in the collection.Stimulus
@Joanna, this answer is correct then. You might want to research the rules for GetHashCode and Equals, though, to make 100% sure your code is correct.Eleph
A
10

As a guideline, the hash of an object must be the same over the object's entire lifetime. I would leave the GetHashCode function alone, and not overwrite it. The hash code is only used if you want to put your objects in a hash table.

You should read Eric Lippert's great article about hash codes in .NET: Guidelines and rules for GetHashCode.

Quoted from that article:

Guideline: the integer returned by GetHashCode should never change

Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable

If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.

The GetHashCode function you implemented will not return the same hash code over the lifetime of the object. If you use this function, you will run into trouble if you add those objects to a hash table: the Contains method will not work.

Araucaria answered 12/5, 2012 at 21:25 Comment(0)
M
0

I don't think it's a right way to do, cause to dtermine the final hashcode it has to be unique for specifyed object. In your case you do a Sum(), which can produce the same result with different hashcodes in collection (at the end hashcodes are just integers).

If your intention is to determine equality based on the content of the collection, at this point just compare these cillections between two objects. It could be time consuming operation, by the way.

Memorabilia answered 12/5, 2012 at 21:22 Comment(3)
As I said elsewhere, it's OK for different objects to share the same hash code. After all there are only 2 ** 32 possible values of an int. But a good hash code should hit each of the possible values with equal frequency, in a way.Diapophysis
I don't say they should't have same hashcode (if not, they will never be eual at the end), but the problem in the code provided is that it can return false positive results.Memorabilia
@Memorabilia - I'm going to change the logic to do xor rather than this - however, I wanted to check with you re: performance. When would this operation become time consuming, and do we talk miliseconds or something bigger ? At the moment the collection of paths is not expected to be in any way big - just a few items, do I need to worry about it ?Stimulus

© 2022 - 2024 — McMap. All rights reserved.