Null vs Empty Collections in GetHashCode
Asked Answered
I

1

1

In the implementation of GetHashCode below, when Collection is null or empty will both result in a hash code of 0.

A colleague suggested return a random hard coded number like 19 to differentiate from a null collection. Why would I want to do this? Why would I care that a null or empty collection produces a different hash code?

public class Foo
{
    public List<int> Collection { get; set; }
    // Other properties omitted.

    public int override GetHashCode()
    {
        var hashCode = 0;
        if (this.Collection != null)
        {
            foreach (var item in this.Collection)
            {
                var itemHashCode = item == null ? 0 : item.GetHashCode();
                hashCode = ((hashCode << 5) + hashCode) ^ itemHashCode;
            }
        }

        return hashCode;
    }
}
Interne answered 12/6, 2019 at 7:50 Comment(13)
"should I return a random hard coded number like 123 to differentiate from a null collection" How should this ever be correctly answered? It´s completely up to your scenario. In fact there is no single correct hashcode-implementation. that fits them all. Personally I won´t bother too much for one collision more, as long as not all possible values return the exact same hashcode.Sibylle
Why override GetHashCode in tie first place? What do you want to achieve?Utmost
If you assign a different meaning to an empty collection vs. a null-reference and no collection, then yes, those two things should produce different hash codes. If you don't, then you can just use the same, usually 0 is fine.Catabolite
Jon Skeet has a good hashcode implementation for lists, but it really depends what you are trying to do here. Are you testing for list equality or list sequence equality?Cause
My two cents is that you should never have a null-reference to a collection anyway, because usually you don't treat that differently from an empty one.Catabolite
However, nobody here can answer this question for you, the only person that can answer this question is you, the rest you will get is opinions.Catabolite
Reworded the question to be a bit clearer. I'm less worried about the implementation, I care more about the why. Why should I care to differentiate null from an empty collection in a hash code?Interne
I doubt you should. However it may make sense if a null-collection is semantically different from an empty one in your scenario. If both share the same semnatics there surely is no need to handle them differently, is it? In doubt I won´t bother for that, in fact you could even return the exact same hashcode for every possible value. However this may lead to performance-issues when using dictionaries or hashsets.Sibylle
19 is not better than 0. There is more substantial trouble, this is not the kind of object that is ever a good candidate for a Dictionary key or HashSet element. The kind of collection objects where GetHashCode() actually get used. The correct implementation is throw new NotImplementedException();Beaubeauchamp
Btw.: you probably shouldn´t introduce any collection into a hashcode at all, as it will very likely change at some point. And members within hashcodes should never change, at least not while the instances are stored in a ditctionary.Sibylle
@HansPassant If the collection is large it makes for a bad dictionary key. Collections of items with single-digit numbers of items in them can make for fine dictionary keys.Triceratops
It is not the only problem, bigger issue is that it should not be so easily mutable. Accidentally call the list object setter or modify a list item and you can't find the object back anymore. Very hard to debug. The exception keeps everybody honest and outcomes predictable.Beaubeauchamp
@HansPassant If it's convenient to change, yes, it's better if it's immutable, but it's not exactly difficult to not mutate an object while it is currently a dictionary key. It's perhaps a problem if you're exposing it in a library potentially used by people not familiar with how hash based collections work, it's likely to be fine when used in a particular application in which all programmers working on that application know how hash based collections work and that they shouldn't mutate a key. Just having the program not work at all (in a clear way) isn't always an option.Triceratops
T
2

The design of GetHashCode is that it is supposed to minimize the number of collisions that will take place, as best as it can. While having some hash collisions is inevitable, you'll want to be mindful of what types of objects are colliding, what type of data are going to be stored in your hash based collections, and working to ensure that types of objects stored together in the same collection are less likely to collide.

So if you happen to know something about how hash-based collections of this type are going to be used, and that there are likely to be both null and empty objects in them, then it would improve the performance to have them not collide. If you suspect that having both a null and empty value in the same collection is not particularly likely, then having them collide isn't actually a concern.

Triceratops answered 12/6, 2019 at 15:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.