Implementation of Dictionary where equivalent contents are equal and return the same hash code regardless of order of insertion

I need Dictionary<long, string> collections such that, given two instances d1 and d2 holding the same KeyValuePair<long, string> contents (possibly inserted in a different order):

  1. (d1 == d2) evaluates to true
  2. d1.GetHashCode() == d2.GetHashCode()

The first requirement was achieved most easily by using a SortedDictionary instead of a regular Dictionary.

The second requirement is necessary because at one point I need to store a Dictionary<Dictionary<long, string>, List<string>>: the main Dictionary type is used as the key of another Dictionary. If the hash codes are not computed from the contents, ContainsKey() will not work the way I want (i.e. if an item has already been inserted into the outer dictionary with d1 as its key, then dictionary.ContainsKey(d2) should evaluate to true).
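To make the problem concrete, here is a minimal sketch of the default behaviour being described. The class and method names are hypothetical; the only assumption is that Dictionary<TKey, TValue> does not override Equals or GetHashCode, so a dictionary used as a key is compared by reference.

```csharp
using System;
using System.Collections.Generic;

public static class ReferenceEqualityDemo
{
    // Looks up an outer dictionary with a second dictionary that has the
    // same contents as the stored key but is a different instance.
    public static bool LookupByEqualContents()
    {
        var d1 = new Dictionary<long, string> { { 1, "a" }, { 2, "b" } };
        var d2 = new Dictionary<long, string> { { 2, "b" }, { 1, "a" } };

        var outer = new Dictionary<Dictionary<long, string>, List<string>>();
        outer[d1] = new List<string> { "payload" };

        // Keys are compared by reference here, so d2 is not found.
        return outer.ContainsKey(d2);
    }

    public static void Main()
    {
        Console.WriteLine(LookupByEqualContents()); // False
    }
}
```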

To achieve this, I have created a new object class ComparableDictionary : SortedDictionary<long, string>, and have included the following:

public override int GetHashCode() {
   // Build one string from all entries (SortedDictionary enumerates in key order),
   // then hash that string.
   StringBuilder str = new StringBuilder();
   foreach (var item in this) {
      str.Append(item.Key);
      str.Append("_");
      str.Append(item.Value);
      str.Append("%%");
   }
   return str.ToString().GetHashCode();
}

In my unit testing, this meets the criteria for both equality and hashcodes. However, in reading Guidelines and Rules for GetHashCode, I came across the following:

Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable

It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.

If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.

Remember, objects can be put into hash tables in ways that you didn't expect. A lot of the LINQ sequence operators use hash tables internally. Don't go dangerously mutating objects while enumerating a LINQ query that returns them!
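The failure mode in the quoted rule can be reproduced with a small sketch. MutableKey is a hypothetical type introduced only for illustration; its hash code is simply its mutable Value field, so changing the field while the object sits in a HashSet changes the hash the set will look for.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on a mutable field.
public sealed class MutableKey
{
    public int Value { get; set; }

    public override bool Equals(object obj)
    {
        return obj is MutableKey other && other.Value == Value;
    }

    public override int GetHashCode()
    {
        return Value;
    }
}

public static class MutatedKeyDemo
{
    public static bool ContainsAfterMutation()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Value = 2; // mutated while stored in the set

        // The set filed the object under its old hash code, so a lookup
        // with the new hash code misses it, even for the same instance.
        return set.Contains(key);
    }

    public static void Main()
    {
        Console.WriteLine(ContainsAfterMutation()); // False
    }
}
```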

Now, the Dictionary<ComparableDictionary, List<String>> is used only once in the code, at a point where the contents of every ComparableDictionary should already be final. Thus, according to these guidelines, I think it would be acceptable to override GetHashCode as I have done (basing it entirely on the contents of the dictionary).

After that introduction my questions are:

  1. I know that the performance of SortedDictionary is very poor compared to Dictionary (and I can have hundreds of object instantiations). The only reason for using SortedDictionary is so that I can have the equality comparison work based on the contents of the dictionary, regardless of order of insertion. Is there a better way to achieve this equality requirement without having to use a SortedDictionary?
  2. Is my implementation of GetHashCode acceptable based on the requirements? Even though it is based on mutable contents, I don't think that should pose any risk, since the only place where it is used (I think) is after the contents have been set.

Note: while I have been setting these up using Dictionary or SortedDictionary, I am not wedded to these collection types. The main need is a collection that can store pairs of values and meet the equality and hashing requirements defined above.

Secunda answered 29/5, 2011 at 14:28 Comment(0)

Your GetHashCode implementation looks acceptable to me, but it's not how I'd do it.

This is what I'd do:

  • Use composition rather than inheritance. Aside from anything else, inheritance gets odd in terms of equality
  • Use a Dictionary<TKey, TValue> field inside the new class
  • Implement GetHashCode by taking an XOR of the individual key/value pair hash codes
  • Implement equality by checking whether the sizes are the same, then checking every key in "this" to see if its value is the same in the other dictionary.

So something like this:

public sealed class EquatableDictionary<TKey, TValue>
    : IDictionary<TKey, TValue>, IEquatable<EquatableDictionary<TKey, TValue>>
{
    private readonly Dictionary<TKey, TValue> dictionary =
        new Dictionary<TKey, TValue>();

    public override bool Equals(object other)
    {
        return Equals(other as EquatableDictionary<TKey, TValue>);
    }

    public bool Equals(EquatableDictionary<TKey, TValue> other)
    {
        if (ReferenceEquals(other, null))
        {
            return false;
        }
        if (Count != other.Count)
        {
            return false;
        }
        foreach (var pair in this)
        {
            TValue otherValue;
            if (!other.TryGetValue(pair.Key, out otherValue))
            {
                return false;
            }
            if (!EqualityComparer<TValue>.Default.Equals(pair.Value,
                                                         otherValue))
            {
                return false;
            }
        }
        return true;
    }

    public override int GetHashCode()
    {
        int hash = 0;
        foreach (var pair in this)
        {
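            // Mix key and value hashes so that swapping values between keys
            // changes the result; unchecked lets the arithmetic wrap on overflow.
            unchecked
            {
                int miniHash = 17;
                miniHash = miniHash * 31 +
                       EqualityComparer<TKey>.Default.GetHashCode(pair.Key);
                miniHash = miniHash * 31 +
                       EqualityComparer<TValue>.Default.GetHashCode(pair.Value);
                hash ^= miniHash;
            }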
        }
        return hash;
    }

    // Implementation of IDictionary<,> which just delegates to the dictionary
}

Also note that I can't remember whether EqualityComparer<T>.Default.GetHashCode copes with null values - I have a suspicion that it does, returning 0 for null. Worth checking though :)
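For readers who want something they can run without writing all the IDictionary<,> delegation, here is a sketch of a different packaging of the same idea: the equality and hash logic above moved into an IEqualityComparer that is passed to the outer dictionary. This is a swapped-in variant, not the wrapper class from the answer. DictionaryContentComparer is a hypothetical name, and the sketch assumes the inner dictionaries use their default key comparer and are not mutated while they serve as keys.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two dictionaries are equal when they hold the
// same key/value pairs, regardless of insertion order.
public sealed class DictionaryContentComparer<TKey, TValue>
    : IEqualityComparer<Dictionary<TKey, TValue>>
{
    public bool Equals(Dictionary<TKey, TValue> x, Dictionary<TKey, TValue> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null || x.Count != y.Count) return false;

        foreach (var pair in x)
        {
            TValue otherValue;
            if (!y.TryGetValue(pair.Key, out otherValue)) return false;
            if (!EqualityComparer<TValue>.Default.Equals(pair.Value, otherValue)) return false;
        }
        return true;
    }

    public int GetHashCode(Dictionary<TKey, TValue> obj)
    {
        if (obj is null) return 0;

        int hash = 0;
        foreach (var pair in obj)
        {
            unchecked
            {
                // Per-pair hash mixes key and value; XOR across pairs keeps
                // the total independent of enumeration order.
                int miniHash = 17;
                miniHash = miniHash * 31 + EqualityComparer<TKey>.Default.GetHashCode(pair.Key);
                miniHash = miniHash * 31 + EqualityComparer<TValue>.Default.GetHashCode(pair.Value);
                hash ^= miniHash;
            }
        }
        return hash;
    }
}

public static class ContentComparerDemo
{
    public static bool LookupByEqualContents()
    {
        var d1 = new Dictionary<long, string> { { 1, "a" }, { 2, "b" } };
        var d2 = new Dictionary<long, string> { { 2, "b" }, { 1, "a" } };

        var outer = new Dictionary<Dictionary<long, string>, List<string>>(
            new DictionaryContentComparer<long, string>());
        outer[d1] = new List<string> { "payload" };

        return outer.ContainsKey(d2);
    }

    public static void Main()
    {
        Console.WriteLine(LookupByEqualContents()); // True
    }
}
```

The trade-off is that content equality then applies only where the comparer is supplied; d1.Equals(d2) on the plain dictionaries still compares references.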

Oui answered 29/5, 2011 at 14:32 Comment(6)
The downside of this implementation is that {a => 1, b => 2} has the same hashcode as {a => 2, b => 1}. – Omnibus
@Ben: Yes - that's where a different way of hashing each key/value pair would help. Will edit to provide an example. – Oui
@Ben: Have a look now and see if that looks better to you. – Oui
Why do you implement IEquatable<T>? AFAIK the only use for this interface is avoiding boxing for value-types. – Iroquois
@Jon: Yeah, the combination of addition with xor, because it's non-commutative, looks like it fixed that case. – Omnibus
@CodeInChaos: Personally I like to implement that wherever I'm implementing equality. Aside from anything else, it shows that a type is designed to be compared via equality, rather than just (unfortunately) inheriting Equals from Object. – Oui
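The collision raised in the comments can be checked with a short sketch. It uses int keys and values (an assumption made only so the hash codes are deterministic, since an int's hash code is the int itself), and the helper names are hypothetical. PlainXor XORs the raw key and value hashes together; Combined uses the 17/31 per-pair mix from the answer.

```csharp
using System;
using System.Collections.Generic;

public static class HashCombineDemo
{
    // XOR of every key hash and value hash: swapping values between keys
    // leaves the result unchanged.
    public static int PlainXor(Dictionary<int, int> d)
    {
        int hash = 0;
        foreach (var pair in d)
        {
            hash ^= pair.Key.GetHashCode() ^ pair.Value.GetHashCode();
        }
        return hash;
    }

    // Per-pair mix (multiply and add), then XOR across pairs.
    public static int Combined(Dictionary<int, int> d)
    {
        int hash = 0;
        foreach (var pair in d)
        {
            unchecked
            {
                int miniHash = 17;
                miniHash = miniHash * 31 + pair.Key.GetHashCode();
                miniHash = miniHash * 31 + pair.Value.GetHashCode();
                hash ^= miniHash;
            }
        }
        return hash;
    }

    public static void Main()
    {
        var a = new Dictionary<int, int> { { 1, 10 }, { 2, 20 } };
        var b = new Dictionary<int, int> { { 1, 20 }, { 2, 10 } };

        Console.WriteLine(PlainXor(a) == PlainXor(b)); // True: collision
        Console.WriteLine(Combined(a) == Combined(b)); // False for this input
    }
}
```

Collisions are still possible with the combined version, as with any 32-bit hash; it only removes this particular systematic one.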
