I need to use Dictionary<long, string>
collections that given two instances d1
and d2
where they each have the same KeyValuePair<long, string>
contents, which could be inserted in any order:
(d1 == d2)
evaluates totrue
d1.GetHashCode()
==d2.GetHashCode()
The first requirement was achieved most easily by using a SortedDictionary
instead of a regular Dictionary
.
The second requirement is necessary because I have one point where I need to store Dictionary<Dictionary<long, string>, List<string>
- the main Dictionary
type is used as the key for another Dictionary
, and if the HashCodes don't evaluate based on identical contents, the using ContainsKey()
will not work the way that I want (ie: if there is already an item inserted into the dictionary with d1
as its key, then dictionary.ContainsKey(d2)
should evaluate to true
.
To achieve this, I have created a new object class ComparableDictionary : SortedDictionary<long, string>
, and have included the following:
public override int GetHashCode() {
StringBuilder str = new StringBuilder();
foreach (var item in this) {
str.Append(item.Key);
str.Append("_");
str.Append(item.Value);
str.Append("%%");
}
return str.ToString().GetHashCode();
}
In my unit testing, this meets the criteria for both equality and hashcodes. However, in reading Guidelines and Rules for GetHashCode, I came across the following:
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.
If an object's hash code can mutate while it is in the hash table then clearly the Contains method stops working. You put the object in bucket #5, you mutate it, and when you ask the set whether it contains the mutated object, it looks in bucket #74 and doesn't find it.
Remember, objects can be put into hash tables in ways that you didn't expect. A lot of the LINQ sequence operators use hash tables internally. Don't go dangerously mutating objects while enumerating a LINQ query that returns them!
Now, the Dictionary<ComparableDictionary, List<String>>
is used only once in code, in a place where the contents of all ComparableDictionary
collections should be set. Thus, according to these guidelines, I think that it would be acceptable to override GetHashCode
as I have done (basing it completely on the contents of the dictionary).
After that introduction my questions are:
- I know that the performance of
SortedDictionary
is very poor compared toDictionary
(and I can have hundreds of object instantiations). The only reason for usingSortedDictionary
is so that I can have the equality comparison work based on the contents of the dictionary, regardless of order of insertion. Is there a better way to achieve this equality requirement without having to use aSortedDictionary
? - Is my implementation of
GetHashCode
acceptable based on the requirements? Even though it is based on mutable contents, I don't think that that should pose any risk, since the only place where it is using (I think) is after the contents have been set.
Note: while I have been setting these up using Dictionary
or SortedDictionary
, I am not wedded to these collection types. The main need is a collection that can store pairs of values, and meet the equality and hashing requirements defined out above.
{a => 1, b => 2}
has the same hashcode as{a => 2, b => 1}
. – Omnibus