How to handle hash collisions for Dictionaries in Swift
Asked Answered
U

4

20

TLDR

My custom structure implements the Hashable Protocol. However, when hash collisions occur while inserting keys in a Dictionary, they are not automatically handled. How do I overcome this problem?

Background

I had previously asked this question How to implement the Hashable Protocol in Swift for an Int array (a custom string struct). Later I added my own answer, which seemed to be working.

However, recently I have detected a subtle problem with hashValue collisions when using a Dictionary.

Most basic example

I have simplified the code down as far as I can to the following example.

Custom structure

struct MyStructure: Hashable {

    var id: Int

    init(id: Int) {
        self.id = id
    }

    var hashValue: Int {
        get {
            // contrived to produce a hashValue collision for id=1 and id=2
            if id == 1 {
                return 2 
            }
            return id
        }
    }
}

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

Note the global function to overload the equality operator (==) in order to conform to the Equatable Protocol, which is required by the Hashable Protocol.

Subtle Dictionary key problem

If I create a Dictionary with MyStructure as the key

var dictionary = [MyStructure : String]()

let ok = MyStructure(id: 0)            // hashValue = 0
let collision1 = MyStructure(id: 1)    // hashValue = 2
let collision2 = MyStructure(id: 2)    // hashValue = 2

dictionary[ok] = "some text"
dictionary[collision1] = "other text"
dictionary[collision2] = "more text"

print(dictionary) // [MyStructure(id: 2): more text, MyStructure(id: 0): some text]
print(dictionary.count) // 2

the equal hash values cause the collision1 key to be overwritten by the collision2 key. There is no warning. If such a collision only happened once in a dictionary with 100 keys, then it could easily be missed. (It took me quite a while to notice this problem.)

Obvious problem with Dictionary literal

If I repeat this with a dictionary literal, though, the problem becomes much more obvious because a fatal error is thrown.

let ok = MyStructure(id: 0)            // hashValue = 0
let collision1 = MyStructure(id: 1)    // hashValue = 2
let collision2 = MyStructure(id: 2)    // hashValue = 2

let dictionaryLiteral = [
    ok : "some text",
    collision1 : "other text",
    collision2 : "more text"
]
// fatal error: Dictionary literal contains duplicate keys

Question

I was under the impression that it was not necessary for hashValue to always return a unique value. For example, Mattt Thompson says,

One of the most common misconceptions about implementing a custom hash function comes from ... thinking that hash values must be distinct.

And the respected SO user @Gaffa says that one way to handle hash collisions is to

Consider hash codes to be non-unique, and use an equality comparer for the actual data to determine uniqueness.

In my opinion, the question Do swift hashable protocol hash functions need to return unique values? has not been adequately answered at the time of this writing.

After reading the Swift Dictionary question How are hash collisions handled?, I assumed that Swift automatically handled hash collisions with Dictionary. But apparently that is not true if I am using a custom class or structure.

This comment makes me think the answer is in how the Equatable protocol is implemented, but I am not sure how I should change it.

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

Is this function called for every dictionary key lookup or only when there is a hash collision? (Update: see this question)

What should I do to determine uniqueness when (and only when) a hash collision occurs?

Unapt answered 27/7, 2015 at 22:8 Comment(3)
Swift Comparison Protocols is also a useful link.Unapt
You returning 1 & 2 for hashValues is just for demonstrating a collision. right?Bodily
My returning 2 and 2 for hashValues is a demonstration of a collision. (The ids are 1 and 2.) @HoneyUnapt
L
16
func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.hashValue == rhs.hashValue
}

Note the global function to overload the equality operator (==) in order to conform to the Equatable Protocol, which is required by the Hashable Protocol.

Your problem is an incorrect equality implementation.

A hash table (such as a Swift Dictionary or Set) requires separate equality and hash implementations.

hash gets you close to the object you're looking for; equality gets you the exact object you're looking for.

Your code uses the same implementation for hash and equality, and this will guarantee a collision.

To fix the problem, implement equality to match exact object values (however your model defines equality). E.g.:

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.id == rhs.id
}
Lysippus answered 28/7, 2015 at 3:8 Comment(6)
I'm super confused. From the 2 code snippets that you have, 1st implementation of Hashable and the other implementation of equatable. In order to conform to hashable, do I need to implement both of them? If I do then then how does the compiler know which to choose between them, assuming that they both have the same method signature?!Bodily
I just tried writing another == for equality and I get invalid redeclaration of == @UnaptBodily
@Honey, the first code snippet was Darren quoting my incorrect code. The second code snippet is Darren's correct version. The Hashable protocol requires the Equitable protocol to resolve any situations in which there are hash collisions (ie, any time two different objects have the same hash value). I see from your own answer you have already figured this out, though.Unapt
@Honey, I assume you have seen this answer, too. It is what I wrote after working through all of this. The Big Picture section shows all the pieces that are needed for for the Hashable protocol.Unapt
@Honey, my previous statement about the purpose of the Equatable protocol seems to be wrong. See this question.Unapt
Dear reader, if this answer confuses you, I suggest a quick look at how hashtables (can) work. The key remains implicit here; hashValue is the hash f(key), and == is used to find the correct entry among all entries with the same hash.Hose
F
5

I think you have all the pieces of the puzzle you need -- you just need to put them together. You have a bunch of great sources.

Hash collisions are okay. If a hash collision occurs, objects will be checked for equality instead (only against the objects with matching hashes). For this reason, objects' Equatable conformance needs to be based on something other than hashValue, unless you are certain that hashes cannot collide.

This is the exact reason that objects that conform to Hashable must also conform to Equatable. Swift needs a more domain-specific comparison method for when hashing doesn't cut it.

In that same NSHipster article, you can see how Mattt implements isEqual: versus hash in his example Person class. Specifically, he has an isEqualToPerson: method that checks against other properties of a person (birthdate, full name) to determine equality.

- (BOOL)isEqualToPerson:(Person *)person {
  if (!person) {
    return NO;
  }

  BOOL haveEqualNames = (!self.name && !person.name) || [self.name isEqualToString:person.name];
  BOOL haveEqualBirthdays = (!self.birthday && !person.birthday) || [self.birthday isEqualToDate:person.birthday];

  return haveEqualNames && haveEqualBirthdays;
}

He does not use a hash value when checking for equality - he uses properties specific to his person class.

Likewise, Swift does not let you simply use a Hashable object as a dictionary key -- implicitly, by protocol inheritance -- keys must conform to Equatable as well. For standard library Swift types this has already been taken care of, but for your custom types and class, you must create your own == implementation. This is why Swift does not automatically handle dictionary collisions with custom types - you must implement Equatable yourself!

As a parting thought, Mattt also states that you can often just do an identity check to make sure your two objects are at different memory address, and thus different objects. In Swift, that would like like this:

if person1 === person2 {
    // ...
}

There is no guarantee here that person1 and person2 have different properties, just that they occupy separate space in memory. Conversely, in the earlier isEqualToPerson: method, there is no guarantee that two people with the same names and birthdates are actually same people. Thus, you have to consider what makes sense for you specific object type. Again, another reason that Swift does not implement Equatable for you on custom types.

Fusiform answered 27/7, 2015 at 23:58 Comment(2)
You said, "If a hash collision occurs, objects will be checked for equality instead (only against the objects with matching hashes)." While I was testing this out I am noticing some strange behavior that seems to go against what I would expect. See this question that I just made.Unapt
@Unapt In theory an object would only need to be tested against colliding objects, but the Swift devs are free to implement their Dictionary type however they like, as long as it functions properly for an end user. They make no guarantees about how many times keys will be checked for equality, and thus they are free to optimize the Dictionary type however they see fit. They may have some crazy optimizations that really improve Dictionary's performance but that internally operate differently than the basic hash-map model. The only thing that matters is that the type's APIs work as documented.Fusiform
T
3

the equal hash values cause the collision1 key to be overwritten by the collision2 key. There is no warning. If such a collision only happened once in a dictionary with 100 keys, then it could easily be missed.

Hash collision has nothing to do with it. (Hash collisions never affect the result, only the performance.) It is working exactly as documented.

Dictionary operations work on equality (==) of keys. Dictionaries do not contain duplicate keys (meaning keys that are equal). When you set a value with a key, it overwrites any entry containing an equal key. When you get an entry with a subscript, it finds a value with a key that is equal to, not necessarily the same as, the thing you gave. And so on.

collision1 and collision2 are equal (==), based on the way you defined the == operator. Therefore, setting an entry with key collision2 must overwrite any entry with key collision1.

P.S. The same exact thing applies with dictionaries in other languages. For example, in Cocoa, NSDictionary does not allow duplicate keys, i.e. keys that are isEqual:. In Java, Maps do not allow duplicate keys, i.e. keys that are .equals().

Trimer answered 28/7, 2015 at 3:25 Comment(0)
B
1

You can see my comments on this page's answers and this answer. I think all answers are still written in a VERY confusing way.

tl;dr 0) you don't need to write the implementation isEqual ie == between the hashValues. 1) Only provide/return hashValue. 2) just implement Equatable as you normally would

0) To conform to hashable you must have a computed value named hashValue and give it an appropriate value. Unlike equatable protocol, the comparison of hashValues is already there. You DON'T need to write:

func ==(lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.hashValue == rhs.hashValue
    // Snippet A
}

1) Then it uses the hashValue to check to see if for that hashValue's index (calculated by its modulo against the array's count) the key being looked exists or not. It looks within the array of key/value pairs of that index.

2) However as a fail safe ie in case there are matching hashes you fall back to the regular == func. (Logically you need it because of a fail safe. But you also you need it because Hashable protocol conforms to Equatable and therefore you must write an implementation for ==. Otherwise you would get a compiler error)

func == (lhs: MyStructure, rhs: MyStructure) -> Bool {
    return lhs.id == rhs.id
    //Snippet B
}

Conclusion:

You must include Snippet B, exclude Snippet A, while also having a hashValue property

Bodily answered 18/11, 2016 at 20:21 Comment(5)
Point 1 makes no sense. hashValue is NOT used to check the equality between two objects. It is perfectly valid for two unequal objects to have the same hash value. The requirement is only that two equal objects must have the same hash valueStgermain
@Stgermain Thanks for the feedback. I actually recently brushed up my data structures skills a bit. Is it better now?Bodily
What array are you talking about?Stgermain
From what I've learned from here. Dictionaries are stored in an 2 dimension array. Where each key is appended to the inner array at the outer array's index of key's hashValue % array.count. I guess it's not actually stored in an array, but that's what you're asked to do at an interview...Bodily
That's an implementation detail. There's no guarantee that Swift Dictionary is implemented exactly like that. And it's irrelevant to this question and doesn't need to be dealt with in any answer.Stgermain

© 2022 - 2024 — McMap. All rights reserved.