Python - class __hash__ method and set [duplicate]
Asked Answered
B

2

26

I'm using set() and __hash__ method of python class to prevent adding same hash object in set. According to python data-model document, set() consider same hash object as same object and just add them once.

But it behaves different as below:

class MyClass(object):

    def __hash__(self):
        return 0

result = set()
result.add(MyClass())
result.add(MyClass())

print(len(result)) # len = 2

While in case of string value, it works correctly.

result.add('aida')
result.add('aida')

print(len(result)) # len = 1

My question is: why the same hash objects are not same in set?

Biafra answered 18/7, 2016 at 6:54 Comment(0)
T
37

Your reading is incorrect. The __eq__ method is used for equality checks. The documents just state that the __hash__ value must also be the same for 2 objects a and b for which a == b (i.e. a.__eq__(b)) is true.

This is a common logic mistake: a == b being true implies that hash(a) == hash(b) is also true. However, an implication does not necessarily mean equivalence, that in addition to the prior, hash(a) == hash(b) would mean that a == b.

To make all instances of MyClass compare equal to each other, you need to provide an __eq__ method for them; otherwise Python will compare their identities instead. This might do:

class MyClass(object):
    def __hash__(self):
        return 0
    def __eq__(self, other):
        # another object is equal to self, iff 
        # it is an instance of MyClass
        return isinstance(other, MyClass)

Now:

>>> result = set()
>>> result.add(MyClass())
>>> result.add(MyClass())
1

In reality you'd base the __hash__ on those properties of your object that are used for __eq__ comparison, for example:

class Person
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __eq__(self, other):
        return isinstance(other, Person) and self.ssn == other.ssn

    def __hash__(self):
        # use the hashcode of self.ssn since that is used
        # for equality checks as well
        return hash(self.ssn)

p = Person('Foo Bar', 123456789)
q = Person('Fake Name', 123456789)
print(len({p, q})  # 1
Thunderstone answered 18/7, 2016 at 6:59 Comment(2)
Is there any way to pick which object remains in the set? As in prefer 'Foo Bar' over 'Fake Name'?Marks
@alliefitter no. not with hash and eqPalestrina
H
16

Sets need two methods to make an object hashable: __hash__ and __eq__. Two instances must return the same hash value when they are considered equal. An instance is considered already present in a set if both the hash is present in the set and the instance is considered equal to one of the instances with that same hash in the set.

Your class doesn't implement __eq__, so the default object.__eq__ is used instead, which only returns true if obj1 is obj2 is also true. In other words, two instances are only considered equal if they are the exact same instance.

Just because their hashes match, doesn't make them unique as far as a set is concerned; even objects with different hashes can end up in the same hash table slot, as the modulus of the hash against the table size is used.

Add your a custom __eq__ method that returns True when two instances are supposed to be equal:

def __eq__(self, other):
    if not isinstance(other, type(self)):
        return False
    # all instances of this class are considered equal to one another
    return True
Has answered 18/7, 2016 at 6:59 Comment(2)
Curious why it needs both methods, is there some performance benefit where it checks hash first before eq?Rosalinarosalind
@Rosalinarosalind sets are implemented as hash tables. And yes, using a hash table means set membership tests take O(1) constant time versus O(N) for lists, so it’s a performance choice.Has

© 2022 - 2024 — McMap. All rights reserved.