What makes a user-defined class unhashable?
Asked Answered
L

5

70

The docs say that a class is hashable as long as it defines __hash__ method and __eq__ method. However:

class X(list):
  # read-only interface of `tuple` and `list` should be the same, so reuse tuple.__hash__
  __hash__ = tuple.__hash__

x1 = X()
s = {x1} # TypeError: unhashable type: 'X'

What makes X unhashable?

Note that I must have identical lists (in terms of regular equality) to be hashed to the same value; otherwise, I will violate this requirement on hash functions:

The only required property is that objects which compare equal have the same hash value

The docs do warn that a hashable object shouldn't be modified during its lifetime, and of course I don't modify instances of X after creation. Of course, the interpreter won't check that anyway.

Lundquist answered 20/4, 2012 at 22:58 Comment(2)
Yeah, the read-only interfaces are the same, but why do you expect tuple.__hash__ to only use the external interfaces of it's own class? Especially when written in C. Using the external interfaces would be much slower. You can't reasonably expect a method from class A to work for class B unless class B is subclassed from class A. Did you even try to call x1.__hash__() too see if it worked?Bemuse
@LennartRegebro Yes, I agree... See my last comment to https://mcmap.net/q/278475/-what-makes-a-user-defined-class-unhashable... I just had a brain freeze.Lundquist
B
54

Simply setting the __hash__ method to that of the tuple class is not enough. You haven't actually told it how to hash any differently. tuples are hashable because they are immutable. If you really wanted to make you specific example work, it might be like this:

class X2(list):
    def __hash__(self):
        return hash(tuple(self))

In this case you are actually defining how to hash your custom list subclass. You just have to define exactly how it can generate a hash. You can hash on whatever you want, as opposed to using the tuple's hashing method:

def __hash__(self):
    return hash("foobar"*len(self))
Burl answered 20/4, 2012 at 23:4 Comment(6)
But isn't tuple.__hash__ a function that takes a tuple, and returns a number? How does that function "notice" that my object is actually a list rather than a tuple - the read API for the two types is identical.Lundquist
@max: tuple.__hash__ is a bound method of the tuple class. You aren't changing whatever its implementation is doing inside that method to hash. Define your own.Burl
hash((1,2,3)) is the same as (1,2,3).__hash__; that is the same as tuple.__hash__((1,2,3)), right? So tuple.__hash__ relies on the non-public API of class tuple, and so it breaks down with a confusing error message when passed an instance of a different class that matches tuple's public API? I suppose it explains it.. but a bit unexpected.`Lundquist
@max: Ultimately the hashing procedure is defined in the tuple classes __hash__, and without looking at the source I can only assume its specifically intended for the internals of a tuple instance. I am in no way surprised that simply passing its method reference over to your list class didn't work as expected.Burl
@Lundquist Methods usually depend on the internals of a class. Do you really expect to be able to take the implementation of a method on class A and apply it to an object of class B, just because of some similarity in the public API of both classes? The fact that tuple and list are built-in classes implemented in C makes this even less likely to work; at Python level if B objects have all the attributes needed by the method you got from A, then this could work, but at the C level we're talking structs, arrays, and pointers.Salvucci
@Salvucci I agree.. I don't know what I was thinkingLundquist
M
29

From the Python3 docs:

If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

Ref: object.__hash__(self)

Sample code:

class Hashable:
    pass

class Unhashable:
    def __eq__(self, other):
        return (self == other)

class HashableAgain:
    def __eq__(self, other):
        return (self == other)

    def __hash__(self):
        return id(self)

def main():
    # OK
    print(hash(Hashable()))
    # Throws: TypeError("unhashable type: 'X'",)
    print(hash(Unhashable()))  
    # OK
    print(hash(HashableAgain()))
Midtown answered 3/4, 2015 at 14:29 Comment(2)
Does __hash__ need to be unique? Suppose you wanted the instances of HashableAgain to be compared based on the criteria you've defined in __eq__ can you just return a constant integer in __hash__? (I don't really understand how hash) is used in deciding an object's membership in a set.Director
@MinhTran: In general, hash in not unique, but relatively unique. It is used to bucket values in a map. If you use a constant value for hash, all the values will appear in same bucket, so performance will be horrible... but it should still work!Midtown
B
9

What you could and should do, based on your other question, is: don't subclass anything, just encapsulate a tuple. It's perfectly fine to do so in the init.

class X(object):
    def __init__(self, *args):
        self.tpl = args
    def __hash__(self):
        return hash(self.tpl)
    def __eq__(self, other):
        return self.tpl == other
    def __repr__(self):
        return repr(self.tpl)

x1 = X()
s = {x1}

which yields:

>>> s
set([()])
>>> x1
()
Birdcage answered 20/4, 2012 at 23:33 Comment(1)
You're right, for many use cases this is the cleanest, simplest solution; +1Gazelle
A
8

An addition to the above answers - For the specific case of a dataclass in python3.7+ - to make a dataclass hashable, you can use

@dataclass(frozen=True)
class YourClass:
    pass

as the decoration instead of

@dataclass
class YourClass:
    pass
Aluminium answered 4/3, 2021 at 16:26 Comment(0)
G
3

If you don't modify instances of X after creation, why aren't you subclassing tuple?

But I'll point out that this actually doesn't throw an error, at least in Python 2.6.

>>> class X(list):
...     __hash__ = tuple.__hash__
...     __eq__ = tuple.__eq__
... 
>>> x = X()
>>> s = set((x,))
>>> s
set([[]])

I hesitate to say "works" because this doesn't do what you think it does.

>>> a = X()
>>> b = X((5,))
>>> hash(a)
4299954584
>>> hash(b)
4299954672
>>> id(a)
4299954584
>>> id(b)
4299954672

It's just using the object id as a hash. When you actually call __hash__ you still get an error; likewise for __eq__.

>>> a.__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' for 'tuple' objects doesn't apply to 'X' object
>>> X().__eq__(X())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__eq__' for 'tuple' objects doesn't apply to 'X' object

I gather that the python internals, for some reason, are detecting that X has a __hash__ and an __eq__ method, but aren't calling them.

The moral of all this is: just write a real hash function. Since this is a sequence object, converting it to a tuple and hashing that is the most obvious approach.

def __hash__(self):
    return hash(tuple(self))
Gazelle answered 20/4, 2012 at 23:12 Comment(5)
I'm very sorry, this question is taken out of context of another one. I just was confused about this particular behavior. The reason I subclass list is a bit complicated (see discussion in comments to this question).Lundquist
The code doesn't work for me in ActiveState Python 3.2. Perhaps the behavior changed recently?Lundquist
I'm using Python 2.6. In any case, you don't want this behavior, because using ids as keys isn't really a good idea. Better to just convert to tuple and hash that. And actually -- I'm sorry; this was just a rather perplexing approach to the problem for me.Gazelle
In Python 3 the tuple hash indeed makes a hash of the tuple objects, not just the tuples ID, if I understand the code correctly.Bemuse
@LennartRegebro, I think that must be true in Python 2 as well; or at least I am able to create two tuples with different ids that evaluate as equal and have the same hash. The behavior I'm describing here applies only to X objects as defined above.Gazelle

© 2022 - 2024 — McMap. All rights reserved.