How can I make a Python dataclass hashable?

I have a dataclass whose instances I want to hash and order, using the id member as a key.

from dataclasses import dataclass, field

@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="set this in post_init", compare=False)

I know that I can implement __hash__ myself. However, I would like dataclasses to do the work for me because they are intended to handle this.


Unfortunately, the above dataclass fails:

>>> a = sorted(list(set([Category(id='x'), Category(id='y')])))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Category'
Enrich answered 18/9, 2018 at 16:3 Comment(1)
For examples, see the "What you can turn on" section of this post: https://mcmap.net/q/80143/-what-are-data-classes-and-how-are-they-different-from-common-classes - Globeflower
O
163

From the docs:

Here are the rules governing implicit creation of a __hash__() method:

[...]

If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).

Since you set eq=True and left frozen at the default (False), your dataclass is unhashable.
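You can confirm this by inspecting the class itself. The sketch below reuses the Category class from the question (the default for name is shortened here purely for illustration) and shows that the decorator has set __hash__ to None, which is what makes hash() fail.

```python
from dataclasses import dataclass, field


@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="unnamed", compare=False)  # shortened default, illustration only


# eq=True with frozen=False: dataclass() sets __hash__ to None.
print(Category.__hash__ is None)  # True

try:
    hash(Category(id="x"))
except TypeError as exc:
    # Same failure as in the question, triggered directly.
    print("TypeError:", exc)
```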

You have 3 options:

  • Set frozen=True (in combination with the default eq=True), which will make your class immutable and hashable.
    @dataclass(frozen=True)
    
  • Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable.
    @dataclass(unsafe_hash=True)
    
    Mutability can cause problems: if an instance is modified while it is stored in a dict or set, its hash changes and it can no longer be found:
    cat = Category('foo', 'bar')
    categories = {cat}
    cat.id = 'baz'
    
    print(cat in categories)  # False
    
  • Manually implement a __hash__ method.
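For the third option, here is a minimal sketch based on the question's class (the shortened default for name is an assumption for illustration). dataclass() leaves an explicitly defined __hash__ untouched, so the class keeps eq and order while hashing on id only. The class is still mutable, so the same caveat about changing id while the instance sits in a set applies.

```python
from dataclasses import dataclass, field


@dataclass(eq=True, order=True)
class Category:
    id: str = field(compare=True)
    name: str = field(default="unnamed", compare=False)  # illustration only

    # An explicit __hash__ is kept as-is by dataclass().
    # Hash only the field that __eq__ compares, so equal objects hash equally.
    def __hash__(self):
        return hash(self.id)


# Duplicates (same id) collapse in the set, and ordering uses id.
ordered = sorted({Category(id="y"), Category(id="x"), Category(id="x")})
print([c.id for c in ordered])  # ['x', 'y']
```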
Ossified answered 18/9, 2018 at 16:14 Comment(2)
As noted below, to exclude a field from hash generation with unsafe_hash, you can use field(compare=False) or field(hash=False) (hash inherits the value of compare if it is not set). - Algetic
Note that manually implementing __hash__() is trivial for an ID-type field: def __hash__(self): return hash(self.id) - Joly
F
41

TL;DR

Use frozen=True (which will make the instances immutable). eq=True is the default value and can be left out.
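As a minimal sketch of what that looks like for the class in the question (the shortened default for name is an assumption for illustration), the original failing expression works once the class is frozen. The trade-off is that attribute assignment now raises FrozenInstanceError.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True, order=True)
class Category:
    id: str
    name: str = field(default="unnamed", compare=False)  # illustration only


# The expression from the question no longer raises.
ordered = sorted(list(set([Category(id="y"), Category(id="x")])))
print([c.id for c in ordered])  # ['x', 'y']

# The price is immutability: assignment is rejected.
try:
    ordered[0].id = "z"
except FrozenInstanceError:
    print("instances are read-only")
```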

Long Answer

From the docs:

__hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer’s intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.

By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.

If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.

Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.

If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
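Two of these rules are easy to check directly. The sketch below uses hypothetical classes (Token and Broken are made up for illustration) to show the eq=False fallback to identity-based hashing, and the TypeError raised when unsafe_hash=True meets an explicit __hash__().

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Token:  # hypothetical class, for illustration only
    value: str


# eq=False: __hash__ is inherited from object, so hashing is identity-based
# and two instances with equal fields are still distinct set members.
print(Token.__hash__ is object.__hash__)  # True
print(len({Token("a"), Token("a")}))      # 2

# unsafe_hash=True together with an explicit __hash__ is rejected
# at class-creation time.
try:
    @dataclass(unsafe_hash=True)
    class Broken:  # hypothetical class, for illustration only
        value: str

        def __hash__(self):
            return hash(self.value)
except TypeError as exc:
    print("TypeError:", exc)
```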

Fidelity answered 18/9, 2018 at 16:14 Comment(0)
A
41

I'd like to add a special note for use of unsafe_hash.

You can exclude a field from the generated hash by setting compare=False or hash=False on it (hash inherits the value of compare by default).

This is useful if, for example, you store nodes in a graph and want to mark them as visited without breaking their hashing (e.g. while they are in a set of unvisited nodes).

from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class Node:
    x: int
    # compare=False excludes the field from __eq__; hash inherits that setting,
    # so the field is excluded from __hash__ as well.
    visit_count: int = field(default=10, compare=False)
    # visit_count: int = field(default=10, hash=False)  # also works: excluded from __hash__ only, still used by __eq__
    # visit_count: int = 10  # plain field: part of the hash, so mutating it breaks lookups ("3*" is printed)

s = set()
n = Node(1)
s.add(n)
if n in s:
    print("1* n in s")
n.visit_count = 11  # does not change the hash
if n in s:
    print("2* n still in s")
else:
    print("3* n is lost to the void because hashing broke.")

This took me hours to figure out. The most useful further reading I found is the Python documentation on dataclasses, specifically the sections on field() and on the dataclass() arguments: https://docs.python.org/3/library/dataclasses.html
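To make the difference between compare=False and hash=False concrete, here is a minimal sketch with two hypothetical classes (NodeA and NodeB are names made up for this comparison). Both keep the counter out of the hash, but only compare=False also keeps it out of equality.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class NodeA:  # hypothetical: counter excluded from __eq__ and __hash__
    x: int
    visit_count: int = field(default=0, compare=False)


@dataclass(unsafe_hash=True)
class NodeB:  # hypothetical: counter excluded from __hash__ only
    x: int
    visit_count: int = field(default=0, hash=False)


print(NodeA(1, 0) == NodeA(1, 5))              # True
print(NodeB(1, 0) == NodeB(1, 5))              # False
print(hash(NodeB(1, 0)) == hash(NodeB(1, 5)))  # True

# Mutating the counter leaves set membership intact in both cases.
b = NodeB(1)
seen = {b}
b.visit_count += 1
print(b in seen)  # True
```

With hash=False alone, two nodes that differ only in visit_count hash identically but compare unequal, so both can end up in the same set. Whether that is what you want depends on how the rest of your code compares nodes.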

Algetic answered 29/7, 2019 at 3:16 Comment(1)
"This might be useful if you store nodes in a graph but want to mark them visited without breaking their hashing (e.g. if they're in a set of unvisited nodes)": never have I felt more targeted by an example use case. - Selfexcited
K
0

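Use the following, with the fields from the question:

@dataclass(frozen=True, order=True)
class Category:
    id: str
    name: str = field(default="set this in post_init", compare=False)

eq defaults to True, so it does not need to be passed explicitly.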

Kishke answered 25/9, 2023 at 17:31 Comment(0)
