Object of custom type as dictionary key
Asked Answered
C

6

242

What must I do to use my objects of a custom type as keys in a Python dictionary (where I don't want the "object id" to act as the key) , e.g.

class MyThing:
    def __init__(self,name,location,length):
            self.name = name
            self.location = location
            self.length = length

I'd want to use MyThing's as keys that are considered the same if name and location are the same. From C#/Java I'm used to having to override and provide an equals and hashcode method, and promise not to mutate anything the hashcode depends on.

What must I do in Python to accomplish this ? Should I even ?

(In a simple case, like here, perhaps it'd be better to just place a (name,location) tuple as key - but consider I'd want the key to be an object)

Castello answered 4/2, 2011 at 18:51 Comment(3)
What's wrong with using the hash?Sunbreak
Probably because he wants two MyThing, if they have the same name and location, to index the dictionary to return the same value, even if they were created separately as two different "objects".Determinism
"perhaps it'd be better to just place a (name,location) tuple as key - but consider I'd want the key to be an object)" You mean: a NON-COMPOSITE object ?Pippy
Z
289

You need to add 2 methods, note __hash__ and __eq__:

class MyThing:
    def __init__(self,name,location,length):
        self.name = name
        self.location = location
        self.length = length

    def __hash__(self):
        return hash((self.name, self.location))

    def __eq__(self, other):
        return (self.name, self.location) == (other.name, other.location)

    def __ne__(self, other):
        # Not strictly necessary, but to avoid having both x==y and x!=y
        # True at the same time
        return not(self == other)

The Python dict documentation defines these requirements on key objects, i.e. they must be hashable.

Zymo answered 4/2, 2011 at 18:55 Comment(17)
hash(self.name) looks nicer than self.name.__hash__(), and if you do and you can do hash((x, y)) to avoid XORing yourself.Aromatic
As an additional note, I just discovered that calling x.__hash__() like that is also wrong, because it can produce incorrect results: pastebin.com/C9fSH7eFAromatic
@Rosh Oxymoron: thank you for the comment. When writing I was using explicit and for __eq__ but then I thought "why not using tuples?" because I often do that anyway (I think it's more readable). For some strange reason my eyes didn't go back to question about __hash__ however.Zymo
@Rosh Oxymoron: I'm not sure I understand 100% your second comment. Are an instance of A and one of B to compare equal or not? If yes then that __hash__ implementation is breaching the contract (if two objects compare equal they should return the same value from __hash__ - see docs.python.org/reference/datamodel.html#object.__hash__ ). If on the other side they're not going to compare equal then it's ok for an implementation to call __hash__ directly... what am I missing?Zymo
To be honest, this is the kind of thing that I don't like to hide in an object if I can help. Far more readable and maintainable to just use a tuple of name and location as the key. A dictionary is a kind of in memory database and it is natural to use tuples for compound keys.Arman
@6502: It was merely an example of how x.__hash__() is different from hash(x) and how it might lead to incorrect behaviour in exotic cases. It only tests how __hash__ behaves, so how __eq__ works is irrelevant to the example (but all my A and B objects have the same hash, so the contract is respected). The problem is that hash(a) == hash(b) but a.__hash__() != b.__hash__(), so when you use the second in an expression you get incorrect result. Here's a more detailed example, which also defines a proper __eq__: pastebin.com/F0anqN73Aromatic
@Rosh Oxymoron: I agree that __hash__() is different from hash(), but still IMO if a/b __hash__() are respecting the contract I think you can use a.__hash__()^b.__hash__() as a valid implementation. Your second example too is not respecting the rules for a valid __hash__() implementation: for example b1==a1 but b1.__hash__()!=a1.__hash__(). Ok that the system function hash adds a transformation on the value obtained from __hash__ (e.g. hash(-1)==-2), but that's irrelevant... combining __hash__s with xor is valid anyway (it will never make equal things hashing different).Zymo
WARNING if you do any other comparisons on objects, __ne__() doesn't come for free. The default implementation is akin to not is. So if you don't define it, objects can be both equal and not equal. Recommend def __ne__(self, other): return not self.__eq__(other)Metcalfe
@Zymo This give me error "Vector is not frozen (mutable), call freeze first"Changeless
@user877329: are you trying to use some blender data structure as keys? Apparently from some repos certain objects require you to "freeze" them first to avoid mutability (mutating a value-based object that has been used as a key in a python dictionary is not permitted)Zymo
@Zymo Yes I do. Adding self.uv.freeze(), and self.normal.freeze() in CTOR solves this problem.Changeless
@BobStein-VisiBone How could this happen? Can you give an example? The doc says: By default, __ne__() delegates to __eq__() and inverts the result unless it is NotImplemented.Ungraceful
@Ungraceful pythonfiddle.com/eq-method-needs-ne-method <-- this shows the "bug" in Python 2. Python 3 does not have this problem: the default __ne__() has been "fixed".Metcalfe
Suppose we only want to use self.name as the key. We should only change it in the hash function right. But in _eq_ we shouldn't remove itImbecilic
you forgot the length what Happened to the LENGTH?Imbecilic
@HaniGotc: length of what? A dictionary key doesn't need to have a length (you can use integers as key for example).Zymo
@HaniGotc: sorry re-reading now I understood what you mean... in the question however the OP is saying that two things should be equal if name and location are the same; it's not considering length.Zymo
S
39

An alternative in Python 2.6 or above is to use collections.namedtuple() -- it saves you writing any special methods:

from collections import namedtuple
MyThingBase = namedtuple("MyThingBase", ["name", "location"])
class MyThing(MyThingBase):
    def __new__(cls, name, location, length):
        obj = MyThingBase.__new__(cls, name, location)
        obj.length = length
        return obj

a = MyThing("a", "here", 10)
b = MyThing("a", "here", 20)
c = MyThing("c", "there", 10)
a == b
# True
hash(a) == hash(b)
# True
a == c
# False
Siesta answered 4/2, 2011 at 20:58 Comment(0)
R
23

You override __hash__ if you want special hash-semantics, and __cmp__ or __eq__ in order to make your class usable as a key. Objects who compare equal need to have the same hash value.

Python expects __hash__ to return an integer, returning Banana() is not recommended :)

User defined classes have __hash__ by default that calls id(self), as you noted.

There is some extra tips from the documentation.:

Classes which inherit a __hash__() method from a parent class but change the meaning of __cmp__() or __eq__() such that the hash value returned is no longer appropriate (e.g. by switching to a value-based concept of equality instead of the default identity based equality) can explicitly flag themselves as being unhashable by setting __hash__ = None in the class definition. Doing so means that not only will instances of the class raise an appropriate TypeError when a program attempts to retrieve their hash value, but they will also be correctly identified as unhashable when checking isinstance(obj, collections.Hashable) (unlike classes which define their own __hash__() to explicitly raise TypeError).

Rouse answered 4/2, 2011 at 18:55 Comment(3)
The hash alone is not enough, additionally you either need to override __eq__ or __cmp__.Watusi
@Oben Sonne: __cmp__ is given to you by Python if it is a user defined class, but you probably want to override them anyway to accommodate for new semantics.Rouse
@Skurmedel: Yes, but although you can call cmp and use = on user classes which do not override these methods, one of them must be implemented to meet the questioner's requirement that instances with similar name and location have the same dictionary key.Watusi
R
15

I noticed in python 3.8.8 (maybe ever earlier) you don't need anymore explicitly declare __eq__() and __hash__() to have to opportunity to use your own class as a key in dict.

class Apple:
    def __init__(self, weight):
        self.weight = weight
        
    def __repr__(self):
        return f'Apple({self.weight})'

apple_a = Apple(1)
apple_b = Apple(1)
apple_c = Apple(2)

apple_dictionary = {apple_a : 3, apple_b : 4, apple_c : 5}

print(apple_dictionary[apple_a])  # 3
print(apple_dictionary)  # {Apple(1): 3, Apple(1): 4, Apple(2): 5}

I assume from some time Python manages it on its own, however I can be wrong.

Risibility answered 30/7, 2021 at 17:36 Comment(1)
I found this to be true also.Waki
C
2

The answer for today as I know other people may end up here like me, is to use dataclasses in python >3.7. It has both hash and eq functions.

Clink answered 24/9, 2021 at 10:46 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Crusty
G
2

@dataclass(frozen=True) example (Python 3.7)

@dataclass had been previously mentioned at: https://mcmap.net/q/41628/-object-of-custom-type-as-dictionary-key but here's an example.

This awesome new feature, among other good things, automatically defines a __hash__ and __eq__ method for you, making it just work as usually expected in dicts and sets:

dataclass_cheat.py

from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class MyClass1:
    n: int
    s: str

@dataclass(frozen=True)
class MyClass2:
    n: int
    my_class_1: MyClass1

d = {}
d[MyClass1(n=1, s='a')] = 1
d[MyClass1(n=2, s='a')] = 2
d[MyClass1(n=2, s='b')] = 3
d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] = 4
d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] = 5
d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] = 6

assert d[MyClass1(n=1, s='a')] == 1
assert d[MyClass1(n=2, s='a')] == 2
assert d[MyClass1(n=2, s='b')] == 3
assert d[MyClass2(n=1, my_class_1=MyClass1(n=1, s='a'))] == 4
assert d[MyClass2(n=2, my_class_1=MyClass1(n=1, s='a'))] == 5
assert d[MyClass2(n=2, my_class_1=MyClass1(n=2, s='a'))] == 6

# Due to `frozen=True`
o = MyClass1(n=1, s='a')
try:
    o.n = 2
except FrozenInstanceError as e:
    pass
else:
    raise 'error'

As we can see in this example, the hashes are being calculated based on the contents of the objects, and not simply on the addresses of instances. This is why something like:

d = {}
d[MyClass1(n=1, s='a')] = 1
assert d[MyClass1(n=1, s='a')] == 1

works even though the second MyClass1(n=1, s='a') is a completely different instance from the first with a different address.

frozen=True is mandatory, otherwise the class is not hashable, otherwise it would make it possible for users to inadvertently make containers inconsistent by modifying objects after they are used as keys. Further documentation: https://docs.python.org/3/library/dataclasses.html

Tested on Python 3.10.7, Ubuntu 22.10.

Gingerich answered 2/2, 2023 at 14:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.