The default hash for user-defined classes is to just return their id. This gives a behaviour that is often useful; using an instance of a user-defined class as a dictionary key will allow the associated value to be retrieved when exactly the same object is provided again to lookup the value. e.g:
>>> class Foo(object):
def __init__(self, foo):
self.foo = foo
>>> f = Foo(10)
>>> d = {f: 10}
>>> d[f]
10
This matches the default equality of user-defined classes:
>>> g = Foo(10)
>>> f == g
False
>>> d[g]
Traceback (most recent call last):
File "<pyshell#9>", line 1, in <module>
d[g]
KeyError: <__main__.Foo object at 0x0000000002D69390>
Note that even though f
and g
have the same values for their attributes, they are not equal and looking up g
in d
doesn't find the value stored under f
. Furthermore, even if we change the value of f.foo
, looking up f
in d
still finds the value:
>>> f.foo = 11
>>> d[f]
10
The assumption is that instances of some arbitrary new class should be treated as non-equivalent, unless the programmer specifically declares the conditions for two instances to be treated as equivalent by defining __eq__
and __hash__
.
And this pretty much works; if I define a Car
class, I probably consider two cars with identical attributes to be representing two different cars. If I have a dictionary mapping cars to registered owners, I don't want to find Alice when I look up Bob's car, even if Alice and Bob happen to own identical cars! OTOH, if I define a class to represent postal codes, I probably do want to consider two different objects with the same code to be interchangeable representations of "the same" thing, and in this case if I had a dictionary mapping postal codes to states, I would clearly want to be able to find the same state with two different objects representing the same post code.
I refer to this as the difference between "value types" and "object types". Value types represent some value, and it's the value I care about, not each individual object's identity. Two different ways of coming up with the same value are equally good, and the "contract" of code passing around value types usually just promises to give you an object with some value, without specifying which particular object it is. For object types OTOH, each individual instance has its own identity, even if it contains exactly the same data as another instance. The "contract" of code passing around object types usually promises to keep track of the exact individual objects.
So why don't the built-in mutable classes use their id as their hash? It's because they're all containers, and we usually consider containers to be mostly like value types, with their value determined by the contained elements:
>>> [1, 2, 3] == [1, 2, 3]
True
>>> {f: 10} == {f: 10}
True
But mutable containers have a value that is transient. Some given list currently has the value [1, 2, 3]
, but it can be mutated into having the value [4, 5, 6]
. If you could use lists as dictionary keys, then we'd have to make a ruling on whether lookup should use the list's (current) value, or its identity. Either way we can be (very) surprised when the value of an object currently being used as a dictionary key is changed by mutating it. Using objects as dictionary keys only works well when the object's value is its identity, or when an object's identity is irrelevant to its value. So the answer chosen by Python is to declare mutable containers unhashable.
Now, more specific details in answer to your direct questions:
1) Since this default hash in CPython (though apparently only < 2.6, according to other answers/comments) maps to the object's memory address, then in CPython no two objects using default hashing that are both live at the same time can possibly clash on their hash values, regardless of the classes involved (and if it's being stored as a dictionary key it's live). I would also expect that other Python implementations that don't use memory addresses as hashes should still have fine hash distributions among objects using the default hashing. So yes, you can rely on it.
2) So long as you don't return as your custom hash a result that is exactly the hash of some existing object, you should be relatively fine. My understanding is that Python's hash-based containers are relatively tolerant of sub-optimal hash functions, so long as they're not completely degenerate.
__hash__
, override__eq__
as well." – OrientalUser-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
docs.python.org/3/reference/datamodel.html#object.__hash__ – Paolapaolina