Understanding python object membership for sets
Asked Answered
S

4

5

If I understand correctly, the __cmp__() function of an object is called in order to evaluate all objects in a collection while determining whether an object is a member, or 'in', the collection. However, this does not seem to be the case for sets:

class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data-other.data

a = MyObject(5)
b = MyObject(5)

print a in [b]          //evaluates to True, as I'd expect
print a in set([b])     //evaluates to False

How is an object membership tested in a set, then?

Solicitous answered 9/9, 2010 at 18:30 Comment(0)
T
2
>>> xs = []
>>> set([xs])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

There you are. Sets use hashes, very similar to dicts. This help performance extremely (membership tests are O(1), and many other operations depend on membership tests), and it also fits the semantics of sets well: Set items must be unique, and different items will produce different hashes, while same hashes indicate (well, in theory) duplicates.

Since the default __hash__ is just id (which is rather stupid imho), two instances of a class that inherits object's __hash__ will never hash to the same value (well, unless adress space is larger than the sizeof the hash).

Tonytonya answered 9/9, 2010 at 18:42 Comment(0)
H
5

Adding a __hash__ method to your class yields this:

class MyObject(object):
    def __init__(self, data):
        self.data = data

    def __cmp__(self, other):
        return self.data - other.data

    def __hash__(self):
        return hash(self.data)


a = MyObject(5)
b = MyObject(5)

print a in [b] # True
print a in set([b]) # Also True!
Hypermeter answered 9/9, 2010 at 18:45 Comment(1)
// Also - for spotting my non-python-native commenting habits.Solicitous
T
2
>>> xs = []
>>> set([xs])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

There you are. Sets use hashes, very similar to dicts. This help performance extremely (membership tests are O(1), and many other operations depend on membership tests), and it also fits the semantics of sets well: Set items must be unique, and different items will produce different hashes, while same hashes indicate (well, in theory) duplicates.

Since the default __hash__ is just id (which is rather stupid imho), two instances of a class that inherits object's __hash__ will never hash to the same value (well, unless adress space is larger than the sizeof the hash).

Tonytonya answered 9/9, 2010 at 18:42 Comment(0)
T
1

As others pointed, your objects don't have a __hash__ so they use the default id as a hash, and you can override it as Nathon suggested, BUT read the docs about __hash__, specifically the points about when you should and should not do that.

Totalizer answered 9/9, 2010 at 18:57 Comment(0)
L
0

A set uses a dict behind the scenes, so the "in" statement is checking whether the object exists as a key in the dict. Since your object doesn't implement a hash function, the default hash function for objects uses the object's id. So even though a and b are equivalent, they're not the same object, and that's what's being tested.

Larva answered 9/9, 2010 at 18:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.