Custom comparison functions for built-in types in Python
Asked Answered
L

1

8

I am using Python's built-in sets to hold objects of a class I have defined. For this class, I defined __eq__, __ne__, and __hash__ so that I can compare objects by my custom comparison functions. That works just fine, until I find out that I actually need two sets of comparison functions, which will be used in different ways at different times in my code.

I can't define two sets of __eq__, etc. methods in my class, and Python's built-in set type does not accept a comparator argument. I suppose I could go write a wrapper class around set, but that seems like a lot more work than necessary.

Is there any easier solution to this than writing my own set class?

Laris answered 30/4, 2013 at 18:43 Comment(4)
You cannot store objects in sets that are equal only some of the time. They are either equal or they are not equal, as far as the set is concerned.Stillhunt
The usual way around this is to "decorate" the elements that you put in the list: Store wrapper elements that are equal according to your alternate comparison element. This is a bit hard to explain in a comment, but I can write up an answer if that sounds promising.Ila
Instead of writing a wrapper around set, you could write a wrapper around your own class, which provides the alternative equality definition. Then have one set of objects of your original class and another set of these "alternate" objects when you need that other equality definition.Insurgence
Also, keep in mind that sets care about hash stability as well as equality. If a == b (using the definition of equality that you want to use with the set), hash(a) == hash(b) must be true, or all kinds of things will break.Ila
I
19

Let's say you have this class:

class Thingy(object):
    def __init__(self, key, notkey):
        self.key, self.notkey = key, notkey
    def __eq__(self, other):
        return self.key == other.key
    def __hash__(self):
        return hash(self.key)

Now, you want to put these in a set, but keyed by notkey instead of key. You can't do that as-is, because a set expects its elements to have a consistent meaning for equality—and also a consistent meaning for hash such that a == b always implies hash(a) == hash(b). So, create a wrapper:

class WrappedThingy(object):
    def __init__(self, thingy):
        self.thingy = thingy
    def __eq__(self, other):
        return self.thingy.notkey == other.thingy.notkey
    def __hash__(self):
        return hash(self.thingy.notkey)

And you can put those in a set:

wts = set(WrappedThingy(thingy) for thingy in thingies)

For example, let's say you want to uniquify your thingies, keeping exactly one thingy (arbitrarily) for each notkey value. Just wrap them, stick the the wrappers in a set, then unwrap them and stick the unwrappees in a list:

wts = set(WrappedThingy(thingy) for thingy in thingies)
thingies = [wt.thingy for wt in wts]

This is part of a more general Python pattern called "DSU". This stands for "decorate-sort-undecorate", which is wildly inaccurate nowadays, since you almost never need it for sorting-related tasks in modern Python… but historically it made sense. Feel free to call it "decorate-process-undecorate" in hopes that it will catch on, but don't hope too hard.

The reason you don't need DSU for sorting nowadays is that most sorting functions all take key functions as arguments. In fact, even for uniquifying, the unique_everseen function in the itertools recipes takes a key.

But if you look at what it does under the covers, it's basically DSU:

for element in iterable:
    k = key(element)
    if k not in seen:
        seen.add(k)
        yield element

(The fact that it's a generator rather than a list-building function means it can "undecorate on the fly", which makes things a bit simpler. But otherwise, same idea.)

Ila answered 30/4, 2013 at 18:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.