What is the correct (or best) way to subclass the Python set class, adding a new instance variable?
C

9

18

I'm implementing an object that is almost identical to a set, but requires an extra instance variable, so I am subclassing the built-in set object. What is the best way to make sure that the value of this variable is copied when one of my objects is copied?

Using the old sets module, the following code worked perfectly:

import sets
class Fooset(sets.Set):
    def __init__(self, s = []):
        sets.Set.__init__(self, s)
        if isinstance(s, Fooset):
            self.foo = s.foo
        else:
            self.foo = 'default'
f = Fooset([1,2,4])
f.foo = 'bar'
assert( (f | f).foo == 'bar')

but this does not work using the built-in set type.

The only solution that I can see is to override every single method that returns a copied set object... in which case I might as well not bother subclassing the set object. Surely there is a standard way to do this?

(To clarify, the following code does not work (the assertion fails):

class Fooset(set):
    def __init__(self, s = []):
        set.__init__(self, s)
        if isinstance(s, Fooset):
            self.foo = s.foo
        else:
            self.foo = 'default'

f = Fooset([1,2,4])
f.foo = 'bar'
assert( (f | f).foo == 'bar')

)
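
A quick way to see why the assertion fails, as a minimal sketch assuming CPython 3 (other answers here note that 2.6 kept the subclass type instead): the inherited operator builds its result without calling Fooset.__init__, so the result never receives a foo attribute.

```python
class Fooset(set):
    def __init__(self, s=()):
        set.__init__(self, s)
        # Copy foo from another Fooset, otherwise fall back to a default.
        self.foo = s.foo if isinstance(s, Fooset) else 'default'

f = Fooset([1, 2, 4])
f.foo = 'bar'

result = f | f
# Assumption: on CPython 3 the inherited operator returns a plain set,
# so Fooset.__init__ never ran for it and it carries no foo attribute.
print(type(result).__name__)
print(hasattr(result, 'foo'))
```

On that interpreter the comparison in the assertion is never reached, because looking up .foo on the result already fails.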

Coffer answered 28/4, 2009 at 15:5 Comment(0)
E
21

My favorite way to wrap methods of a built-in collection:

class Fooset(set):
    def __init__(self, s=(), foo=None):
        super(Fooset,self).__init__(s)
        if foo is None and hasattr(s, 'foo'):
            foo = s.foo
        self.foo = foo



    @classmethod
    def _wrap_methods(cls, names):
        def wrap_method_closure(name):
            def inner(self, *args):
                result = getattr(super(cls, self), name)(*args)
                if isinstance(result, set) and not hasattr(result, 'foo'):
                    result = cls(result, foo=self.foo)
                return result
            inner.fn_name = name
            setattr(cls, name, inner)
        for name in names:
            wrap_method_closure(name)

Fooset._wrap_methods(['__ror__', 'difference_update', '__isub__', 
    'symmetric_difference', '__rsub__', '__and__', '__rand__', 'intersection',
    'difference', '__iand__', 'union', '__ixor__', 
    'symmetric_difference_update', '__or__', 'copy', '__rxor__',
    'intersection_update', '__xor__', '__ior__', '__sub__',
])

Essentially the same thing you're doing in your own answer, but with fewer lines of code. It's also easy to put in a metaclass if you want to do the same thing with lists and dicts as well.
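
A short usage check for the approach above, as a sketch rather than a definitive test. It repeats the class so that it runs on its own, uses Python 3's zero-argument super() in __init__, and wraps only a shortened, illustrative list of method names.

```python
class Fooset(set):
    def __init__(self, s=(), foo=None):
        super().__init__(s)
        if foo is None and hasattr(s, 'foo'):
            foo = s.foo
        self.foo = foo

    @classmethod
    def _wrap_methods(cls, names):
        def wrap_method_closure(name):
            def inner(self, *args):
                result = getattr(super(cls, self), name)(*args)
                # Re-wrap only set results that lack foo; in-place methods
                # return self or None and pass through unchanged.
                if isinstance(result, set) and not hasattr(result, 'foo'):
                    result = cls(result, foo=self.foo)
                return result
            setattr(cls, name, inner)
        for name in names:
            wrap_method_closure(name)

# A shortened list, for illustration only.
Fooset._wrap_methods(['__or__', '__ror__', '__ior__', 'union', 'copy'])

f = Fooset([1, 2, 4], foo='bar')
g = f | {9}      # new Fooset, foo copied from f
h = {9} | f      # reflected operator, also handled by the subclass
k = f
k |= {7}         # in-place: the wrapper hands back the same object

print(type(g).__name__, g.foo)
print(type(h).__name__, h.foo)
print(k is f)
```

The isinstance/hasattr test is what keeps the in-place operators and the *_update methods safe: their return values (self or None) are passed through untouched.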

Eure answered 30/4, 2009 at 1:1 Comment(1)
that's a useful contribution, thanks. it doesn't look like you're gaining much by making _wrap_methods a class method rather than a function - is that purely for the modularity it gives? - Coffer
W
11

I think that the recommended way to do this is not to subclass directly from the built-in set, but rather to make use of the Abstract Base Class Set available in collections.abc.

Using the ABC Set gives you some methods for free as a mix-in so you can have a minimal Set class by defining only __contains__(), __len__() and __iter__(). If you want some of the nicer set methods like intersection() and difference(), you probably do have to wrap them.
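
To make the "methods for free" point concrete, here is a minimal sketch. TinySet is a hypothetical name used only for illustration; it implements just the three abstract methods and relies on the Set mix-in for everything else.

```python
from collections.abc import Set

class TinySet(Set):
    """Hypothetical minimal set-like class; only the three abstract
    methods are written by hand."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

a = TinySet([1, 2, 3])
b = TinySet([2, 3, 4])

common = a & b                       # operator supplied by the Set mix-in
print(type(common).__name__, sorted(common))
print(a <= TinySet([1, 2, 3, 4]))    # comparisons also come for free
print(hasattr(a, 'intersection'))    # named methods such as intersection() do not
```

The mix-in operators build their result through the class itself, so they return TinySet objects as long as the constructor accepts a single iterable.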

Here's my attempt (this one happens to be a frozenset-like, but you can inherit from MutableSet to get a mutable version):

from collections.abc import Set, Hashable

class CustomSet(Set, Hashable):
    """An example of a custom frozenset-like object using
    Abstract Base Classes.
    """
    __hash__ = Set._hash

    wrapped_methods = ('difference',
                       'intersection',
                       'symmetric_difference',
                       'union',
                       'copy')

    def __repr__(self):
        return "CustomSet({0})".format(list(self._set))

    def __new__(cls, iterable=None):
        selfobj = super(CustomSet, cls).__new__(CustomSet)
        selfobj._set = frozenset() if iterable is None else frozenset(iterable)
        for method_name in cls.wrapped_methods:
            setattr(selfobj, method_name, cls._wrap_method(method_name, selfobj))
        return selfobj

    @classmethod
    def _wrap_method(cls, method_name, obj):
        def method(*args, **kwargs):
            result = getattr(obj._set, method_name)(*args, **kwargs)
            return CustomSet(result)
        return method

    def __getattr__(self, attr):
        """Make sure that we get things like issuperset() that aren't provided
        by the mix-in, but don't need to return a new set."""
        return getattr(self._set, attr)

    def __contains__(self, item):
        return item in self._set

    def __len__(self):
        return len(self._set)

    def __iter__(self):
        return iter(self._set)
Woden answered 14/7, 2011 at 19:15 Comment(4)
Note that isinstance(CustomSet(), set) is False. For reasons unknown to me, it is impossible to make it return True unless you inherit from set. - Dropsical
Historically, Python has preferred "duck typing" and tends to discourage explicit isinstance() checks. Your point is valid, but I suspect in real life you'll rarely encounter a situation where an object with a set-like interface is insufficient and you need a "real" set instance. - Woden
Does ___hash__ have an extra _? - Mosaic
@Mr_and_Mrs_D, good eye! That was a typo: the magic method is __hash__ - Woden
D
4

Sadly, set does not follow the rules and __new__ is not called to make new set objects, even though they keep the type. This is clearly a bug in Python (issue #1721812, which will not be fixed in the 2.x sequence). You should never be able to get an object of type X without calling the type object that creates X objects! If set.__or__ is not going to call __new__ it is formally obligated to return set objects instead of subclass objects.

But actually, noting the post by nosklo above, your original behavior does not make any sense. The Set.__or__ operator should not be reusing either of the source objects to construct its result, it should be whipping up a new one, in which case its foo should be "default"!

So, practically, anyone doing this should have to overload those operators so that they would know which copy of foo gets used. If it is not dependent on the Foosets being combined, you can make it a class default, in which case it will get honored, because the new object thinks it is of the subclass type.

What I mean is, your example would work, sort of, if you did this:

class Fooset(set):
  foo = 'default'
  def __init__(self, s=()):
    set.__init__(self, s)  # populate the set; without this call it stays empty
    if isinstance(s, Fooset):
      self.foo = s.foo

f = Fooset([1,2,5])
assert (f|f).foo == 'default'
Dolhenty answered 7/9, 2012 at 13:53 Comment(0)
C
2

set1 | set2 is an operation that does not modify either existing set; it creates and returns a new set. There is no way to make it automatically copy arbitrary attributes from one or both of the sets to the newly created set without customizing the | operator yourself by defining the __or__ method.

class MySet(set):
    def __init__(self, *args, **kwds):
        super(MySet, self).__init__(*args, **kwds)
        self.foo = 'nothing'
    def __or__(self, other):
        result = super(MySet, self).__or__(other)
        result.foo = self.foo + "|" + other.foo
        return result

r = MySet('abc')
r.foo = 'bar'
s = MySet('cde')
s.foo = 'baz'

t = r | s

print r, s, t
print r.foo, s.foo, t.foo

Prints:

MySet(['a', 'c', 'b']) MySet(['c', 'e', 'd']) MySet(['a', 'c', 'b', 'e', 'd'])
bar baz bar|baz
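
The code above uses Python 2 print statements and relies on the inherited __or__ returning a MySet. A possible Python 3 adaptation, as a sketch under the assumption that the inherited operator returns a plain set there (so the result has to be wrapped before foo can be attached):

```python
class MySet(set):
    def __init__(self, *args, **kwds):
        super().__init__(*args, **kwds)
        self.foo = 'nothing'

    def __or__(self, other):
        merged = super().__or__(other)
        if merged is NotImplemented:
            # Let Python try the other operand's reflected method.
            return merged
        # Assumption: merged is a plain set on CPython 3, so wrap it
        # in MySet before attaching the combined foo value.
        result = MySet(merged)
        result.foo = self.foo + "|" + getattr(other, 'foo', 'nothing')
        return result

r = MySet('abc')
r.foo = 'bar'
s = MySet('cde')
s.foo = 'baz'

t = r | s

print(sorted(r), sorted(s), sorted(t))
print(r.foo, s.foo, t.foo)
```

The getattr fallback is an addition for the case where the right-hand operand is an ordinary set without a foo attribute.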
Cytochrome answered 28/4, 2009 at 15:29 Comment(1)
This is what i suspected. In this case, I'll have to override and, or, rand, ror, rsub, rxor, sub, xor, add, copy, difference, intersection, symmetric_difference, and union. Have I missed any? To be honest, I was looking for something with the simple generality of the 2.5 solution I listed above... but a negative answer is good too. It does seem a little like a bug to me. - Coffer
T
2

It looks like set bypasses __init__ in the C code. However, you will end up with an instance of Fooset; it just won't have had a chance to copy the field.

Apart from overriding the methods that return new sets, I'm not sure you can do too much in this case. set is clearly built for a certain amount of speed, so it does a lot of work in C.

Thankful answered 28/4, 2009 at 15:59 Comment(1)
sigh. thanks. that was my reading of the C code too, but i'm a python newbie so thought it was worth asking. i'd forgotten my reasons for disliking subclassing in general - the "external" subclass becomes dependent on the unpublished internal implementation details of its superclass. - Coffer
F
2

I am trying to answer the question reading it as: "How can I make the return values of the operators of set be of the type of my subclass of set?", ignoring the details of the given class and whether or not the example is broken to begin with. I came here from my own question, which would be a duplicate if my reading is correct.

This answer differs from some of the other answers as follows:

  • The given class (the subclass) is changed only by adding a decorator.
  • It is therefore general enough not to care about details of the given class (such as hasattr(s, 'foo')).
  • The additional cost is paid once per class (when it is decorated), not for every instance.
  • The only part of the given example that is specific to set is the list of methods, which is easy to define.
  • It assumes that the base class is NOT abstract and can itself be copy-constructed (otherwise an __init__ method that copies from an instance of the base class needs to be implemented).

The library code, which can be put anywhere in the project or a module:

class Wrapfuncs:
  def __init__(self, *funcs):
    self._funcs = funcs

  def __call__(self, cls):
    def _wrap_method(method_name):
      def method(*args, **kwargs):
          result = getattr(cls.__base__, method_name)(*args, **kwargs)
          return cls(result)
      return method

    for func in self._funcs:
      setattr(cls, func, _wrap_method(func))
    return cls

To use it with a set, we need the list of methods, that return a new instance:

returning_ops_funcs = ['difference', 'symmetric_difference', '__rsub__', '__or__', '__ior__', '__rxor__', '__iand__', '__ror__', '__xor__', '__sub__', 'intersection', 'union', '__ixor__', '__and__', '__isub__', 'copy']

and we can use it with our class:

@Wrapfuncs(*returning_ops_funcs)
class MySet(set):
  pass

I am sparing the details of what could be special about this class.

I have tested the code with the following lines:

s1 = MySet([1, 2, 3])
s2 = MySet([2, 3, 4])
s3 = MySet([3, 4, 5])

print(s1&s2)
print(s1.intersection(s2))
print(s1 and s2)
print(s1|s2)
print(s1.union(s2))
print(s1|s2|s3)
print(s1.union(s2, s3))
print(s1 or s2)
print(s1-s2)
print(s1.difference(s2))
print(s1^s2)
print(s1.symmetric_difference(s2))

print(s1 & set(s2))
print(set(s1) & s2)

print(s1.copy())

which print:

MySet({2, 3})
MySet({2, 3})
MySet({2, 3, 4})
MySet({1, 2, 3, 4})
MySet({1, 2, 3, 4})
MySet({1, 2, 3, 4, 5})
MySet({1, 2, 3, 4, 5})
MySet({1, 2, 3})
MySet({1})
MySet({1})
MySet({1, 4})
MySet({1, 4})
MySet({2, 3})
{2, 3}
MySet({1, 2, 3})

There is one case in which the result is not optimal: when the operator is used with an instance of the built-in set as the left-hand operand and an instance of the class as the right-hand operand. I don't like this, but I believe this problem is common to all proposed solutions I have seen.
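
One possible explanation for that case, offered as a sketch rather than a verified diagnosis: '__rand__' is not in the list of wrapped methods, so the reflected call falls through to the inherited implementation. With the reflected name included, the same decorator appears to cover a built-in set on the left-hand side as well. The decorator is repeated here only so that the example runs on its own.

```python
class Wrapfuncs:
    def __init__(self, *funcs):
        self._funcs = funcs

    def __call__(self, cls):
        def _wrap_method(method_name):
            def method(*args, **kwargs):
                result = getattr(cls.__base__, method_name)(*args, **kwargs)
                return cls(result)
            return method

        for func in self._funcs:
            setattr(cls, func, _wrap_method(func))
        return cls

# Only the two names needed for this demonstration are wrapped.
@Wrapfuncs('__and__', '__rand__')
class MySet(set):
    pass

s1 = MySet([1, 2, 3])
s2 = MySet([2, 3, 4])

left = s1 & set(s2)     # subclass on the left: handled by __and__
right = set(s1) & s2    # built-in set on the left: handled by __rand__

print(type(left).__name__, sorted(left))
print(type(right).__name__, sorted(right))
```

This works because Python tries the right-hand operand's reflected method first when that operand's type is a subclass of the left-hand operand's type and overrides the method.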

I have also thought of providing an example, where the collections.abc.Set is used. While it could be done like this:

from collections.abc import Set, Hashable
@Wrapfuncs(*returning_ops_funcs)
class MySet(set, Set):
  pass

I am not sure, whether it comes with the benefits, that @bjmc had in mind, or what the "some methods" are, that it gives you "for free". This solution is targeted at using a base class to do the work and return instances of the subclass. A solution, that uses a member object to do the work could probably be generated in a similar way.

Francescafrancesco answered 27/1, 2021 at 15:25 Comment(0)
V
1

Generalizing Matthew Marshall's response using a decorator, without adding runtime checking, which would impact performance:

>>> from functools import wraps
>>>
>>> def builtinSubclass( *methods ):
...    def decorator( cls ):
...       for m in methods:
...          if cls.__dict__.get(m):
...             continue
...          def closure():
...             wrapped = getattr(set, m)
...             @wraps( wrapped )
...             def wrapper(self, *args, **kwargs):
...                print(wrapped, m)
...                return cls( wrapped( self, *args, **kwargs ) )
...             return wrapper
...          print(f"replacing {m}")
...          setattr( cls, m, closure() )
...       return cls
...    return decorator
...
>>>
>>> setSubclass = builtinSubclass('__ror__', 'difference_update', '__isub__',
...    'symmetric_difference', '__rsub__', '__and__', '__rand__', 'intersection',
...    'difference', '__iand__', 'union', '__ixor__', 'symmetric_difference_update',
...    '__or__', 'copy', '__rxor__', 'intersection_update', '__xor__', '__ior__',
...    '__sub__')
>>>
>>> @setSubclass
... class S(set):
...    pass
...
replacing __and__
replacing __or__
replacing intersection
>>> type(S([1]) | S([2]))
intersection
<class '__main__.S'>

The same code without the debugging output:

from functools import wraps

def builtinSubclass( *methods ):
   def decorator( cls ):
      for m in methods:
         if cls.__dict__.get(m):
            continue
         def makeWrapper():
            wrapped = getattr(set, m)
            @wraps( wrapped )
            def wrapper(self, *args, **kwargs):
               return cls(wrapped(self, *args, **kwargs))
            return wrapper
         setattr( cls, m, makeWrapper() )
      return cls
   return decorator

# In-place methods are left out on purpose: the *_update methods return None
# (cls(None) would raise TypeError) and the __i*__ operators already return self.
setSubclass = builtinSubclass('__ror__', 'symmetric_difference', '__rsub__',
                              '__and__', '__rand__', 'intersection',
                              'difference', 'union', '__or__', 'copy',
                              '__rxor__', '__xor__', '__sub__')
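
The extra makeWrapper() level in the code above matters because closures look up loop variables when they are called, not when they are defined. A minimal sketch of the difference, using two hypothetical helper functions that are not part of the answer's code:

```python
def make_without_factory(names):
    funcs = {}
    for m in names:
        def wrapper(self, *args):
            # m is looked up when wrapper is called, i.e. after the loop ended.
            return getattr(set, m)(self, *args)
        funcs[m] = wrapper
    return funcs

def make_with_factory(names):
    funcs = {}
    for m in names:
        def factory():
            wrapped = getattr(set, m)   # resolved now, during this iteration
            def wrapper(self, *args):
                return wrapped(self, *args)
            return wrapper
        funcs[m] = factory()
    return funcs

names = ['union', 'intersection']
broken = make_without_factory(names)
fixed = make_with_factory(names)

a, b = {1, 2}, {2, 3}
print(broken['union'](a, b))   # actually runs intersection
print(fixed['union'](a, b))    # runs union as intended
```

The same late binding would explain why a print of m inside a wrapper shows the last method name rather than the one being called.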
Valeric answered 22/4, 2022 at 11:10 Comment(0)
C
0

Assuming the other answers are correct, and overriding all the methods is the only way to do this, here's my attempt at a moderately elegant way of doing this. If more instance variables are added, only one piece of code needs to change. Unfortunately if a new binary operator is added to the set object, this code will break, but I don't think there's a way to avoid that. Comments welcome!

def foocopy(f):
    def cf(self, new):
        r = f(self, new)
        r.foo = self.foo
        return r
    return cf

class Fooset(set):
    def __init__(self, s = []):
        set.__init__(self, s)
        if isinstance(s, Fooset):
            self.foo = s.foo
        else:
            self.foo = 'default'

    def copy(self):
        x = set.copy(self)
        x.foo = self.foo
        return x

    @foocopy
    def __and__(self, x):
        return set.__and__(self, x)

    @foocopy
    def __or__(self, x):
        return set.__or__(self, x)

    @foocopy
    def __rand__(self, x):
        return set.__rand__(self, x)

    @foocopy
    def __ror__(self, x):
        return set.__ror__(self, x)

    @foocopy
    def __rsub__(self, x):
        return set.__rsub__(self, x)

    @foocopy
    def __rxor__(self, x):
        return set.__rxor__(self, x)

    @foocopy
    def __sub__(self, x):
        return set.__sub__(self, x)

    @foocopy
    def __xor__(self, x):
        return set.__xor__(self, x)

    @foocopy
    def difference(self, x):
        return set.difference(self, x)

    @foocopy
    def intersection(self, x):
        return set.intersection(self, x)

    @foocopy
    def symmetric_difference(self, x):
        return set.symmetric_difference(self, x)

    @foocopy
    def union(self, x):
        return set.union(self, x)


f = Fooset([1,2,4])
f.foo = 'bar'
assert( (f | f).foo == 'bar')
Coffer answered 28/4, 2009 at 16:31 Comment(2)
You've got some infinite recursion in the copy method. x = self.copy() should be x = super(Fooset,self).copy() - Thankful
yes, you're right. is using super() better than explicitly mentioning the superclass? - Coffer
K
-2

For me this works perfectly using Python 2.5.2 on Win32. Using your class definition and the following test:

f = Fooset([1,2,4])
s = sets.Set((5,6,7))
print f, f.foo
f.foo = 'bar'
print f, f.foo
g = f | s
print g, g.foo
assert( (f | f).foo == 'bar')

I get this output, which is what I expect:

Fooset([1, 2, 4]) default
Fooset([1, 2, 4]) bar
Fooset([1, 2, 4, 5, 6, 7]) bar
Kiloton answered 28/4, 2009 at 15:39 Comment(2)
yes, this works with 2.5.2, but can you make it work with the built-in set type in python 2.6? - Coffer
since you had import sets in your code, and did not mention 2.6, I assumed you'd be using the sets.py module. If this is no longer available in 2.6 you're likely out of luck - Kiloton
