How to properly subclass dict and override __getitem__ & __setitem__
E

6

104

I am debugging some code and I want to find out when a particular dictionary is accessed. Well, it's actually a class that subclasses dict and implements a couple extra features. Anyway, what I would like to do is subclass dict myself and override __getitem__ and __setitem__ to produce some debugging output. Right now, I have

class DictWatch(dict):
    def __init__(self, *args):
        dict.__init__(self, args)

    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        log.info("GET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        return val

    def __setitem__(self, key, val):
        log.info("SET %s['%s'] = %s" % str(dict.get(self, 'name_label')), str(key), str(val)))
        dict.__setitem__(self, key, val)

'name_label' is a key which will eventually be set that I want to use to identify the output. I have then changed the class I am instrumenting to subclass DictWatch instead of dict and changed the call to the superconstructor. Still, nothing seems to be happening. I thought I was being clever, but I wonder if I should be going a different direction.

Enmity answered 6/3, 2010 at 0:24 Comment(4)
Did you try to use print instead of log? Also, could you explain how you create/configure your log?Technical
Doesn't dict.__init__ take *args?Contour
Looks a bit like a good candidate for a decorator.Contour
realpython.com/inherit-python-dictKwarteng
B
45

What you're doing should absolutely work. I tested out your class, and aside from a missing opening parenthesis in your log statements, it works just fine. There are only two things I can think of. First, is the output of your log statement set correctly? You might need to put a logging.basicConfig(level=logging.DEBUG) at the top of your script.

Second, __getitem__ and __setitem__ are only called during [] accesses. So make sure you only access DictWatch via d[key], rather than d.get() and d.set()

Batrachian answered 6/3, 2010 at 0:42 Comment(4)
Actually it's not extra parens, but a missing opening paren around (str(dict.get(self, 'name_label')), str(key), str(val)))Cistercian
True. To the OP: For future reference, you can simply do log.info('%s %s %s', a, b, c), instead of a Python string formatting operator.Batrachian
Logging level ended up being the issue. I'm debugging someone else's code and I was originally testing in another file which had a different level of debugging set. Thanks!Enmity
What is dict.set? It doesn't exist. dict doesn't have a set attribute.Kwarteng
S
89

Another issue when subclassing dict is that the built-in __init__ doesn't call update, and the built-in update doesn't call __setitem__. So, if you want all setitem operations to go through your __setitem__ function, you should make sure that it gets called yourself:

class DictWatch(dict):
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        print('GET', key)
        return val

    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)
        
    def update(self, *args, **kwargs):
        print('update', args, kwargs)
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
Studio answered 6/3, 2010 at 1:27 Comment(8)
I have tried your solution, but it seems to work for only one level of indexing (i.e., dict[key] and not dict[key1][key2] ...)Huerta
d[key1] returns something, perhaps a dictionary. The second key indexes that. This technique can’t work unless that returned thing supports the watch behavior also.Studio
@AndrewNaguib: Why should it work with nested arrays? Nested arrays do not work with a normal Python dict either (unless you implement that yourself)Chrissa
Yes I did not know so :), for nested indexing level DictWatch(val) should be returned instead.Huerta
@AndrewNaguib: __getitem__ would need to test val and only do that conditionally — i.e. if isinstance(val, dict): ...Cheops
Having to override 5 methods for a simple case feels overcomplicated. This is why collections.UserDict exists. UserDict only requires overriding __setitem__ to be compatible with __init__, setdefault, update,...Morphosis
Subclassing MutableMapping or UserDict is preferred over subclassing dict in most cases. However UserDict does not subclass dict so if you need the real builtin python dict as your parent class, this does not help you. @MorphosisStudio
Does the update method take any more argument than a positional argument for the other dictionary that is used to update the first dictionary?Mauro
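The nested-indexing idea from the comments above could look roughly like the sketch below. It is one possible approach, not something taken from the answer: __getitem__ wraps plain nested dicts in DictWatch on first access and stores the wrapper back, so later writes through it remain visible from the parent.

```python
class DictWatch(dict):
    def __getitem__(self, key):
        val = dict.__getitem__(self, key)
        print('GET', key)
        # Wrap plain nested dicts so that the next [] access is watched too
        if isinstance(val, dict) and not isinstance(val, DictWatch):
            val = DictWatch(val)
            # Store the wrapper so that mutations made through it persist
            dict.__setitem__(self, key, val)
        return val

    def __setitem__(self, key, val):
        print('SET', key, val)
        dict.__setitem__(self, key, val)


d = DictWatch()
d['outer'] = {'inner': 1}
inner = d['outer']['inner']  # both levels print a GET line
d['outer']['inner'] = 5      # GET on the outer level, SET on the inner one
```

Note that DictWatch(val) copies the nested dict, so the object originally assigned to d['outer'] is no longer the one stored in d after the first read.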
B
27

Consider subclassing UserDict or UserList. These classes are designed to be subclassed, whereas the built-in dict and list are not: the built-ins contain optimisations that can bypass overridden methods.

Bish answered 26/3, 2018 at 19:21 Comment(4)
For reference, the documentation in Python 3.6 says "The need for this class has been partially supplanted by the ability to subclass directly from dict; however, this class can be easier to work with because the underlying dictionary is accessible as an attribute".Sludge
@andrew an example might be helpful.Skiba
@VasanthaGaneshK treyhunner.com/2019/04/…Aeolus
Another reason to use UserDict: It makes copy() behave correctly.Lassitude
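A small sketch of two points from the comments above: the underlying dictionary is available as the data attribute, and copy() keeps the subclass type. The class names PlainSub and UserSub are made up for this illustration.

```python
from collections import UserDict


class PlainSub(dict):      # hypothetical subclass of the built-in dict
    pass


class UserSub(UserDict):   # hypothetical subclass of UserDict
    pass


plain_copy = PlainSub(a=1).copy()
user_copy = UserSub(a=1).copy()

print(type(plain_copy).__name__)  # dict: the subclass type is lost
print(type(user_copy).__name__)   # UserSub: the subclass type is kept
print(user_copy.data)             # {'a': 1}: the underlying plain dict
```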
M
10

As Andrew Pate's answer proposed, subclassing collections.UserDict instead of dict is much less error-prone.

Here is an example showing an issue when inheriting dict naively:

class MyDict(dict):

  def __setitem__(self, key, value):
    super().__setitem__(key, value * 10)


d = MyDict(a=1, b=2)  # Bad! MyDict.__setitem__ not called
d.update(c=3)  # Bad! MyDict.__setitem__ not called
d['d'] = 4  # Good!
print(d)  # {'a': 1, 'b': 2, 'c': 3, 'd': 40}

UserDict inherits from collections.abc.MutableMapping, so this works as expected:

import collections


class MyDict(collections.UserDict):

  def __setitem__(self, key, value):
    super().__setitem__(key, value * 10)


d = MyDict(a=1, b=2)  # Good: MyDict.__setitem__ correctly called
d.update(c=3)  # Good: MyDict.__setitem__ correctly called
d['d'] = 4  # Good
print(d)  # {'a': 10, 'b': 20, 'c': 30, 'd': 40}

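Similarly, you only have to implement __getitem__ for my_dict.get to go through it automatically, because get is implemented in terms of self[key]. One caveat: UserDict implements key in my_dict directly against its underlying data, so that check does not call __getitem__ (it does for a class built directly on collections.abc.Mapping).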
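As a quick illustration of the get case, here is a sketch using a made-up LoudDict class that records every key passing through __getitem__ in a module-level list:

```python
from collections import UserDict

seen = []  # keys that passed through LoudDict.__getitem__


class LoudDict(UserDict):  # hypothetical name, for illustration only
    def __getitem__(self, key):
        seen.append(key)
        return super().__getitem__(key)


d = LoudDict(a=1)
direct = d['a']       # calls __getitem__
via_get = d.get('a')  # get() is built on self[key], so it calls it too

print(seen)  # ['a', 'a']
```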

Note: UserDict is not a subclass of dict, so isinstance(UserDict(), dict) returns False (but isinstance(UserDict(), collections.abc.MutableMapping) returns True).

Morphosis answered 2/11, 2020 at 13:32 Comment(0)
S
9

This should not really change the result (your code should work, given a suitable logging level), but your __init__ should be:

def __init__(self, *args, **kwargs):
    dict.__init__(self, *args, **kwargs)

instead, because constructing the object as DictWatch([(1,2),(2,3)]) or DictWatch(a=1,b=2) will not work correctly with your version.

(Or, better, don't define a constructor for this at all.)
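To make the difference concrete, here is a small sketch comparing the two constructors. The class names Broken and Fixed are invented for this comparison; Broken mirrors the __init__ from the question.

```python
class Broken(dict):
    def __init__(self, *args):
        # Passes the args tuple itself as the single "iterable of pairs"
        dict.__init__(self, args)


class Fixed(dict):
    def __init__(self, *args, **kwargs):
        # Unpacks the arguments, exactly as dict() itself would receive them
        dict.__init__(self, *args, **kwargs)


pairs = [(1, 2), (2, 3)]
fixed = Fixed(pairs)    # {1: 2, 2: 3}
broken = Broken(pairs)  # {(1, 2): (2, 3)}: the two pairs became one item

try:
    Broken(a=1, b=2)    # __init__ does not accept keyword arguments
except TypeError as exc:
    kwargs_error = exc
else:
    kwargs_error = None
```

With this particular two-pair list the Broken version does not raise, it silently builds the wrong dictionary; the keyword form raises TypeError.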

Stockmon answered 6/3, 2010 at 0:48 Comment(1)
I'm only worried about the dict[key] form of access, so this isn't an issue.Enmity
L
1

All you will have to do is

class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

A sample usage for my personal use

### EXAMPLE
import collections.abc


class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

    def __setitem__(self, key, item):
        # Accept only 2-tuple keys with iterable values.
        # collections.Iterable was removed in Python 3.10;
        # use collections.abc.Iterable instead.
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, collections.abc.Iterable)):
            # self.__dict__[key] = item
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")

Note: tested only in python3

Lashoh answered 6/10, 2017 at 8:3 Comment(1)
Since this is Python 3, I recommend just using super() instead of super(BatchCollection, self)Sheol

© 2022 - 2024 — McMap. All rights reserved.