How to "perfectly" override a dict?
S

6

271

How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

It would seem that there should be some tiny set of primitives I can override to make this work, but according to all my research and attempts it seems like this isn't the case:

Here is my first go at it. get() doesn't work, and no doubt there are many other minor problems:

class arbitrary_dict(dict):
    """A dictionary that applies an arbitrary key-altering function
       before accessing the keys."""

    def __keytransform__(self, key):
        return key

    # Overridden methods. List from 
    # https://mcmap.net/q/103786/-how-to-properly-subclass-dict-and-override-__getitem__-amp-__setitem__

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    # Note: I'm using dict directly, since super(dict, self) doesn't work.
    # I'm not sure why, perhaps dict is not a new-style class.

    def __getitem__(self, key):
        return dict.__getitem__(self, self.__keytransform__(key))

    def __setitem__(self, key, value):
        return dict.__setitem__(self, self.__keytransform__(key), value)

    def __delitem__(self, key):
        return dict.__delitem__(self, self.__keytransform__(key))

    def __contains__(self, key):
        return dict.__contains__(self, self.__keytransform__(key))


class lcdict(arbitrary_dict):
    def __keytransform__(self, key):
        return str(key).lower()
Stephanus answered 2/8, 2010 at 12:23 Comment(2)
I think __keytransform__() should be static. Nice approach though. (prepending @staticmethod)Waring
related: https://mcmap.net/q/103787/-advantages-of-userdict-classHedwig
T
285

You can write an object that behaves like a dict quite easily with ABCs (Abstract Base Classes) from the collections.abc module. It even tells you if you missed a method, so below is the minimal version that shuts the ABC up.

from collections.abc import MutableMapping


class TransformedDict(MutableMapping):
    """A dictionary that applies an arbitrary key-altering
       function before accessing the keys"""

    def __init__(self, *args, **kwargs):
        self.store = dict()
        self.update(dict(*args, **kwargs))  # use the free update to set keys

    def __getitem__(self, key):
        return self.store[self._keytransform(key)]

    def __setitem__(self, key, value):
        self.store[self._keytransform(key)] = value

    def __delitem__(self, key):
        del self.store[self._keytransform(key)]

    def __iter__(self):
        return iter(self.store)
    
    def __len__(self):
        return len(self.store)

    def _keytransform(self, key):
        return key

You get a few free methods from the ABC:

class MyTransformedDict(TransformedDict):

    def _keytransform(self, key):
        return key.lower()


s = MyTransformedDict([('Test', 'test')])

assert s.get('TEST') is s['test']   # free get
assert 'TeSt' in s                  # free __contains__
                                    # free setdefault, __eq__, and so on

import pickle
# works too since we just use a normal dict
assert pickle.loads(pickle.dumps(s)) == s

I wouldn't subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.
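
To make the "implement the interface" point concrete, here is a minimal sketch. LowerKeyMapping is a cut-down stand-in for the class above and describe is a hypothetical helper, neither comes from a library. The helper checks against the Mapping ABC rather than the concrete dict type, so a plain dict and the custom class are both accepted.

```python
from collections.abc import Mapping, MutableMapping


class LowerKeyMapping(MutableMapping):
    """Hypothetical stand-in for TransformedDict with lowercase keys."""

    def __init__(self, *args, **kwargs):
        self.store = {}
        self.update(dict(*args, **kwargs))

    def __getitem__(self, key):
        return self.store[key.lower()]

    def __setitem__(self, key, value):
        self.store[key.lower()] = value

    def __delitem__(self, key):
        del self.store[key.lower()]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


def describe(obj):
    """Hypothetical helper: accept anything that implements the mapping interface."""
    if isinstance(obj, Mapping):
        return sorted(obj.items())
    raise TypeError("expected a mapping")


print(describe({"b": 2, "a": 1}))            # [('a', 1), ('b', 2)]
print(describe(LowerKeyMapping(B=2, A=1)))   # [('a', 1), ('b', 2)]
```

This only helps in code you control; third-party code that tests for dict directly will still reject the custom class.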

Tucket answered 2/8, 2010 at 13:0 Comment(14)
Question though--won't implementing this interface with a user-defined type generally result in slower dict-like operations than using the built-in type?Muzzleloader
@Muzzleloader yes, but that probably doesn't matter to someone using Python.Layfield
Is there a way to do this so that isinstance(_, dict) == True ? Or do you just use Mutable Mapping to construct then subclass?Clarinda
@NeilG Then what's the gain on this approach, other than 20 extra lines, over MyClass = type('MyClass', (dict,), {})?Muzzleloader
@AndyHayden: You should write if isinstance(t, collections.abc.MutableMapping): print(t, "can be used like a dict"). Don't check the type of an object, check the interface.Tucket
@NeilG This unfortunately includes the JSONEncoder in the python standard library - github.com/python-git/python/blob/…Sanderling
Is it possible to implement TransformedDict like this with __getattr__ and __setattr__ aliases for get/setting self.store's contents? (i.e. obj.key = val behaves just like obj['key'] = val.) I keep running into either RecursionErrors or AttributeErrors.Orourke
(Please see implementing-a-dict-like-object-with-getattr-and-setattr-functionality where I answer my own question.)Orourke
Why use this __keytransform__ method when you could just use the arg, key directly?Dwelling
@JamesT., the keytransform method is meant to be overloaded by a derived class, as in the "MyTransformedDict" example.Coastland
Jochen: Your advice to @AndyHayden is fine for your own code, but that cannot always be done when it's not. For example, the json module explicitly checks isinstance(obj, dict).Tiebold
Why not just use self.store = dict(*args, **kwargs) in the initializer?Ehrlich
The unpacking operator * in dict(*args, **kwargs) is wrong. dict()'s constructor only accepts 1 positional arg. It accepts multiple kwargs though. As a result __init__() of TransformedDict is wrong too. It should be __init__(self, seq=None, **kwargs).Gentlemanly
Since this question is 13yrs old, is this still the preferred solution in 2023 and Python 3.12 era, or is there a better way?Jellyfish
B
137

How can I make as "perfect" a subclass of dict as possible?

The end goal is to have a simple dict in which the keys are lowercase.

  • If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?

  • Am I preventing pickling from working, and do I need to implement __setstate__ etc?

  • Do I need repr, update and __init__?

  • Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

The accepted answer would be my first approach, but since it has some issues, and since no one has addressed the alternative, actually subclassing a dict, I'm going to do that here.

What's wrong with the accepted answer?

This seems like a rather simple request to me:

How can I make as "perfect" a subclass of dict as possible? The end goal is to have a simple dict in which the keys are lowercase.

The accepted answer doesn't actually subclass dict, and a test for this fails:

>>> isinstance(MyTransformedDict([('Test', 'test')]), dict)
False

Ideally, any type-checking code would be testing for the interface we expect, or an abstract base class, but if our data objects are being passed into functions that are testing for dict - and we can't "fix" those functions, this code will fail.
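
The json module is one place where this bites. The sketch below uses a hypothetical StoreBackedMapping (a cut-down version of the accepted answer's class) and shows json.dumps rejecting it, plus the usual workaround of converting to a real dict at the boundary.

```python
import json
from collections.abc import MutableMapping


class StoreBackedMapping(MutableMapping):
    """Hypothetical MutableMapping with lowercase keys, backed by a plain dict."""

    def __init__(self, *args, **kwargs):
        self.store = {}
        self.update(dict(*args, **kwargs))

    def __getitem__(self, key):
        return self.store[key.lower()]

    def __setitem__(self, key, value):
        self.store[key.lower()] = value

    def __delitem__(self, key):
        del self.store[key.lower()]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


m = StoreBackedMapping(Test="test")

try:
    json.dumps(m)
except TypeError as exc:
    # json only serializes real dict instances as JSON objects
    print("json rejected it:", exc)

# Workaround: hand json a plain dict built from the mapping.
print(json.dumps(dict(m)))   # {"test": "test"}
```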

Other quibbles one might make:

  • The accepted answer is also missing the classmethod: fromkeys.
  • The accepted answer also has a redundant __dict__ - therefore taking up more space in memory:

    >>> s.foo = 'bar'
    >>> s.__dict__
    {'foo': 'bar', 'store': {'test': 'test'}}
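
If the extra __dict__ matters, one possible fix (my assumption about how the accepted answer could be adapted, not something it does) is to declare the backing store in __slots__. The ABCs in collections.abc define empty __slots__ themselves, so the subclass then has no per-instance __dict__ at all. SlottedMapping is a hypothetical name.

```python
from collections.abc import MutableMapping


class SlottedMapping(MutableMapping):
    """Hypothetical variant of TransformedDict without a per-instance __dict__."""

    __slots__ = ("store",)  # only the backing dict is stored on the instance

    def __init__(self, *args, **kwargs):
        self.store = {}
        self.update(dict(*args, **kwargs))

    def __getitem__(self, key):
        return self.store[key.lower()]

    def __setitem__(self, key, value):
        self.store[key.lower()] = value

    def __delitem__(self, key):
        del self.store[key.lower()]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


s = SlottedMapping(Test="test")
print(hasattr(s, "__dict__"))   # False

try:
    s.foo = "bar"               # arbitrary attributes are no longer allowed
except AttributeError:
    print("no arbitrary attributes")
```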
    

Actually subclassing dict

We can reuse the dict methods through inheritance. All we need to do is create an interface layer that ensures keys are passed into the dict in lowercase form if they are strings.

If I override __getitem__/__setitem__, then get/set don't work. How do I make them work? Surely I don't need to implement them individually?

Well, implementing them each individually is the downside to this approach and the upside to using MutableMapping (see the accepted answer), but it's really not that much more work.
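
To see why each method needs its own override, here is a small sketch with a hypothetical LookupOnlyDict that overrides nothing but __getitem__. The built-in get and __contains__ are implemented in C and do not route through the Python-level __getitem__.

```python
class LookupOnlyDict(dict):
    """Hypothetical example: only __getitem__ lowercases the key."""

    def __getitem__(self, key):
        return super().__getitem__(key.lower())


d = LookupOnlyDict(foo="bar")

print(d["FOO"])       # 'bar'  (goes through the override)
print(d.get("FOO"))   # None   (dict.get never calls __getitem__)
print("FOO" in d)     # False  (neither does dict.__contains__)
```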

First, let's factor out the difference between Python 2 and 3, create a singleton (_RaiseKeyError) to make sure we know if we actually get an argument to dict.pop, and create a function to ensure our string keys are lowercase:

from itertools import chain
try:              # Python 2
    str_base = basestring
    items = 'iteritems'
except NameError: # Python 3
    str_base = str, bytes, bytearray
    items = 'items'

_RaiseKeyError = object() # singleton for no-default behavior

def ensure_lower(maybe_str):
    """dict keys can be any hashable object - only call lower if str"""
    return maybe_str.lower() if isinstance(maybe_str, str_base) else maybe_str

Now we implement - I'm using super with the full arguments so that this code works for Python 2 and 3:

class LowerDict(dict):  # dicts take a mapping or iterable as their optional first argument
    __slots__ = () # no __dict__ - that would be redundant
    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, items):
            mapping = getattr(mapping, items)()
        return ((ensure_lower(k), v) for k, v in chain(mapping, getattr(kwargs, items)()))
    def __init__(self, mapping=(), **kwargs):
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(ensure_lower(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(ensure_lower(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(ensure_lower(k))
    def get(self, k, default=None):
        return super(LowerDict, self).get(ensure_lower(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(ensure_lower(k), default)
    def pop(self, k, v=_RaiseKeyError):
        if v is _RaiseKeyError:
            return super(LowerDict, self).pop(ensure_lower(k))
        return super(LowerDict, self).pop(ensure_lower(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(ensure_lower(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((ensure_lower(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__, super(LowerDict, self).__repr__())

We use an almost boiler-plate approach for any method or special method that references a key, but otherwise, by inheritance, we get methods: len, clear, items, keys, popitem, and values for free. While this required some careful thought to get right, it is trivial to see that this works.

(Note that has_key was deprecated in Python 2 and removed in Python 3.)

Here's some usage:

>>> ld = LowerDict(dict(foo='bar'))
>>> ld['FOO']
'bar'
>>> ld['foo']
'bar'
>>> ld.pop('FoO')
'bar'
>>> ld.setdefault('Foo')
>>> ld
{'foo': None}
>>> ld.get('Bar')
>>> ld.setdefault('Bar')
>>> ld
{'bar': None, 'foo': None}
>>> ld.popitem()
('bar', None)

Am I preventing pickling from working, and do I need to implement __setstate__ etc?

pickling

And the dict subclass pickles just fine:

>>> import pickle
>>> pickle.dumps(ld)
b'\x80\x03c__main__\nLowerDict\nq\x00)\x81q\x01X\x03\x00\x00\x00fooq\x02Ns.'
>>> pickle.loads(pickle.dumps(ld))
{'foo': None}
>>> type(pickle.loads(pickle.dumps(ld)))
<class '__main__.LowerDict'>

__repr__

Do I need repr, update and __init__?

We defined update and __init__, but you have a beautiful __repr__ by default:

>>> ld # without __repr__ defined for the class, we get this
{'foo': None}

However, it's good to write a __repr__ to improve the debuggability of your code. The ideal test is eval(repr(obj)) == obj. If it's easy to do for your code, I strongly recommend it:

>>> ld = LowerDict({})
>>> eval(repr(ld)) == ld
True
>>> ld = LowerDict(dict(a=1, b=2, c=3))
>>> eval(repr(ld)) == ld
True

You see, it's exactly what we need to recreate an equivalent object - this is something that might show up in our logs or in backtraces:

>>> ld
LowerDict({'a': 1, 'c': 3, 'b': 2})

Conclusion

Should I just use mutablemapping (it seems one shouldn't use UserDict or DictMixin)? If so, how? The docs aren't exactly enlightening.

Yeah, these are a few more lines of code, but they're intended to be comprehensive. My first inclination would be to use the accepted answer, and if there were issues with it, I'd then look at my answer - as it's a little more complicated, and there's no ABC to help me get my interface right.

Premature optimization is going for greater complexity in search of performance. MutableMapping is simpler - so it gets an immediate edge, all else being equal. Nevertheless, to lay out all the differences, let's compare and contrast.

I should add that there was a push to put a similar dictionary into the collections module, but it was rejected. You should probably just do this instead:

my_dict[transform(key)]

It should be far more easily debuggable.
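
A minimal sketch of that explicit style, assuming a hypothetical transform function that lowercases strings and leaves other hashable keys alone:

```python
def transform(key):
    """Hypothetical normalizer: lowercase strings, pass other keys through."""
    return key.lower() if isinstance(key, str) else key


my_dict = {}
my_dict[transform("Foo")] = 1
my_dict[transform(42)] = "answer"

print(my_dict[transform("FOO")])     # 1
print(transform("Bar") in my_dict)   # False
print(my_dict)                       # {'foo': 1, 42: 'answer'}
```

The cost is that every call site has to remember to call transform. The benefit is that my_dict stays an ordinary dict with no surprises.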

Compare and contrast

There are 6 interface functions implemented with the MutableMapping (which is missing fromkeys) and 11 with the dict subclass. I don't need to implement __iter__ or __len__, but instead I have to implement get, setdefault, pop, update, copy, __contains__, and fromkeys - but these are fairly trivial, since I can use inheritance for most of those implementations.

The MutableMapping implements some things in Python that dict implements in C - so I would expect a dict subclass to be more performant in some cases.

We get a free __eq__ in both approaches - both of which assume equality only if another dict is all lowercase - but again, I think the dict subclass will compare more quickly.
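
A quick illustration of that equality behaviour, using a hypothetical cut-down LowerKeys class that only lowercases keys on construction (enough to show the comparison):

```python
class LowerKeys(dict):
    """Hypothetical cut-down dict subclass: lowercases keys on construction only."""

    def __init__(self, mapping=(), **kwargs):
        items = dict(mapping, **kwargs).items()
        super().__init__((k.lower(), v) for k, v in items)


ld = LowerKeys({"A": 1, "B": 2})

print(ld == {"a": 1, "b": 2})   # True: the stored keys are lowercase
print(ld == {"A": 1, "B": 2})   # False: the other dict's keys are compared as-is
```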

Summary:

  • subclassing MutableMapping is simpler with fewer opportunities for bugs, but slower, takes more memory (see redundant dict), and fails isinstance(x, dict)
  • subclassing dict is faster, uses less memory, and passes isinstance(x, dict), but it has greater complexity to implement.

Which is more perfect? That depends on your definition of perfect.

Botany answered 7/9, 2016 at 17:9 Comment(14)
How would the accepted answer remove the redundant dict?Yogi
Two ways that immediately come to mind are to either declare the store attribute in __slots__ or perhaps reuse the __dict__ as the store, but that mixes semantics, another potential point of criticism.Botany
Wouldn't it have been easier to write a decorator that takes a method and uses your ensure_lower on the first argument (which is always the key)? Then it would be the same number of overrides, but they would all be of the form __getitem__ = ensure_lower_decorator(super(LowerDict, self).__getitem__).Evered
@Evered that won't work - self, passed to super, is an argument to the method. You could possibly do it by hard-coding the parent's __getitem__ but since that's less correct, I don't think it's "easier". Although if you think I'm wrong, feel free to write a better answer.Botany
Thanks for this - getting warnings for pop and fromkeys that they do not match the signature of base class method.Secluded
@Secluded Thanks for the feedback - I gave it a little more finessing. Let me know how it works for you now, I want it to be as complete as possible.Botany
Thanks a lot - will try and use it and get back with more feedback (don't hold your breath :P) - for now I think that to shutdown the warnings (PyCharm) fromKeys should be a class method@staticmethod def fromkeys(keys, v=None): and call return super(LowerDict, LowerDict).fromkeys((ensure_lower(k) for k in keys), v) - btw Pycharm warns for passing a generator in fromkeys((ensure_lower(k)... but that can be safely ignored I guessSecluded
And another gotcha - copy - ld = LowerDict(dict(foo='bar')) ; ld = ld.copy(); ld['FOO'] -> KeyErrorSecluded
And a final point to consider: would it be worth it to replace the if hasattr(mapping, items): with an isinstance(mapping, collections.abc.Mapping) ? Which are the pros and cons ? Anyway, thanks again :)Secluded
@Secluded I added an implementation of copy - I think that should do it, no? I think it should test for the interface - e.g. the pandas DataFrame object is not a Mapping instance (at last check) but it does have items/iteritems.Botany
I would strongly recommend against subclassing dict, at least in Python 2.7. Take your example, and **an_instance it - you'll see that it doesn't call any of your methods, and this is a fairly common thing to do in Python. In your specific example this doesn't matter, since it's storing already-lowered keys in the backing dict (which is what ** will access), but in general this isn't necessarily true.Uni
@Uni give us some code that demonstrates your objectionsBotany
Just did! https://mcmap.net/q/103553/-how-to-quot-perfectly-quot-override-a-dict and repl.it/repls/TraumaticToughCockatooUni
I wonder if this code can be abstracted so you pass a MutableMapping and it returns a subclass of dict with the MutableMapping used for methods.Clarinda
U
7

After trying out both of the top two suggestions, I've settled on a shady-looking middle route for Python 2.7. Maybe 3 is saner, but for me:

class MyDict(MutableMapping):
   # ... the few __methods__ that mutablemapping requires
   # and then this monstrosity
   @property
   def __class__(self):
       return dict

which I really hate, but seems to fit my needs, which are:

  • can override **my_dict
    • if you inherit from dict, this bypasses your code. try it out.
    • this makes #2 unacceptable for me at all times, as this is quite common in python code
  • masquerades as isinstance(my_dict, dict)
    • rules out MutableMapping alone, so #1 is not enough
    • I heartily recommend #1 if you don't need this, it's simple and predictable
  • fully controllable behavior
    • so I cannot inherit from dict

If you need to tell yourself apart from others, personally I use something like this (though I'd recommend better names):

def __am_i_me(self):
  return True

@classmethod
def __is_it_me(cls, other):
  try:
    return other.__am_i_me()
  except Exception:
    return False

As long as you only need to recognize yourself internally, this way it's harder to accidentally call __am_i_me due to Python's name mangling (the name is rewritten to _MyDict__am_i_me, so anything calling it from outside this class would have to use that name). Slightly more private than _methods, both in practice and culturally.

So far I have no complaints, aside from the seriously-shady-looking __class__ override. I'd be thrilled to hear of any problems that others encounter with this though, I don't fully understand the consequences. But so far I've had no problems whatsoever, and this allowed me to migrate a lot of middling-quality code in lots of locations without needing any changes.


As evidence: https://repl.it/repls/TraumaticToughCockatoo

Basically: copy the current #2 option, add print('method_name') lines to every method, and then try this and watch the output:

d = LowerDict()  # prints "init", or whatever your print call said
print('------')
splatted = dict(**d)  # note that there are no prints here

You'll see similar behavior for other scenarios. Say your fake-dict is a wrapper around some other datatype, so there's no reasonable way to store the data in the backing-dict; **your_dict will be empty, regardless of what every other method does.

This works correctly for MutableMapping, but as soon as you inherit from dict it becomes uncontrollable.
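
Here is a Python 3 sketch of the same experiment, assuming CPython. CountingDict, CountingMapping and collect are hypothetical names. Both classes count reads that go through __getitem__, and ** unpacking is then applied to each.

```python
from collections.abc import MutableMapping


class CountingDict(dict):
    """dict subclass that counts reads made through __getitem__."""

    reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


class CountingMapping(MutableMapping):
    """MutableMapping that counts reads made through __getitem__."""

    def __init__(self, **kwargs):
        self.store = dict(kwargs)
        self.reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


def collect(**kwargs):
    return kwargs


sub = CountingDict(a=1, b=2)
collect(**sub)
print(sub.reads)   # 0 on CPython: the dict internals are copied directly

mm = CountingMapping(a=1, b=2)
collect(**mm)
print(mm.reads)    # 2: keys() and __getitem__ are used for each key
```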


Edit: as an update, this has been running without a single issue for almost two years now, on several hundred thousand (eh, might be a couple million) lines of complicated, legacy-ridden python. So I'm pretty happy with it :)

Edit 2: apparently I mis-copied this or something long ago. @classmethod __class__ does not work for isinstance checks - @property __class__ does: https://repl.it/repls/UnitedScientificSequence
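
For reference, a Python 3 sketch of the @property __class__ trick. FakeDict is a hypothetical name. The isinstance check consults the instance's __class__ attribute, but type() and C-level checks that look at the real type are not fooled, so treat this as a workaround with limits rather than a true dict.

```python
from collections.abc import MutableMapping


class FakeDict(MutableMapping):
    """Hypothetical MutableMapping that reports dict as its __class__."""

    def __init__(self, **kwargs):
        self.store = dict(kwargs)

    @property
    def __class__(self):
        return dict

    def __getitem__(self, key):
        return self.store[key]

    def __setitem__(self, key, value):
        self.store[key] = value

    def __delitem__(self, key):
        del self.store[key]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)


fd = FakeDict(a=1)

print(isinstance(fd, dict))             # True: isinstance falls back to __class__
print(type(fd) is dict)                 # False: the real type is unchanged
print(isinstance(fd, MutableMapping))   # True
```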

Uni answered 18/11, 2017 at 1:49 Comment(10)
What exactly do you mean by "**your_dict will be empty" (if you subclass from dict)? I have not seen any problems with dict unpacking...Documentation
If you actually put data into the parent dict (like LowerDict does), it works - you'll get that dict-stored data. If you don't (say you wanted to generate data on the fly, like {access_count: "stack trace of access"} that fills in each time it's read), you'll notice that **your_dict doesn't execute your code, so it can't output anything "special". E.g. you can't count "reads" because it doesn't execute your read-counting code. MutableMapping does work for this (use it if you can!), but it fails isinstance(..., dict) so I couldn't use it. yay legacy software.Uni
Ok, I see what you mean now. I suppose I didn't expect code execution with **your_dict, but I find it very interesting that MutableMapping will do that.Documentation
Yea. It's necessary for a number of things (e.g. I was shimming RPC calls into what used to be a local-dict read, and had to do it on demand for Reasons™), and it seems very few people are aware of it, even tho **some_dict is fairly common. At the very least it happens very often in decorators, so if you have any, you're immediately at risk of seemingly-impossible misbehavior if you don't account for it.Uni
Perhaps I'm missing something, but the def __class__() trick doesn't seem to work with either Python 2 or 3, at least for the example code in the question How to register implementation of abc.MutableMapping as a dict subclass? (modified to otherwise work in the two versions). I want isinstance(SpreadSheet(), dict) to return True.Tiebold
Hmm, may have found a way to get it to work: If I also add a self.__class__ = dict to my class' __init__() method, then isinstance(_, dict) will return True. The def __class__() is still needed, because without it, the assignment raises TypeError: __class__ assignment only supported for heap types or ModuleType subclasses.Tiebold
@martineau: With the self.__class__ assignment in place, the only role of your __class__ classmethod is to hide the normal __class__ descriptor. Anything that actually uses the classmethod will probably fail, because the classmethod makes no sense. __class__ isn't supposed to behave as any sort of method.Selfdevotion
In particular, this answer's use of a __class__ classmethod doesn't make isinstance checks work like Groxx thinks, so either that giant legacy codebase isn't really relying on those isinstance checks passing, or something else is going on.Selfdevotion
I must've made a wrong copy/paste somewhere. Yep, @classmethod __class__ doesn't work. @property __class__ does. (I have a test suite that ensures this does work, not sure why I got it wrong here). Evidence: repl.it/repls/UnitedScientificSequenceUni
In any case tho: thanks for being suspicious! It got me to look more closely and write some small tests, and then wonder wtf I did that worked previously (or if something changed between python 2.7.10 and 2.7.16). Yay old code that still runs.Uni
S
6

My requirements were a bit stricter:

  • I had to retain case info (the strings are paths to files displayed to the user, but it's a Windows app, so internally all operations must be case insensitive)
  • I needed keys to be as small as possible (it did make a difference in memory performance, chopping 110 MB off 370). This meant that caching a lowercase version of the keys was not an option.
  • I needed creation of the data structures to be as fast as possible (again this made a difference in performance, speed this time), so I had to go with a builtin.

My initial thought was to replace our clunky Path class with a case insensitive unicode subclass, but:

  • it proved hard to get that right - see: A case insensitive string class in python
  • it turns out that explicit handling of dict keys makes code verbose, messy and error prone (structures are passed hither and thither, it is not clear whether they have CIstr instances as keys/elements, it is easy to forget, and some_dict[CIstr(path)] is ugly)

So I finally had to write that case insensitive dict. Thanks to code by @AaronHall, that was made 10 times easier.

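from itertools import chain  # used by LowerDict._process_args below


class CIstr(unicode):  # Python 2 only (unicode, basestring, iteritems)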
    """See https://mcmap.net/q/103790/-a-case-insensitive-string-class-in-python, especially for inlines"""
    __slots__ = () # does make a difference in memory performance

    #--Hash/Compare
    def __hash__(self):
        return hash(self.lower())
    def __eq__(self, other):
        if isinstance(other, CIstr):
            return self.lower() == other.lower()
        return NotImplemented
    def __ne__(self, other):
        if isinstance(other, CIstr):
            return self.lower() != other.lower()
        return NotImplemented
    def __lt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() < other.lower()
        return NotImplemented
    def __ge__(self, other):
        if isinstance(other, CIstr):
            return self.lower() >= other.lower()
        return NotImplemented
    def __gt__(self, other):
        if isinstance(other, CIstr):
            return self.lower() > other.lower()
        return NotImplemented
    def __le__(self, other):
        if isinstance(other, CIstr):
            return self.lower() <= other.lower()
        return NotImplemented
    #--repr
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(CIstr, self).__repr__())

def _ci_str(maybe_str):
    """dict keys can be any hashable object - only call CIstr if str"""
    return CIstr(maybe_str) if isinstance(maybe_str, basestring) else maybe_str

class LowerDict(dict):
    """Dictionary that transforms its keys to CIstr instances.
    Adapted from: https://mcmap.net/q/103553/-how-to-quot-perfectly-quot-override-a-dict
    """
    __slots__ = () # no __dict__ - that would be redundant

    @staticmethod # because this doesn't make sense as a global function.
    def _process_args(mapping=(), **kwargs):
        if hasattr(mapping, 'iteritems'):
            mapping = getattr(mapping, 'iteritems')()
        return ((_ci_str(k), v) for k, v in
                chain(mapping, getattr(kwargs, 'iteritems')()))
    def __init__(self, mapping=(), **kwargs):
        # dicts take a mapping or iterable as their optional first argument
        super(LowerDict, self).__init__(self._process_args(mapping, **kwargs))
    def __getitem__(self, k):
        return super(LowerDict, self).__getitem__(_ci_str(k))
    def __setitem__(self, k, v):
        return super(LowerDict, self).__setitem__(_ci_str(k), v)
    def __delitem__(self, k):
        return super(LowerDict, self).__delitem__(_ci_str(k))
    def copy(self): # don't delegate w/ super - dict.copy() -> dict :(
        return type(self)(self)
    def get(self, k, default=None):
        return super(LowerDict, self).get(_ci_str(k), default)
    def setdefault(self, k, default=None):
        return super(LowerDict, self).setdefault(_ci_str(k), default)
    __no_default = object()
    def pop(self, k, v=__no_default):
        if v is LowerDict.__no_default:
            # super will raise KeyError if no default and key does not exist
            return super(LowerDict, self).pop(_ci_str(k))
        return super(LowerDict, self).pop(_ci_str(k), v)
    def update(self, mapping=(), **kwargs):
        super(LowerDict, self).update(self._process_args(mapping, **kwargs))
    def __contains__(self, k):
        return super(LowerDict, self).__contains__(_ci_str(k))
    @classmethod
    def fromkeys(cls, keys, v=None):
        return super(LowerDict, cls).fromkeys((_ci_str(k) for k in keys), v)
    def __repr__(self):
        return '{0}({1})'.format(type(self).__name__,
                                 super(LowerDict, self).__repr__())

Implicit vs explicit is still a problem, but once dust settles, renaming of attributes/variables to start with ci (and a big fat doc comment explaining that ci stands for case insensitive) I think is a perfect solution - as readers of the code must be fully aware that we are dealing with case insensitive underlying data structures. This will hopefully fix some hard to reproduce bugs, which I suspect boil down to case sensitivity.
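
The code above is Python 2 (unicode, basestring, iteritems). As a rough Python 3 sketch of the same idea, assuming str in place of unicode and showing only the hash/equality part: the key keeps its original spelling, while dict lookups match case-insensitively. The example path is made up.

```python
class CIstr(str):
    """Python 3 sketch: compares and hashes case-insensitively against
    other CIstr instances while keeping the original spelling."""

    __slots__ = ()

    def __hash__(self):
        return hash(self.lower())

    def __eq__(self, other):
        if isinstance(other, CIstr):
            return self.lower() == other.lower()
        return NotImplemented

    def __ne__(self, other):
        # str defines its own __ne__, so it has to be overridden as well
        if isinstance(other, CIstr):
            return self.lower() != other.lower()
        return NotImplemented


paths = {CIstr("C:\\Games\\Mod.esp"): "installed"}

print(paths[CIstr("c:\\games\\MOD.ESP")])   # installed
print([str(k) for k in paths])              # original case is retained
```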

Comments/corrections welcome :)

Secluded answered 17/4, 2017 at 18:34 Comment(7)
CIstr's __repr__ should use the parent class's __repr__ to pass the eval(repr(obj)) == obj test (I don't think it does right now) and not rely on __str__.Botany
Also check out the total_ordering class decorator - that will eliminate 4 methods from your unicode subclass. But the dict subclass looks very cleverly implemented. :PBotany
Thanks @AaronHall - it's you who implemented that :P Re: total ordering - I intentionally wrote the methods inlined as advised by Raymond Hettinger here: https://mcmap.net/q/103790/-a-case-insensitive-string-class-in-python. Re: repr: I remember reading a comment (by some core dev IIRC) that well, it's not really worth the hassle to try and make repr to pass that test (it's a hassle) - better focus on it being as informative as possible (but not more)Secluded
I'll allow you your redundant comparison methods (you should make a note about it in your answer), but the CIstr.__repr__, in your case, can pass the repr test with very little hassle, and it should make debugging a lot nicer. I'd also add a __repr__ for your dict. I'll do it in my answer to demonstrate.Botany
@AaronHall: I added __slots__ in CIstr - does make a difference in performance (CIstr is not meant to be subclassed or indeed used outside LowerDict, should be a static nested final class). Still not sure how to solve elegantly the repr issue (the string might contain a combination of ' and " quotes)Secluded
delegate to the builtin __repr__ with super - both classes should do that - __slots__ make sense for CIstr - no __dict__ needed - subclasses of CIstr won't care because __slots__ are empty..Botany
Relevant: PEP 455 and the rejection.Botany
A
6

All you will have to do is

class BatchCollection(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)

OR

class BatchCollection(dict):
    def __init__(self, inpt={}):
        super(BatchCollection, self).__init__(inpt)

A sample usage for my personal use

### EXAMPLE
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10


class BatchCollection(dict):
    def __init__(self, inpt={}):
        # dict.__init__ does not call __setitem__, so initial items are not validated
        super(BatchCollection, self).__init__(inpt)

    def __setitem__(self, key, item):
        if (isinstance(key, tuple) and len(key) == 2
                and isinstance(item, Iterable)):
            super(BatchCollection, self).__setitem__(key, item)
        else:
            raise Exception(
                "Valid key should be a tuple (database_name, table_name) "
                "and value should be iterable")

Note: tested only in python3
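
One caveat with this style is worth showing: on a dict subclass, the constructor and update do not go through an overridden __setitem__, so validation placed there only covers item assignment. ValidatingDict below is a hypothetical example, not the class above.

```python
class ValidatingDict(dict):
    """Hypothetical example: rejects non-tuple keys in __setitem__ only."""

    def __setitem__(self, key, value):
        if not isinstance(key, tuple):
            raise TypeError("key must be a tuple")
        super().__setitem__(key, value)


d = ValidatingDict({"not-a-tuple": 1})   # dict.__init__ does not call __setitem__
d.update({"also-bad": 2})                # neither does dict.update
print(sorted(d))                         # ['also-bad', 'not-a-tuple']

try:
    d["bad"] = 3                         # only item assignment is validated
except TypeError as exc:
    print("rejected:", exc)
```

To validate everything, __init__ and update would need overrides that loop over the items and assign them one by one.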

Archaeology answered 6/10, 2017 at 7:40 Comment(1)
None of this works for me: the first variant of __init__ gives the error "TypeError: descriptor '__init__' of 'dict' object needs an argument". If I try the other version of __init__ and override __setitem__ as you have done I get "AttributeError: 'super' object has no attribute '_BatchCollection__set__item'", not surprisingly: method name mangling has kicked in. I can't understand how this can have been upvoted 6 times.Barcus
J
3

collections.UserDict is often the simplest option when you need a custom dict.

As shown in the other answers, it's very tricky to subclass dict correctly, while UserDict makes it easy. To answer the original question, you can get a dict with lowercase keys:

import collections

class LowercaseDict(collections.UserDict):

  def __getitem__(self, key):
    return super().__getitem__(key.lower())

  def __setitem__(self, key, value):
    return super().__setitem__(key.lower(), value)

  def __delitem__(self, key):
    return super().__delitem__(key.lower())

  # Unfortunately, __contains__ is required currently due to
  # https://github.com/python/cpython/issues/91784
  def __contains__(self, key):
    return key.lower() in self.data


d = LowercaseDict(MY_KEY=0)  # Keys normalized in .__init__
d.update({'OTHER_KEY': 1})  # Keys normalized in .update
d['Hello'] = d['other_KEY']
assert 'HELLO' in d
print(d)  # All keys normalized {'my_key': 0, 'other_key': 1, 'hello': 1}

And contrary to collections.abc.MutableMapping, you don't need __iter__, __len__, __init__,... Subclassing UserDict is much easier.

However UserDict is a MutableMapping, not a dict, so:

assert not isinstance(collections.UserDict(), dict)
assert isinstance(collections.UserDict(), collections.abc.MutableMapping)
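
Because of that, code that insists on a real dict (json.dumps, for example) needs a plain dict at the boundary. A small sketch, using a cut-down LowercaseDict that only overrides __setitem__: the data attribute of a UserDict is the underlying plain dict, and dict(d) builds a copy.

```python
import collections
import json


class LowercaseDict(collections.UserDict):
    """Cut-down version for illustration: only normalizes keys on write."""

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)


d = LowercaseDict(MY_KEY=0)

print(json.dumps(d.data))   # {"my_key": 0}  (the underlying plain dict)
print(json.dumps(dict(d)))  # {"my_key": 0}  (a plain-dict copy)
```
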
Jocose answered 21/4, 2022 at 12:18 Comment(2)
json.dumps(d) >>> TypeError: Object of type LowercaseDict is not JSON serializableMime
I believe that with the current Python version (2023), the first goto when subclassing dictionaries should be to subclass UserDict. That is, as the first goto - there might be reason for other choices in some cases.Unsearchable
