A forgiving dictionary
Asked Answered
P

5

9

I am wondering how to create forgiving dictionary (one that returns a default value if a KeyError is raised).

In the following code example I would get a KeyError; for example

a = {'one':1,'two':2}
print a['three']

In order not to get one I would 1. Have to catch the exception or use get.

I would like to not to have to do that with my dictionary...

Ploughshare answered 29/7, 2010 at 0:19 Comment(1)
collections.defaultdict is your batteries-included solution.Incantatory
L
22
import collections
a = collections.defaultdict(lambda: 3)
a.update({'one':1,'two':2})
print a['three']

emits 3 as required. You could also subclass dict yourself and override __missing__, but that doesn't make much sense when the defaultdict behavior (ignoring the exact missing key that's being looked up) suits you so well...

Edit ...unless, that is, you're worried about a growing by one entry each time you look up a missing key (which is part of defaultdict's semantics) and would rather get slower behavior but save some memory. For example, in terms of memory...:

>>> import sys
>>> a = collections.defaultdict(lambda: 'blah')
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
99 6284

...the defaultdict, originally empty, now has the 99 previously-missing keys that we looked up, and takes 6284 bytes (vs. the 140 bytes it took when it was empty).

The alternative approach...:

>>> class mydict(dict):
...   def __missing__(self, key): return 3
... 
>>> a = mydict()
>>> print len(a), sys.getsizeof(a)
0 140
>>> for i in xrange(99): _ = a[i]
... 
>>> print len(a), sys.getsizeof(a)
0 140

...entirely saves this memory overhead, as you see. Of course, performance is another issue:

$ python -mtimeit -s'import collections; a=collections.defaultdict(int); r=xrange(99)' 'for i in r: _=a[i]'
100000 loops, best of 3: 14.9 usec per loop

$ python -mtimeit -s'class mydict(dict):
>   def __missing__(self, key): return 0
> ' -s'a=mydict(); r=xrange(99)' 'for i in r: _=a[i]'
10000 loops, best of 3: 92.9 usec per loop

Since defaultdict adds the (previously-missing) key on lookup, it gets much faster when such a key is next looked up, while mydict (which overrides __missing__ to avoid that addition) pays the "missing key lookup overhead" every time.

Whether you care about either issue (performance vs memory footprint) entirely depends on your specific use case, of course. It is in any case a good idea to be aware of the tradeoff!-)

Lowrance answered 29/7, 2010 at 0:22 Comment(4)
Warning: defaultdict inserts a new item into itself whenever it returns the default value for a given key. That turns read operations into potential write operations, and means that looking up a lot of missing keys will cause it to grow quickly. docs.python.org/library/…Hach
Excellent post! Your second to last paragraph does not seem to relate to your example since you never use the same key twice. So it seems that defaultdict is faster even if you never repeat a key and even faster if you do. Is that right?Lerma
@Jeff, timeit loops on the statement you're measuring, so it repeats that statement -- in this case, the for i in r loop.Lowrance
+1 I never realized the performance/memory tradeoff of the 2 approaches. Thanks for enlightening me.Leclaire
A
7

New in version 2.5: If a subclass of dict defines a method __missing__(), if the key key is not present, the d[key] operation calls that method with the key key as argument. The d[key] operation then returns or raises whatever is returned or raised by the __missing__(key) call if the key is not present. No other operations or methods invoke __missing__(). If __missing__() is not defined, KeyError is raised. __missing__() must be a method; it cannot be an instance variable. For an example, see collections.defaultdict.

http://docs.python.org/library/stdtypes.html

Attis answered 29/7, 2010 at 0:22 Comment(0)
F
6

Here is how to subclass dict as suggested by NullUserException

>>> class forgiving_dict(dict):
...     def __missing__(self, key):
...         return 3
...
>>> a = forgiving_dict()
>>> a.update({'one':1,'two':2})
>>> print a['three']
3

One big difference between this answer and Alex's is that the missing key is not added to the dictionary

>>> print a
{'two': 2, 'one': 1}

Which is quite significant if you expect a lot of misses

Fluoridation answered 29/7, 2010 at 1:12 Comment(0)
D
3

You'll probably want to use a defaultdict (it requires atleast python2.5 I believe)

from collections import defaultdict
def default(): return 'Default Value'
d = defaultdict(default)
print(d['?'])

The function that is passed to the constructor tells the class what to return as a default value. See the documentation for additional examples.

Daffi answered 29/7, 2010 at 0:25 Comment(0)
P
0

Sometimes what you really want is .setdefault() which is not very intuitive, but it's a method that "returns the key specified, if it doesn't exist, set that key to this value".

Here is an example of setdefault() being used to good effect:

collection = {}
for elem in mylist:
    key = key_from_elem(elem)
    collection.setdefault(key, []).append(elem)

This will allow us to create a dictionary like: {'key1':[elem1, elem3], 'key2':[elem3]} without having to have an ugly check to see if there's a key already there and creating a list for it.

Paleoclimatology answered 29/7, 2010 at 3:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.