Exposing `defaultdict` as a regular `dict`

3

42

I am using defaultdict(set) to populate an internal mapping in a very large data structure. After it's populated, the whole structure (including the mapping) is exposed to the client code. At that point, I don't want anyone modifying the mapping.

And nobody does, intentionally. But sometimes client code may accidentally refer to an element that doesn't exist. A normal dictionary would raise KeyError at that point, but since the mapping is a defaultdict, it simply creates a new element (an empty set) at that key. This is quite hard to catch, since everything happens silently. I need to ensure this doesn't happen (the semantics don't actually break, but the mapping grows to a huge size).
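The silent growth described above can be reproduced in a few lines. This is only a minimal sketch: the key names are made up for illustration, and the real mapping is of course far larger.

```python
from collections import defaultdict

mapping = defaultdict(set)
mapping["a"].add(1)            # intended write while populating

size_before = len(mapping)
_ = mapping["typo"]            # accidental read of a missing key: no KeyError
size_after = len(mapping)      # an empty set is now stored under "typo"

# Membership tests and .get() do not insert anything.
present = "other" in mapping
fallback = mapping.get("other")
size_final = len(mapping)
```

Only subscript access (mapping[k]) triggers the insertion; `in` and .get() leave the mapping untouched.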

What should I do? I can see these choices:

  1. Find all the instances in current and future client code where a dictionary lookup is performed on the mapping, and convert it to mapping.get(k, set()) instead. This is just terrible.

  2. "Freeze" defaultdict after the data structure is fully initialized, by converting it to dict. (I know it's not really frozen, but I trust client code to not actually write mapping[k] = v.) Inelegant, and a large performance hit.

  3. Wrap defaultdict into a dict interface. What's an elegant way to do that? I'm afraid the performance hit may be huge though (this lookup is heavily used in tight loops).

  4. Subclass defaultdict and add a method that "shuts down" all the defaultdict features, leaving it to behave as if it's a regular dict. It's a variant of 3 above, but I'm not sure if it's any faster. And I don't know if it's doable without relying on the implementation details.

  5. Use regular dict in the data structure, rewriting all the code there to first check if the element is in the dictionary and adding it if it's not. Not good.

Vansickle answered 20/11, 2012 at 2:20 Comment(9)
The "rewriting" would just use the dict.setdefault method... No big deal. - Ulberto
@Ulberto Are you talking about option 4? All I know about defaultdict is that it overrides __getitem__ to add an element if needed. Maybe it does that using the setdefault method, maybe it implements the same logic directly without ever calling setdefault. Without relying on implementation details, I can't assume anything, can I? - Vansickle
He is referring to your option #5. Just use data.setdefault() in your code in place of defaultdict. - Foulness
I think you should be able to get away with just calling dict on the defaultdict to dictify it. - Unjust
@Pyson: Ah, you're right, it makes sense. But that's an argument in favor of never using defaultdict, isn't it? (Not that I disagree, I just want to understand the logic.) - Vansickle
@Unjust The size of the data structure is over 1 GB, so copying all the data (as would happen if I call dict) is too expensive. - Vansickle
Well, defaultdict is faster in most cases... - Foulness
@Pyson: Why? dict.setdefault is implemented in C, and it does precisely what defaultdict.__getitem__ does. Shouldn't it be equally fast? - Vansickle
You would think so, huh? - Foulness
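For reference, the setdefault approach suggested in these comments (option 5) might look like the sketch below. The `pairs` input is a hypothetical stand-in for whatever populates the real mapping.

```python
# Hypothetical input data, used only for illustration.
pairs = [("a", 1), ("b", 2), ("a", 3)]

mapping = {}                                  # a regular dict from the start
for key, value in pairs:
    # setdefault returns the existing set, or stores and returns the new one.
    mapping.setdefault(key, set()).add(value)

# Later lookups of missing keys fail loudly instead of inserting.
try:
    mapping["missing"]
    raised = False
except KeyError:
    raised = True
```

One known cost of this pattern is that the `set()` argument is constructed on every call, even when the key already exists, which may account for part of the speed difference mentioned in the comments.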
67

The defaultdict docs say this about default_factory, in the description of __missing__(key):

If the default_factory attribute is None, this raises a KeyError exception with the key as argument.

What if you just set your defaultdict's default_factory to None? E.g.,

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d['a'] += 1
>>> d
defaultdict(<type 'int'>, {'a': 1})
>>> d.default_factory = None
>>> d['b'] += 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'b'
>>> 

I'm not sure if this is the best approach, but it seems to work.
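Applied to the defaultdict(set) case from the question, the same idea could be sketched as follows. The keys and values here are illustrative assumptions, not the asker's data.

```python
from collections import defaultdict

mapping = defaultdict(set)
mapping["a"].add(1)
mapping["b"].add(2)

# "Freeze" the mapping: no copy is made, only the factory attribute changes.
mapping.default_factory = None

try:
    mapping["typo"]            # missing key now raises instead of inserting
    raised = False
except KeyError:
    raised = True

existing = mapping["a"]        # existing keys behave as before
```

The object is still a defaultdict (and therefore still a dict), so client code that checks isinstance(mapping, dict) keeps working, and explicit assignments such as mapping[k] = v are still possible.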

Figwort answered 20/11, 2012 at 2:29 Comment(4)
Who knew that the solution I proposed was already implemented as a feature of defaultdict? Great find. (+1) - Henceforth
Wow, this is perfect. I hope it's safe to change default_factory for an existing defaultdict object (I don't see why not). - Vansickle
@Vansickle -- The documentation specifically says that default_factory is a writable attribute, so it should be safe. - Henceforth
@max: Use the source: defdictobject, defdict_members (name, type, offset, flags, doc; flags==0 means it's writable), defdict_missing. - Mimosa
4

Once you have finished populating your defaultdict, you can simply create a regular dict from it:

my_dict = dict(my_default_dict)

If the default dict is a recursive default dict, see this answer which has a recursive solution.
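The linked recursive solution is not reproduced here, but one possible sketch of such a conversion is shown below. The helper names `to_plain_dict` and `tree` are hypothetical, and the function only descends into dict values, not into lists or other containers.

```python
from collections import defaultdict

def to_plain_dict(obj):
    """Recursively copy nested defaultdicts (and dicts) into plain dicts."""
    if isinstance(obj, dict):              # defaultdict is a dict subclass
        return {k: to_plain_dict(v) for k, v in obj.items()}
    return obj                             # leaf values are returned as-is

def tree():
    """A recursive defaultdict: missing keys create nested trees."""
    return defaultdict(tree)

nested = tree()
nested["x"]["y"]["z"] = 1
plain = to_plain_dict(nested)
```

Like dict(my_default_dict), this builds new dictionaries, so for a structure as large as the one in the question the copying cost applies here too.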

Rehearse answered 24/9, 2017 at 19:35 Comment(0)
0

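You could make a class that holds a reference to your dict, never inserts on lookup, and only allows __setitem__ for keys that already exist:

from collections.abc import Mapping

class MyDict(Mapping):
    """Wrapper around an existing dict (or defaultdict) that never adds keys."""

    def __init__(self, d):
        self.d = d

    def __getitem__(self, k):
        # Test membership first so a wrapped defaultdict never inserts a default.
        if k not in self.d:
            raise KeyError(k)
        return self.d[k]

    def __iter__(self):
        return iter(self.d)

    def __len__(self):
        # Mapping is abstract; __len__ is required to instantiate the class.
        return len(self.d)

    def __setitem__(self, k, v):
        # Existing keys may be reassigned; new keys are rejected.
        if k not in self.d:
            raise KeyError(k)
        self.d[k] = v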
Warehouse answered 20/11, 2012 at 2:48 Comment(3)
Wouldn't it be super slow, given that it uses pure Python for critical methods? - Vansickle
For the __getitem__ method? I'm not sure how its performance overhead compares with defaultdict's. - Warehouse
Either way, I think Neal's solution is best for your problem. - Warehouse
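If blocking writes matters as well, one option not raised in the thread is to combine the default_factory = None trick with types.MappingProxyType from the standard library (Python 3.3+), which gives a read-only view without copying. This is a sketch under that assumption; its lookup cost in tight loops has not been measured here.

```python
from collections import defaultdict
from types import MappingProxyType

mapping = defaultdict(set)
mapping["a"].add(1)
mapping.default_factory = None       # missing-key lookups now raise KeyError

view = MappingProxyType(mapping)     # read-only view, no copy of the data

try:
    view["b"] = {2}                  # item assignment is not supported
    write_blocked = False
except TypeError:
    write_blocked = True

try:
    view["missing"]
    lookup_raised = False
except KeyError:
    lookup_raised = True
```

The view only protects the mapping itself: the sets stored as values remain mutable, and anyone holding the original `mapping` object can still modify it.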
