Setup dictionary lazily

Let's say I have this dictionary in python, defined at the module level (mysettings.py):

settings = {
    'expensive1' : expensive_to_compute(1),
    'expensive2' : expensive_to_compute(2),
    ...
}

I would like those values to be computed when the keys are accessed:

from mysettings import settings # settings is only "prepared"

print(settings['expensive1'])  # Now the value is actually computed.

Is this possible? How?

Johnathon answered 21/5, 2013 at 11:53 Comment(1)
The problem is that if you keep your module as is, from mysettings import settings executes the module body and therefore builds the whole dict eagerly. - Sirmons
A
8

If you don't separate the arguments from the callable, I don't think it's possible. However, this should work:

class MySettingsDict(dict):

    def __getitem__(self, item):
        function, arg = dict.__getitem__(self, item)
        return function(arg)


def expensive_to_compute(arg):
    return arg * 3

And now:

>>> settings = MySettingsDict({
'expensive1': (expensive_to_compute, 1),
'expensive2': (expensive_to_compute, 2),
})
>>> settings['expensive1']
3
>>> settings['expensive2']
6

Edit:

You may also want to cache the results of expensive_to_compute if they are accessed multiple times. Something like this:

class MySettingsDict(dict):

    def __getitem__(self, item):
        value = dict.__getitem__(self, item)
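        # Assumes every computed result is an int, as in this example;
        # any other value is treated as a pending (function, arg) pair.
        if not isinstance(value, int):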
            function, arg = value
            value = function(arg)
            dict.__setitem__(self, item, value)
        return value

And now:

>>> settings.values()
dict_values([(<function expensive_to_compute at 0x9b0a62c>, 2),
(<function expensive_to_compute at 0x9b0a62c>, 1)])
>>> settings['expensive1']
3
>>> settings.values()
dict_values([(<function expensive_to_compute at 0x9b0a62c>, 2), 3])

You may also want to override other dict methods, depending on how you want to use the dict.
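The isinstance(value, int) test above only recognises cached values that happen to be ints. A minimal sketch of one way around that, assuming a small hypothetical wrapper class Lazy (not part of the original answer) that marks the entries still waiting to be computed:

```python
class Lazy:
    """Hypothetical marker: a deferred call, evaluated on first access."""

    def __init__(self, function, *args):
        self.function = function
        self.args = args


class CachingSettingsDict(dict):
    def __getitem__(self, item):
        value = dict.__getitem__(self, item)
        if isinstance(value, Lazy):
            # Compute once, then overwrite the marker with the result.
            value = value.function(*value.args)
            dict.__setitem__(self, item, value)
        return value


calls = []


def expensive_to_compute(arg):
    calls.append(arg)  # record each real computation
    return "x" * arg   # a non-int result, to show the type no longer matters


settings = CachingSettingsDict({
    'expensive1': Lazy(expensive_to_compute, 1),
    'expensive2': Lazy(expensive_to_compute, 2),
    'plain': 42,       # ordinary values pass through untouched
})
first = settings['expensive2']
second = settings['expensive2']  # served from the cache
```

With this variant the result can be of any type, and plain values can sit next to lazy ones.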

Artless answered 21/5, 2013 at 12:5 Comment(1)
Storing the function and overriding __getitem__ is smart, but I think it would be better to inherit from collections.abc.Mapping instead of the built-in dict. Otherwise, .get() bypasses the lazy lookup. You can check my example here: gist.github.com/ligyxy/9b50bb8537069b4e154fec41a4b5995a - Insistence
I
13

Don't inherit from the built-in dict. Even if you override dict.__getitem__(), dict.get() will not call it, so it won't behave as you expect.

A better approach is to inherit from Mapping in collections.abc.

from collections.abc import Mapping

class LazyDict(Mapping):
    def __init__(self, *args, **kw):
        self._raw_dict = dict(*args, **kw)

    def __getitem__(self, key):
        func, arg = self._raw_dict.__getitem__(key)
        return func(arg)

    def __iter__(self):
        return iter(self._raw_dict)

    def __len__(self):
        return len(self._raw_dict)

Then you can do:

settings = LazyDict({
    'expensive1': (expensive_to_compute, 1),
    'expensive2': (expensive_to_compute, 2),
})

I also list sample code and examples here: https://gist.github.com/gyli/9b50bb8537069b4e154fec41a4b5995a
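To see the difference concretely, here is a small self-contained check (the class names DictBased and MappingBased are made up for this comparison). dict.get() reads the stored tuple directly, while Mapping.get() is implemented on top of __getitem__:

```python
from collections.abc import Mapping


def expensive_to_compute(arg):
    return arg * 3


class DictBased(dict):
    def __getitem__(self, key):
        func, arg = dict.__getitem__(self, key)
        return func(arg)


class MappingBased(Mapping):
    def __init__(self, *args, **kw):
        self._raw_dict = dict(*args, **kw)

    def __getitem__(self, key):
        func, arg = self._raw_dict[key]
        return func(arg)

    def __iter__(self):
        return iter(self._raw_dict)

    def __len__(self):
        return len(self._raw_dict)


raw = {'expensive1': (expensive_to_compute, 1)}
from_dict = DictBased(raw).get('expensive1')        # the raw (function, arg) tuple
from_mapping = MappingBased(raw).get('expensive1')  # computed through __getitem__
```

The same holds for items() and values(): on the Mapping subclass they go through __getitem__, so they return computed values.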

Insistence answered 9/11, 2017 at 22:36 Comment(2)
What is the advantage of inheriting abc.Mapping and overriding __iter__() and __len__() over inheriting dict and overriding only get()? - Agglomeration
@Agglomeration There seems to be more than just the get() method that needs to be adapted when inheriting dict, e.g. also the pop() method. See also treyhunner.com/2019/04/… - Quotidian
S
5

Store references to the functions as the values, i.e.:

def A():
    return "that took ages"
def B():
    return "that took for-ever"
settings = {
    "A": A,
    "B": B,
}

print(settings["A"]())

This way, you only evaluate the function associated with a key when you access it and invoke it. A suitable class which can handle having non-lazy values would be:

import types
class LazyDict(dict):
    def __getitem__(self,key):
        item = dict.__getitem__(self,key)
        if isinstance(item,types.FunctionType):
            return item()
        else:
            return item

Usage:

>>> settings = LazyDict([("A", A), ("B", B)])
>>> print(settings["A"])
that took ages
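
Note that the types.FunctionType check only matches plain Python functions and lambdas. A quick check, assuming you might also want to store functools.partial objects or built-ins as lazy values:

```python
import functools
import types


def A():
    return "that took ages"


bound = functools.partial(pow, 2, 10)  # a callable that is not a plain function

is_function = {
    'def': isinstance(A, types.FunctionType),
    'lambda': isinstance(lambda: 1, types.FunctionType),
    'partial': isinstance(bound, types.FunctionType),
    'builtin': isinstance(len, types.FunctionType),
}
is_callable = {name: callable(obj)
               for name, obj in [('def', A), ('partial', bound), ('builtin', len)]}
```

Switching the test to callable() would accept all of these, at the cost of also calling any stored value that happens to be callable (a class, for instance).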
Sirup answered 21/5, 2013 at 12:4 Comment(0)
B
3

You can make expensive_to_compute a generator function:

settings = {
    'expensive1' : expensive_to_compute(1),
    'expensive2' : expensive_to_compute(2),
}

Then try:

from mysettings import settings

print(next(settings['expensive1']))
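
For this to work, expensive_to_compute has to contain a yield; calling it then only creates a generator object, and the body runs on the first next(). A minimal sketch (the log list is only there to make the evaluation order visible):

```python
log = []


def expensive_to_compute(arg):
    log.append(arg)  # runs only when the generator is advanced
    yield arg * 3


settings = {
    'expensive1': expensive_to_compute(1),
    'expensive2': expensive_to_compute(2),
}
log_before = list(log)                # nothing has been computed yet
value = next(settings['expensive1'])  # the body runs now
log_after = list(log)
```

Each generator here yields once, so a second next() on the same entry raises StopIteration.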
Brightman answered 21/5, 2013 at 12:0 Comment(1)
Interesting idea, but not what I am looking for. I would really like to keep the dictionary API untouched. - Johnathon
S
3

I recently needed something similar. Mixing both strategies from Guangyang Li and michaelmeyer, here is how I did it:

from collections.abc import MutableMapping

class LazyDict(MutableMapping):
  """Lazily evaluated dictionary."""

  function = None

  def __init__(self, *args, **kargs):
    self._dict = dict(*args, **kargs)

  def __getitem__(self, key):
      """Evaluate value."""
      value = self._dict[key]
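      # ccData is not defined in this snippet; it stands for the type of an
      # already evaluated value. Substitute the type your function returns.
      if not isinstance(value, ccData):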
          value = self.function(value)
      self._dict[key] = value
      return value

  def __setitem__(self, key, value):
      """Store value lazily."""
      self._dict[key] = value

  def __delitem__(self, key):
      """Delete value."""
      del self._dict[key]

  def __iter__(self):
      """Iterate over dictionary."""
      return iter(self._dict)

  def __len__(self):
      """Evaluate size of dictionary."""
      return len(self._dict)

Let's lazily evaluate the following function:

def expensive_to_compute(arg):
  return arg * 3

The advantage is that the function does not have to be known when the object is created, and only the arguments are stored (which is what I needed):

>>> settings = LazyDict({'expensive1': 1, 'expensive2': 2})
>>> settings.function = expensive_to_compute # function unknown until now!
>>> settings['expensive1']
3
>>> settings['expensive2']
6

This approach works with a single function only.

I can point out the following advantages:

  • implements the complete MutableMapping API
  • if your function is non-deterministic, you can reset a value to re-evaluate
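
The isinstance check in __getitem__ relies on a result type that is not shown in this answer. As a hedged variant, the sketch below tracks which keys are still unevaluated in a set instead (the _pending attribute and the class name are assumptions, not part of the answer), and shows resetting a value to force re-evaluation:

```python
from collections.abc import MutableMapping


class SingleFunctionLazyDict(MutableMapping):
    """Stores arguments; applies one function to each on first access."""

    function = None

    def __init__(self, *args, **kwargs):
        self._dict = dict(*args, **kwargs)
        self._pending = set(self._dict)  # keys whose value is still an argument

    def __getitem__(self, key):
        if key in self._pending:
            self._dict[key] = self.function(self._dict[key])
            self._pending.discard(key)
        return self._dict[key]

    def __setitem__(self, key, value):
        # A newly stored value is treated as an argument again.
        self._dict[key] = value
        self._pending.add(key)

    def __delitem__(self, key):
        del self._dict[key]
        self._pending.discard(key)

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)


settings = SingleFunctionLazyDict({'expensive1': 1, 'expensive2': 2})
settings.function = lambda arg: arg * 3
first = settings['expensive1']       # computed on first access
again = settings['expensive1']       # cached
settings['expensive1'] = 5           # reset the argument
recomputed = settings['expensive1']  # evaluated again with the new argument
```

Tracking pending keys separately means the function may return any type, including the same type as its arguments.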
Superinduce answered 10/4, 2020 at 16:4 Comment(0)
Q
3

I would populate the dictionary values with callables and change them to the result upon reading.

class LazyDict(dict):
    def __getitem__(self, k):
        v = super().__getitem__(k)
        if callable(v):
            v = v()
            super().__setitem__(k, v)
        return v

    def get(self, k, default=None):
        if k in self:
            return self.__getitem__(k)
        return default

Then with

def expensive_to_compute(arg):
    print('Doing heavy stuff')
    return arg * 3

you can do:

>>> settings = LazyDict({
    'expensive1': lambda: expensive_to_compute(1),
    'expensive2': lambda: expensive_to_compute(2),
})

>>> settings.__repr__()
"{'expensive1': <function <lambda> at 0x000001A0BA2B8EA0>, 'expensive2': <function <lambda> at 0x000001A0BA2B8F28>}"

>>> settings['expensive1']
Doing heavy stuff
3

>>> settings.get('expensive2')
Doing heavy stuff
6

>>> settings.__repr__()
"{'expensive1': 3, 'expensive2': 6}"
Quotidian answered 15/4, 2020 at 8:10 Comment(0)
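One thing to watch for when building such a dict of lambdas in a loop or comprehension: a lambda looks up its free variables when it is called, not when it is created. A sketch of the pitfall and two common fixes, a default argument or functools.partial:

```python
import functools


def expensive_to_compute(arg):
    return arg * 3


# Pitfall: every lambda sees the final value of n.
late = {'expensive%d' % n: (lambda: expensive_to_compute(n)) for n in (1, 2)}

# Fix 1: bind the current value as a default argument.
bound = {'expensive%d' % n: (lambda n=n: expensive_to_compute(n)) for n in (1, 2)}

# Fix 2: functools.partial stores the argument immediately.
partials = {'expensive%d' % n: functools.partial(expensive_to_compute, n)
            for n in (1, 2)}

results = (late['expensive1'](), bound['expensive1'](), partials['expensive1']())
```

Both the default-argument lambdas and the partial objects are callable, so either form should work as a value in a dict that tests with callable().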
P
2

Alternatively, one can use the LazyDictionary package that creates a thread-safe lazy dictionary.

Installation:

pip install lazydict

Usage:

from lazydict import LazyDictionary
import tempfile
lazy = LazyDictionary()
lazy['temp'] = lambda: tempfile.mkdtemp()
Preciousprecipice answered 15/4, 2021 at 8:9 Comment(0)
N
1

Pass in a function that populates the dict the first time it is accessed:

class LazyDict(dict):
  """ Fill in the values of a dict at first access """
  def __init__(self, fn, *args, **kwargs):
    self._fn = fn
    self._fn_args = args or []
    self._fn_kwargs = kwargs or {}
    return super(LazyDict, self).__init__()
  def _fn_populate(self):
    if self._fn:
      self._fn(self, *self._fn_args, **self._fn_kwargs)
      self._fn = self._fn_args = self._fn_kwargs = None
  def __getattribute__(self, name):
    if not name.startswith('_fn'):
      self._fn_populate()
    return super(LazyDict, self).__getattribute__(name)
  def __getitem__(self, item):
    self._fn_populate()
    return super(LazyDict, self).__getitem__(item)



>>> def _fn(self, val):
...   print('lazy loading')
...   self['foo'] = val
... 
>>> d = LazyDict(_fn, 'bar')
>>> d
{}
>>> d['foo']
lazy loading
'bar'
>>> 
Nowlin answered 24/7, 2020 at 20:46 Comment(0)
T
1

Adding this solution: use dict's __missing__ hook. If a dict subclass defines __missing__, __getitem__ invokes it for missing keys; effectively a cache-loading pattern.

Below is a type-aware implementation, which is not necessary but nice to have (subscripting dict[...] needs Python 3.9+). In the OP's case, one could initialize it with a simple dispatch function of their choice. This offers maximum flexibility with little overhead.

from typing import Callable, TypeVar

_KT = TypeVar("_KT")
_VT = TypeVar("_VT")

class LoadingDict(dict[_KT, _VT]):
  def __init__(self, fn: Callable[[_KT], _VT], **kwargs):
    if not callable(fn):
      raise TypeError(type(fn))
    super().__init__(**kwargs)
    self._fn = fn

  def __missing__(self, key: _KT) -> _VT:
    if not isinstance(key, self.__orig_class__.__args__[0]):
      raise ValueError(type(key))
    self[key] = v = self._fn(key)
    return v

#### Usage

d = LoadingDict[str, int](len, **{"1": -1})

assert d["1"] == -1
assert d["xxx"] == 3
assert d["xxxx"] == 4
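
Stripped of the type checks, the same idea applied to the question's settings might look like the sketch below. The FACTORIES table and the SettingsDict name are illustrative assumptions:

```python
def expensive_to_compute(arg):
    return arg * 3


# Hypothetical table mapping each key to a zero-argument factory.
FACTORIES = {
    'expensive1': lambda: expensive_to_compute(1),
    'expensive2': lambda: expensive_to_compute(2),
}


class SettingsDict(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        try:
            factory = FACTORIES[key]
        except KeyError:
            raise KeyError(key) from None
        value = self[key] = factory()  # store, so later lookups skip __missing__
        return value


settings = SettingsDict()
empty_before = dict(settings)
value = settings['expensive1']
stored_after = dict(settings)
```

As with the other dict subclasses here, get() and the in operator do not trigger __missing__, so they only see keys that were already loaded.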
Terenceterencio answered 14/12, 2023 at 19:33 Comment(0)