Separate lru for each argument value?
I need to write a SQL database column getter that takes a column name and a time, and returns the entire column of values corresponding to that time. This function may be called frequently with the same arguments, so I would like to use an LRU cache. However, I'm not sure the frequency of the column names is uniformly distributed, so ideally I would have a separate LRU cache for each column name.

I previously had it like below, but I would like to separate the lru for each col_name.

@lru_cache(...)
def get_col(self, col_name, time):
    # do stuff to get the column and return it

How can I achieve this? Also, unfortunately, I have to support py2.

Perrine answered 8/9, 2023 at 14:41 Comment(6)
Is the @lru_cache you are using in Python 2 a backport using some external library?Quickel
@Quickel I'm actually not familiar (I'm working in an existing codebase). I think it's customized and based on the one in functools; the one in functools is for py3+ iircPerrine
So are you writing for Python2 or Python3? Writing something that will work for both is not a viable solution, and Python2 is pretty much a dead language for over 3 years now.Quickel
@Quickel I have to write in py2 for this particular case. I don't have a say on what version of python I can usePerrine
Okay, at least it's not 2 different versions. I might try to come up with something that should be portable to Py2, but never wrote anything in 2, so any differences you'd have to adapt to on your own.Quickel
@Quickel okay, thanks. I'd be curious about a py3 answer as well because this might be a common use case in the future for other py3-only codebasesPerrine
An LRU cache is simply a mapping that preserves insertion order, so the most recently used item can be moved to the end and the least recently used item can be evicted from the front when the cache reaches its maximum size. In both Python 2 and Python 3, such a data structure is available in the standard library as collections.OrderedDict, whose popitem(last=False) method removes the oldest entry.
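The two OrderedDict operations the cache relies on can be seen in isolation (this sketch works on both Python 2 and 3):

```python
from collections import OrderedDict

d = OrderedDict([('a', 1), ('b', 2), ('c', 3)])

# "Touch" a key: pop and re-insert moves it to the end (most recent).
d['a'] = d.pop('a')
assert list(d) == ['b', 'c', 'a']

# Evict the least recently used entry from the front.
d.popitem(last=False)
assert list(d) == ['c', 'a']
```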

To more easily initialize an LRU cache for each column name, you can also use collections.defaultdict:

from collections import defaultdict, OrderedDict

class Client(object):  # inherit from object so this is a new-style class on Python 2
    def __init__(self, cache_size=3): # a cache size of only 3 for demo purposes
        self.col_cache = defaultdict(OrderedDict)
        self.cache_size = cache_size

    def get_col(self, col_name, time):
        cache = self.col_cache[col_name]
        try:
            cache[time] = cache.pop(time)
        except KeyError:
            if len(cache) >= self.cache_size:
                cache.popitem(last=False)
            cache[time] = '%s at %s' % (col_name, time) # make DB query here
        print(cache)
        return cache[time]

so that:

c = Client()
print(c.get_col('foo', 1))
print(c.get_col('foo', 2))
print(c.get_col('foo', 3))
print(c.get_col('foo', 4))
print(c.get_col('foo', 3))
print(c.get_col('foo', 1))

outputs:

OrderedDict([(1, 'foo at 1')])
foo at 1
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2')])
foo at 2
OrderedDict([(1, 'foo at 1'), (2, 'foo at 2'), (3, 'foo at 3')])
foo at 3
OrderedDict([(2, 'foo at 2'), (3, 'foo at 3'), (4, 'foo at 4')])
foo at 4
OrderedDict([(2, 'foo at 2'), (4, 'foo at 4'), (3, 'foo at 3')])
foo at 3
OrderedDict([(4, 'foo at 4'), (3, 'foo at 3'), (1, 'foo at 1')])
foo at 1


Note that since Python 3.2, collections.OrderedDict also has a move_to_end method that moves an item to the end in one step, so you can change the line:

cache[time] = cache.pop(time)

to simply:

cache.move_to_end(time)
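Putting that together, a Python 3.2+ sketch of the same class might look like this (the placeholder string stands in for the real DB query, as above):

```python
from collections import defaultdict, OrderedDict

class Client(object):
    def __init__(self, cache_size=3):  # a cache size of only 3 for demo purposes
        self.col_cache = defaultdict(OrderedDict)
        self.cache_size = cache_size

    def get_col(self, col_name, time):
        cache = self.col_cache[col_name]
        try:
            cache.move_to_end(time)  # mark as most recently used (Python 3.2+)
        except KeyError:
            if len(cache) >= self.cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[time] = '%s at %s' % (col_name, time)  # make DB query here
        return cache[time]
```

Since move_to_end raises KeyError for a missing key, the try/except structure carries over unchanged from the pop-and-reinsert version.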
Substance answered 11/9, 2023 at 4:36 Comment(0)
We may preserve the logic of an existing implementation of lru_cache but organise a mapping of chosen argument values to separate caches within an outer decorator. Here is a sample implementation:

from functools import lru_cache, wraps
from inspect import getcallargs


def lru_argument_cache(argument_name):
    def decorator(function):
        callable_buckets = {}

        @wraps(function)
        def wrapper(*args, **kwargs):
            inspection = getcallargs(function, *args, **kwargs)

            # should use functools._make_key() for more general use
            bucket_key = inspection[argument_name]

            if bucket_key in callable_buckets:
                return callable_buckets[bucket_key](*args, **kwargs)

            # lru_cache()(function) works on Python 3.2+;
            # the bare lru_cache(function) form requires Python 3.8+
            callable_buckets[bucket_key] = lru_cache()(function)

            return callable_buckets[bucket_key](*args, **kwargs)

        # just to demonstrate usage
        def cache_info(argument_value):
            try:
                return callable_buckets[argument_value].cache_info()
            except KeyError:
                return None

        wrapper.cache_info = cache_info
        return wrapper

    return decorator

Usage:

@lru_argument_cache('key')
def my_callable(key, times):
    return key * times

Verify:

my_callable('A', 2)
Out[16]: 'AA'
my_callable('A', 2)
Out[17]: 'AA'
my_callable('A', 3)
Out[18]: 'AAA'
my_callable('B', 2)
Out[19]: 'BB'

my_callable.cache_info('A')
Out[20]: CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
my_callable.cache_info('B')
Out[21]: CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)

  • inspect.getcallargs is also available in Python 2 (since 2.7);
  • functools.lru_cache needs to be replaced with the lru_cache implementation currently in use;
  • functools.wraps and its dependency functools.partial, if not available, may be taken from the CPython sources;
  • not sure what to add concerning thread safety, though.
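On the thread-safety point, one option is to guard bucket creation with a lock so two threads don't build a cache for the same key concurrently. A minimal, self-contained sketch of that idea (the decorator name and lock placement here are illustrative assumptions, not part of the answer above):

```python
import threading
from functools import lru_cache, wraps
from inspect import getcallargs


def locked_argument_cache(argument_name):
    """Per-argument-value LRU buckets, with a lock guarding bucket creation."""
    def decorator(function):
        buckets = {}
        lock = threading.Lock()

        @wraps(function)
        def wrapper(*args, **kwargs):
            key = getcallargs(function, *args, **kwargs)[argument_name]
            with lock:
                # only one thread may create the bucket for a given key
                if key not in buckets:
                    buckets[key] = lru_cache()(function)
                cached = buckets[key]
            # the cached call itself happens outside the lock
            return cached(*args, **kwargs)

        return wrapper
    return decorator
```

In CPython, functools.lru_cache itself is safe to call from multiple threads, so only the bucket dict needs protection here; a backported lru_cache may need its own locking.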
Dillydally answered 16/9, 2023 at 19:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.