Python cachetools stats and use custom key
I am looking for a way to use Python's cachetools library for caching, but also get hit/miss statistics, use a custom key function, and, if possible, support an unbounded cache.

Unfortunately, I could only find these ways:

  1. If I want to use an unbounded cache and have hit/miss statistics:
    from cachetools.func import lru_cache
    
    @lru_cache(maxsize=None)
    def foo(a, b, c=None):
        print("foo")
    
  2. If I want to use an unbounded cache and use a custom key function:
    from cachetools import cached
    
    @cached(
        cache={},
        key=lambda a, b, c=None: "a" if c is None else "b"
    )
    def foo(a, b, c=None):
        print("foo")
    
    or, use this "hack":
    from cachetools import cached, LRUCache
    
    @cached(
        cache=LRUCache(maxsize=1, getsizeof=lambda _: 0),  # every entry reports size 0, so it always passes the maxsize check
        key=lambda a, b, c=None: "a" if c is None else "b"
    )
    def foo(a, b, c=None):
        print("foo")
    
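For reference, option 2 can be checked like this (a minimal sketch, assuming cachetools is installed): a plain `dict` passed as the cache never evicts, so it behaves as an unbounded cache, and the function body only runs once per distinct key.

```python
from cachetools import cached
from cachetools.keys import hashkey

calls = []  # record actual executions of the function body

@cached(cache={}, key=lambda a, b, c=None: hashkey(a, b, c))
def foo(a, b, c=None):
    calls.append((a, b, c))
    return a + b

foo(1, 2)
foo(1, 2)  # served from the dict; the body does not run again
print(len(calls))
```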

How would I go about getting both hit/miss statistics and a custom key function?
I know how to implement this on my own; I was just wondering whether there is already a built-in way in Python's cachetools/functools that supports it.
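(For context, the hit/miss statistics in option 1 come from the functools-style `cache_info()` method that the decorated function exposes; the stdlib's own `functools.lru_cache` works the same way:)

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache with statistics
def foo(a, b, c=None):
    return (a, b, c)

foo(1, 2)
foo(1, 2)  # second call is a cache hit
info = foo.cache_info()
print(info.hits, info.misses)  # 1 hit, 1 miss
```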

Paradiddle answered 16/8, 2020 at 13:42

So far as I know, cachetools/functools makes you choose between hit/miss statistics and a custom key function.

I ended up creating a custom cache decorator like this:

from collections.abc import Callable, Hashable
from functools import _make_key as functools_make_key, wraps
from typing import Any

from cachetools import LRUCache
from cachetools.keys import hashkey
from prometheus_client import Gauge

def default_keyfunc(*args: Any, **kwargs: Any) -> Hashable:
    # Reuse functools' internal key-making helper so the default key
    # behaves like functools.lru_cache's (returning the key itself
    # avoids hash collisions that int(hash(...)) could introduce).
    return functools_make_key(args=args, kwds=kwargs, typed=False)

def instrumented_cache(
    maxsize: int,
    hits: Gauge,
    misses: Gauge,
    keyfunc: Callable[..., Hashable] | None = None,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    cache = LRUCache(maxsize=maxsize)

    if keyfunc is None:
        keyfunc = default_keyfunc

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = keyfunc(*args, **kwargs)
            if key in cache:
                hits.inc()
                result = cache[key]
            else:
                misses.inc()
                result = func(*args, **kwargs)
                cache[key] = result
            return result

        return wrapper

    return decorator

Here is an example usage:

from sqlalchemy.dialects import postgresql

def custom_keyfunc(stmt: postgresql.Insert) -> int:
    """Your custom keyfunc will probably be different."""
    compiled = stmt.compile()  # compile once instead of twice
    return hashkey(str(compiled), tuple(sorted(compiled.params.items())))  # type: ignore[no-any-return]

hits = Gauge("cache_hits", "Number of cache hits")
misses = Gauge("cache_misses", "Number of cache misses")

@instrumented_cache(maxsize=256, hits=hits, misses=misses, keyfunc=custom_keyfunc)
def get_update_set(stmt: postgresql.Insert) -> dict[str, Any]:
    ...

Following this, I was able to view my hits/misses over time in Grafana:

A time series plot showing an initial cache miss and then several cache hits

If you don't have Prometheus and Grafana set up for viewing metrics, replace hits.inc() and misses.inc() with whatever you want to happen when you detect a cache hit or a cache miss.
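For example, a self-contained variant of the same decorator, using plain counter objects instead of prometheus Gauges and an unbounded dict instead of an LRUCache (the `Counter` class and names here are illustrative, not part of any library):

```python
from collections.abc import Callable, Hashable
from functools import wraps
from typing import Any

class Counter:
    """Stand-in for a prometheus Gauge: just tracks a number."""
    def __init__(self) -> None:
        self.value = 0
    def inc(self) -> None:
        self.value += 1

def counted_cache(keyfunc: Callable[..., Hashable], hits: Counter, misses: Counter):
    cache: dict[Hashable, Any] = {}  # a plain dict never evicts: unbounded

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = keyfunc(*args, **kwargs)
            if key in cache:
                hits.inc()
                return cache[key]
            misses.inc()
            result = func(*args, **kwargs)
            cache[key] = result
            return result
        return wrapper
    return decorator

hits, misses = Counter(), Counter()

@counted_cache(keyfunc=lambda a, b, c=None: (a, b, c), hits=hits, misses=misses)
def foo(a, b, c=None):
    return a + b

foo(1, 2)
foo(1, 2)  # hit
foo(3, 4)  # miss
print(hits.value, misses.value)  # 1 hit, 2 misses
```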

Burks answered 30/7 at 2:45
