Python lru_cache: how can currsize < misses < maxsize?

I have a class with a method that is decorated with lru_cache:

from functools import lru_cache
from typing import Tuple

CACHE_SIZE = 16384

class MyClass:
    [...]

    @lru_cache(maxsize=CACHE_SIZE)
    def _my_method(self, texts: Tuple[str]):
        <some heavy text processing>

    def cache_info(self):
        return self._my_method.cache_info()

After running for a while, I look at the cache statistics through the cache_info() method:

c = MyClass()
[...]
c.cache_info()

{
  "hits":9348,
  "misses":4312,
  "maxsize":16384,
  "currsize":2588
}

My question is: how can currsize be smaller than misses AND smaller than maxsize?

My understanding was: for each miss, the result is added to the cache, increasing the current size, and cached results are only evicted once the current size reaches the maximum size. Since the maximum size has not been reached here, every miss should have been cached, so currsize should equal misses at this point. However, that does not seem to be how it works.
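In a strictly single-threaded program, that model does hold: every miss stores exactly one entry, so currsize equals misses until maxsize is reached. A minimal sketch with the real functools.lru_cache (the square function is just a toy stand-in):

from functools import lru_cache

@lru_cache(maxsize=4)
def square(x: int) -> int:
    # Stand-in for an expensive computation.
    return x * x

for x in (1, 2, 3, 1, 2):
    square(x)

# Three unique arguments -> three misses, and all three results are stored,
# so currsize == misses while the cache is not yet full.
print(square.cache_info())
# CacheInfo(hits=2, misses=3, maxsize=4, currsize=3)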

Sneaky asked 27/8, 2021 at 10:06

If your program is either multi-threaded or recursive - basically, any situation in which _my_method() can be called again while another call is still in progress - then it is possible to see the behavior you are experiencing.

lru_cache() is thread-aware and, for size-limited caching, uses roughly the following steps (a simplified sketch in code follows the list):

  • make a hash key out of the wrapped function's arguments
  • lock the cache (in a with block):
    • look up the key in the cache
    • if the key is in the cache, return the cached value
    • otherwise, increment misses by 1
  • call the wrapped function, without holding the lock
  • lock the cache again:
    • if the result is in the cache now, return it
    • if the result still isn't in the cache, add it, possibly evicting older entries
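
Put in code, this corresponds roughly to the following self-contained sketch (tiny_lru_cache and everything inside it are made up for illustration; this is not the actual functools source):

import threading
from collections import OrderedDict

def tiny_lru_cache(maxsize):
    # Simplified, self-contained sketch of the step order described above.
    def decorator(user_function):
        cache = OrderedDict()
        lock = threading.Lock()
        stats = {"hits": 0, "misses": 0}
        sentinel = object()

        def wrapper(*args):
            key = args                              # hash key from the arguments
            with lock:                              # first lock: look up the key
                result = cache.get(key, sentinel)
                if result is not sentinel:
                    stats["hits"] += 1
                    cache.move_to_end(key)          # mark as most recently used
                    return result
                stats["misses"] += 1                # the miss is counted *now*
            result = user_function(*args)           # call without holding the lock
            with lock:                              # second lock: store the result
                if key not in cache:                # another thread may have stored it already
                    cache[key] = result
                    if len(cache) > maxsize:
                        cache.popitem(last=False)   # evict the least recently used entry
            return result

        wrapper.stats = stats
        return wrapper
    return decorator

The important detail is that misses is incremented inside the first locked section, while the entry is only stored in the second one, after the wrapped function has returned.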

In other words, the cached value may have been added by another thread while the wrapped function was running, but the call is still counted as a miss. If multiple concurrent calls to _my_method() look up the same missing key, each of them increments misses, yet only one entry for that key ends up in the cache, so misses ends up higher than currsize.
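
A minimal way to reproduce that with the real functools.lru_cache (the slow_identity function, the sleep duration, and the thread count are made up for illustration; the exact numbers vary from run to run):

import threading
import time
from functools import lru_cache

@lru_cache(maxsize=16)
def slow_identity(x: int) -> int:
    time.sleep(0.1)   # long enough for all threads to miss on the same key
    return x

threads = [threading.Thread(target=slow_identity, args=(42,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All eight threads looked up key 42 before any of them had stored a result,
# so each call counts as a miss, yet only one entry ends up in the cache.
print(slow_identity.cache_info())
# e.g. CacheInfo(hits=0, misses=8, maxsize=16, currsize=1)

Scaled up to many different keys, the same race explains how misses can end up well above currsize even though maxsize was never reached.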

Degeneracy answered 28/8, 2021 at 0:06
This explanation sounds very likely, thanks! The class is indeed running within a service that uses multiple threads, so requests to the cache can come in in parallel. (Sneaky)
