Python: building an LRU cache

I have around 600,000 entries in MongoDB in the following format:

feature:category:count

where

  • feature could be any word,
  • category is positive or negative, and
  • count tells how many times a feature occurred in a document for that category.

I want to cache, say, the top 1000 tuples so that I don't have to query the database each time.

How does one build an LRU cache in Python? Or are there any known solutions to this?

Heloise answered 14/12, 2010 at 20:32 Comment(0)

The LRU cache in Python 3.3 has O(1) insertion, deletion, and search.

The design uses a circular doubly-linked list of entries (arranged oldest-to-newest) and a hash table to locate individual links. Cache hits use the hash table to find the relevant link and move it to the head of the list. Cache misses delete the oldest link and create a new link at the head of the linked list.

Here's a simplified (but fast) version written in very basic Python (using only simple dictionary and list operations). It runs on Python 2.0 and later (as well as PyPy, Jython, and Python 3.x):

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        # Circular doubly-linked list with a sentinel root node.
        # Link structure: [PREV, NEXT, KEY, VALUE]
        self.root = [None, None, None, None]
        self.root[0] = self.root[1] = self.root   # empty list: root links to itself
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = {}

    def __call__(self, *key):
        mapping = self.mapping
        root = self.root
        link = mapping.get(key)
        if link is not None:
            # Cache hit: unlink the entry and reinsert it just before root,
            # the most-recently-used position.
            link_prev, link_next, link_key, value = link
            link_prev[1] = link_next
            link_next[0] = link_prev
            last = root[0]
            last[1] = root[0] = link
            link[0] = last
            link[1] = root
            return value
        # Cache miss: compute the value and, if the cache is full, evict the
        # oldest entry (the link immediately after root).
        value = self.original_function(*key)
        if len(mapping) >= self.maxsize:
            oldest = root[1]
            next_oldest = oldest[1]
            root[1] = next_oldest
            next_oldest[0] = root
            del mapping[oldest[2]]
        # Insert the new link at the most-recently-used position.
        last = root[0]
        last[1] = root[0] = mapping[key] = [last, root, key, value]
        return value


if __name__ == '__main__':
    p = LRU_Cache(ord, maxsize=3)
    for c in 'abcdecaeaa':
        print(c, p(c))

Starting in Python 3.2 (which added OrderedDict.move_to_end()), an OrderedDict makes it even simpler to implement an LRU cache:

from collections import OrderedDict

class LRU_Cache:

    def __init__(self, original_function, maxsize=1024):
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = OrderedDict()

    def __call__(self, *key):
        mapping = self.mapping
        try:
            value = mapping[key]
            mapping.move_to_end(key)
        except KeyError:
            value = self.original_function(*key)
            if len(mapping) >= self.maxsize:
                mapping.popitem(last=False)   # pop the least-recently-used entry
            mapping[key] = value
        return value
Sivie answered 30/11, 2011 at 19:20 Comment(0)

Besides the version included in Python 3.2, there are LRU cache recipes in the Python Cookbook, including several by Python core developer Raymond Hettinger.
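As a hedged sketch of how that stdlib version is used (the function name and its body are placeholders, not from the question's codebase):

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_count(feature, category):
    # Stand-in for the real MongoDB lookup.
    return len(feature) if category == 'positive' else 0

get_count('word', 'positive')    # miss: computed and stored
get_count('word', 'positive')    # hit: served from the cache
print(get_count.cache_info())    # CacheInfo(hits=1, misses=1, maxsize=1000, currsize=1)
```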

Kingcup answered 14/12, 2010 at 21:13 Comment(0)

Python 3.2's functools includes an LRU cache. You could easily cherry-pick it from the repository, check whether you need to adjust it to work with Python 2 (it shouldn't be too hard; perhaps use itertools instead of certain builtins, and ask if you need help), and be done. You do need to wrap the query in a callable and make sure the result depends only on the (hashable) function arguments, though.
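A sketch of that last point: the database access is funneled through a callable whose arguments are hashable (plain strings here, not a query dict). The collection is simulated with a dict and all names are illustrative, not from the original setup.

```python
from functools import lru_cache

# Simulated collection; in real code this would be a pymongo lookup.
FAKE_DB = {('good', 'positive'): 42, ('bad', 'negative'): 7}

@lru_cache(maxsize=1000)
def count_for(feature, category):
    # e.g. collection.find_one({'feature': feature, 'category': category})
    return FAKE_DB.get((feature, category), 0)

print(count_for('good', 'positive'))   # 42; subsequent calls hit the cache
```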

Chatter answered 14/12, 2010 at 20:40 Comment(5)
This seems interesting, but how do I get it from the repository? I don't know how to do that.Heloise
@learner: The easiest way would be copy-pasting it from the file I linked to.Chatter
I tried, but when I try to import functools it throws an error: >>> import functools Traceback (most recent call last): File "<stdin>", line 1, in <module> File "functools.py", line 151 nonlocal hits, misses ^ SyntaxError: invalid syntax Error in sys.excepthook:Heloise
@learner: Ah, of course. nonlocal is a new Python 3 keyword to support closures. It can be emulated with a dirty but working and succinct hack, though. I think that solves the problem, but a new one appeared: it uses collections.OrderedDict, which is only included in 2.7+. Also, the code seems to be broken (referring to OrderedDict.move_to_end, which doesn't exist in the 2.7 backport)...Chatter
An OrderedDict implementation is available from pypi.python.org/pypi/ordereddict. cache.move_to_end(key) is easily implemented as cache[key] = cache.pop(key). The dirty hack is to initialize hits and misses as hits_misses = [0, 0], and replace hits with hits_misses[0] and misses with hits_misses[1].Cf
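The mutable-container trick described in the last comment can be sketched on a toy hit/miss counter (this is not the actual functools code, just the same technique):

```python
# Emulating 'nonlocal hits, misses' on Python 2 with a mutable list.
def make_counter():
    hits_misses = [0, 0]          # [hits, misses] instead of nonlocal ints
    seen = set()
    def count(key):
        if key in seen:
            hits_misses[0] += 1   # hits += 1
        else:
            hits_misses[1] += 1   # misses += 1
            seen.add(key)
        return tuple(hits_misses)
    return count

counter = make_counter()
counter('a'); counter('a'); counter('b')
print(counter('a'))   # (2, 2): two hits, two misses so far
```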

There are also backports of the Python 3.3 version of lru_cache, such as this one, which runs on Python 2.7. If you are interested in two layers of caching (if a value is not cached in the instance, it checks a shared cache), I have created lru2cache based on the backport of lru_cache.

Motherly answered 2/2, 2016 at 22:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.