How to design a latest recently used cache?
Asked Answered
S

6

6

How to design a latest recently used cache?

Suppose that you have visited some items. You need to design a data structure to hold these items. Each item is associated with the latest visited time.

Each time when you visit an item, check it in the data structure. If the item has been in the cache, update its visit time. Otherwise, insert it into the cache. The cache size is fixed, if it is full, delete the oldest item.

My solution:

  1. Use a map < item, visitTime >

  2. Initaliztion: Sort the map with f(visitTime) with descending order. O(nlg n)

  3. If an item is visited, search it in the map with O(lg n).

  4. If it has been in the map, update the time O(1). Sort the map O(lg n).

  5. If not, insert it into map and then sort. O(lg n)

  6. If map size > fixed size, delet the last element O(1).

Another solution:

  1. Use hashtable < item , visitTime >

  2. Sort it O(n lgn).

  3. If an item is visited, search it in the talbe with O(1).

  4. If it has been in the table , update the time O(1). Sort the table O(n lg n).

  5. If not, insert it into table and then sort. O(n lg n)

  6. If table size > fixed size, delet the last element O(1).

Are there better solutions ? O(n) ?

Staceystaci answered 29/11, 2011 at 19:34 Comment(2)
What do you mean by "sort it"? Why do you want to sort (and copy where?) the (unordered_)map? The linear search will find the oldest item to remove.Raiment
How big of a cache are you contemplating here? Big-O deals with asymptotic complexity, but a cache is often small enough that complexity is rarely the sole (or often even most important) concern.Olinger
B
4

If you use a Doubly Linked List, you'll get O(1) insertion (after search), O(1) deletion, O(n) search.

Assuming you insert new items in the front:

If the cache is not full, just add to the front (O(1)).

If you need to update an item, find it (O(n)), remove it from the linked list (O(1)), then add to the front (O(1)).

If you need to delete the oldest to insert a new item, delete the end element (O(1)), and insert to the front (O(1)) [note: you need to search the list first in this case to see if the item is not already in the cache, so O(n)].

A Linked List can also give you the same time, since the search will leave you at the last element.

Bidden answered 29/11, 2011 at 19:39 Comment(4)
This doesn't scale well because of the O(n) search step. The Python implement discussed in the other answer reduces this to an O(1) step.Ot
He asked for O(n), so that's what I gave him. A simple solution too.Bidden
Fair enough. Hopefully, other people who search this question will aim higher :-)Ot
Down below is description of O(1) solution that can be implemented in C++. All you need is one map and one list.Saxtuba
O
3

Python's LRU cache has O(1) insertion, deletion, and search. Its design uses a doubly-linked list of entries (arranged oldest-to-newest) and a hash table to locate a particular link.

Here's a simplified (but fast) version in under 40 lines of very basic Python. It shouldn't be hard to translate Python's solution into C++:

class LRU_Cache(object):

    def __init__(self, original_function, maxsize=1000):
        self.original_function = original_function
        self.maxsize = maxsize
        self.mapping = {}

        PREV, NEXT, KEY, VALUE = 0, 1, 2, 3
        self.head = [None, None, None, None]        # oldest
        self.tail = [self.head, None, None, None]   # newest
        self.head[NEXT] = self.tail

    def __call__(self, *key):
        PREV, NEXT, KEY, VALUE = 0, 1, 2, 3
        mapping, head, tail = self.mapping, self.head, self.tail
        sentinel = object()

        link = mapping.get(key, sentinel)
        if link is sentinel:
            value = self.original_function(*key)
            if len(mapping) >= self.maxsize:
                oldest = head[NEXT]
                next_oldest = oldest[NEXT]
                head[NEXT] = next_oldest
                next_oldest[PREV] = head
                del mapping[oldest[KEY]]
            last = tail[PREV]
            link = [last, tail, key, value]
            mapping[key] = last[NEXT] = tail[PREV] = link
        else:
            link_prev, link_next, key, value = link
            link_prev[NEXT] = link_next
            link_next[PREV] = link_prev
            last = tail[PREV]
            last[NEXT] = tail[PREV] = link
            link[PREV] = last
            link[NEXT] = tail
        return value

if __name__ == '__main__':
    p = LRU_Cache(ord, maxsize=3)
    for c in 'abcdecaeaa':
        print(c, p(c))
Ot answered 29/11, 2011 at 19:59 Comment(1)
impressive. what about change "if size > maxsize:" => "if size >= maxsize:"?Starnes
S
1

Use two collections that share the same data. Have one hashtable and one list. Use hashtable to verify if item exists, and to find it in the list (value of hash map is list iterator). Use list to maintain order between items. Synchronize two collections (when removing item from the list, remove corresponding item from hashtable). List iterator must be such that it does not change when you relocate the item within the list.

Edit: std::list iterator is valid throughout addition and deletion of elements, provided that the very element iterator is referencing to is not removed. See last lines in the section Capacity and Allocation in Wikipedia.

Saxtuba answered 29/11, 2011 at 19:44 Comment(2)
If list item is relocated, the iterator will be changed. Each time a new item is inserted in front, or deleted from the list, some or even all list item locations may change. how to keep hash table to get the new location of each item ? it is O(n).Staceystaci
@user1002288, I updated my answer. std::list iterator is safe.Saxtuba
S
1

You can do it in Java with the java.util.LinkedHashSet. It is a hash table coupled with a linked list which preserves the order in which the items were inserted. You should get (expected) constant time look-ups, insertions and removals if key dispersal works well.

You may also want to look at WeakHashMap which implements an automated mechanism where elements can be garbage collected.

Sweet answered 29/11, 2011 at 20:0 Comment(1)
+1 This is essentially what the Python version does. It is a much better solution than a doubly-linked list without a hash table.Ot
I
0

You don't really have to sort the container. Just add the items to the map or vector, and go over it linearly to find the needed item (or the oldest item).

Then it will be O(n).

Inlier answered 29/11, 2011 at 19:38 Comment(0)
T
0

Have a look at boost::multi_index. One of the examples that it shows is an MRU List.

Terminal answered 29/11, 2011 at 21:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.