Why Use A Doubly Linked List and HashMap for a LRU Cache Instead of a Deque?
I have implemented the Design an LRU Cache problem on LeetCode using the conventional method (doubly linked list + hash map). For those unfamiliar with the problem, a typical implementation pairs a hash map for O(1) key lookup with a doubly linked list that tracks recency order.

I understand why this method is used (quick removal/insertion at both ends, fast access in the middle). What I am failing to understand is why someone would use both a HashMap and a linked list when one could simply use an array-based deque (ArrayDeque in Java, plain deque in C++). A deque allows easy insertion/deletion at both ends and quick access in the middle, which is exactly what you need for an LRU cache. It would also use less space, because you wouldn't need to store pointers for each node.

Is there a reason why the LRU cache is almost universally designed (in most tutorials, at least) with the HashMap/linked-list method as opposed to the deque/ArrayDeque method? Does the HashMap/LinkedList method have any benefits?

Bilander answered 17/2, 2019 at 6:21 Comment(12)
A doubly linked list can also serve as a stack, a queue, and a deque at the same time, so it's very handy. Java's LinkedList is such an implementation, for example. In fact, when you try to implement a stack or queue, you will find that most of the operations can be done simply by wrapping operations of a well-implemented doubly linked list.Ubana
[Continued] If the size changes frequently, then array-based structures might need to resize often, which is expensive, so that's not good for performance.Ubana
The other main reason not to use an array-based structure is that adding/removing anywhere but the ends is expensive: you need to shift the following elements, and it might trigger a resize. Linked lists don't have this issue. The downside of a linked list is that it can't access elements in the middle quickly, but if you only access the two ends, a doubly linked list is also O(1).Ubana
@EricWang you only remove from beginning and end in an LRU cache.Bilander
Removing from the beginning of an array is very expensive: you need to shift every following element left by one position. I guess that's the major reason it's not suitable here. Another minor reason: even though an LRU cache is fixed size, at the very beginning it might not be full. With an array-based structure you allocate the full capacity regardless of whether it's full, but with a linked list you allocate as needed.Ubana
@EricWang Removing from the beginning of an ArrayDeque is O(1).Bilander
If your ArrayDeque is array-based, then you can't remove from both ends in O(1); at least one of them isn't. If it's not array-based, why is it called ArrayDeque?Ubana
@EricWang The same way adding to the back of an ArrayList is amortized O(1) even though it's array-based: you just resize the array. Proof: cplusplus.com/reference/deque/deque/pop_front (the C++ deque is basically an ArrayDeque, and the complexity is listed as constant)Bilander
In that case only part of the array is usable, then a resize might be needed.Ubana
If you have written an LRU cache and didn't need a move_to_end operation, then you have probably written a FIFO cache and not an LRU cache.Systematist
@EricWang: In [case of elements deleted from the head of an ArrayDeque] only part of the array is usable The indices used wrap around at the current size.Costar
Oh, and if you're using Java, see #53417687Systematist
When an LRU cache is full, we discard the Least Recently Used item.

If we're discarding items from the front of the queue, then we have to make sure the item at the front is the one that hasn't been used for the longest time.

We ensure this by making sure that an item goes to the back of the queue whenever it is used. The item at the front is then the one that hasn't been moved to the back for the longest time.

To do this, we need to maintain the queue on every put OR get operation:

  • When we put a new item in the cache, it becomes the most recently used item, so we put it at the back of the queue.

  • When we get an item that is already in the cache, it becomes the most recently used item, so we move it from its current position to the back of the queue.

Moving items from the middle to the end is not a deque operation and is not part of Java's Deque interface. It's also not supported efficiently by the backing array that ArrayDeque uses. Doubly linked lists are used because they do support this operation efficiently.
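To make the role of that move-to-back operation concrete, here is a minimal sketch in Python using collections.OrderedDict, which is itself implemented as a hash map combined with a doubly linked list, so move_to_end and popitem are both O(1). This is an illustrative sketch of the idea, not code from the thread:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used: O(1)
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used: O(1)
```

With an array-backed deque, the move_to_end step would require finding and erasing the entry from the middle, which costs O(n); here it is a constant-time pointer relink.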

Systematist answered 17/2, 2019 at 16:37 Comment(3)
Dunno if I should ask this in a separate question, but why is it put(key, value) instead of just put(value)? The examples on Leetcode and many other tutorial sites have key and value as identical.Bilander
@user3586940 because get(value) doesn't usually make sense. The usual case is that when the value isn't in the cache it is calculated from the key -- if you already know the value then you don't need to look it up in a cache at all. It does make sense if you're using the cache just to avoid having duplicate copies of the values. Then key and value are the same.Systematist
@Primusa the term "LRU cache" is common and has an accepted definition. The implementation the OP quotes obviously refers to that commonly accepted definition. Your answer refers to some other kind of cache that does not expire the "least recently used" key. Probably you are thinking of a FIFO cache, which also has a commonly accepted definition. Your answer implies that what "everybody does" is wasteful and unnecessary, but in fact it is not. I called your answer "entirely incorrect", because anyone who believes it will not know what an LRU cache is until someone corrects it.Systematist
A doubly linked list is a natural implementation of a queue. Because a doubly linked list has immediate access to both the front and the end of the list, it can insert data on either side in O(1) and delete data from either side in O(1). Because it can insert at the end in O(1) time and delete from the front in O(1) time, it makes a perfect underlying data structure for a queue. Queues are lists of items in which data can only be inserted at the end and removed from the beginning.

Queues are an example of an abstract data type, and we are able to use an array to implement them under the hood. But since queues insert at the end and delete from the beginning, an array is only so good as the underlying data structure: while arrays are O(1) for insertions at the end, they're O(N) for deletions from the beginning. A doubly linked list, on the other hand, is O(1) both for inserting at the end and for deleting from the beginning. That's what makes it a perfect fit as the queue's underlying data structure.

Python's deque uses a doubly linked list as part of its data structure (in CPython it is a doubly linked list of fixed-size blocks). Thanks to that, deque can insert or delete elements at both ends of the queue with constant O(1) performance.
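A short illustration of that asymmetry (my own example, not from the answer): both ends of a Python deque are O(1), but touching the middle, which an LRU cache's move-to-back needs, degrades to O(n) because deque.remove must scan for the element:

```python
from collections import deque

d = deque([1, 2, 3, 4])
d.append(5)      # O(1) at the right end
d.appendleft(0)  # O(1) at the left end
d.popleft()      # O(1)
d.pop()          # O(1)
d.remove(3)      # O(n): scans from the left to find and unlink 3
assert list(d) == [1, 2, 4]
```

So a bare deque is a fine queue, but the "move this middle element to the back" step of an LRU cache is exactly the operation it does not do in constant time.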

Hesler answered 1/11, 2021 at 3:37 Comment(1)
Hi @Yilmaz, why not just keep references to the head and the tail of a singly linked list, instead of using a doubly linked list, which seems to me a waste of memory?Pears
I will answer why you wouldn't use a deque to hold the cache order in an LRU implemented in C++.

Note: in our simple example "use" includes the initial put. I will only consider the data structure used for cache ordering.

Consider a simple LRU where k = int and v = int, and suppose you have done the following: put(1,1), put(2,2), put(3,3).

Your deque will look like (left = least recently used, right = most recently used): [ (1,1), (2,2), (3,3) ]

Now say you do the following: put(2,9)

We need to update our cache ordering data structure so that the entry for key = 2 is at the very right.

When using a std::deque, the time complexity of removing (2,2) in this instance is O(n), where n is the number of elements before (or after) the element being removed. This is because the implementation uses a number of blocks of contiguous memory (i.e., arrays), so the elements on one side of the erased entry must be shifted to close the gap.

https://en.cppreference.com/w/cpp/container/deque/erase

Now, if we instead used a doubly linked list, this would be constant time O(1), because you just relink the pointers of the nodes before and after.
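The constant-time unlink is easy to see in a few lines. This sketch (in Python for brevity, though the answer discusses C++; the Node/unlink names are my own) shows that removing a node needs only its two neighbors, no scanning or shifting:

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

def unlink(node):
    """Remove node from its list in O(1) by relinking its neighbors."""
    node.prev.next = node.next
    node.next.prev = node.prev

# Build a sentinel-terminated list: head <-> 1 <-> 2 <-> 3 <-> tail
head, tail = Node(None, None), Node(None, None)
nodes = [Node(k, k) for k in (1, 2, 3)]
chain = [head] + nodes + [tail]
for left, right in zip(chain, chain[1:]):
    left.next, right.prev = right, left

unlink(nodes[1])  # remove key 2 in O(1): two pointer writes, no traversal

keys = []
cur = head.next
while cur is not tail:
    keys.append(cur.key)
    cur = cur.next
```

In the hash map + doubly linked list design, the map stores key -> node, so the cache jumps straight to the node and performs exactly this unlink before splicing the node onto the back.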

Hope this helps you!

Manville answered 12/8, 2024 at 2:38 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.