ConcurrentModificationException when updating stored Iterator (for LRU cache implementation)

B

3

13

I am trying to implement my own LRU cache. Yes, I know that Java provides a LinkedHashMap for this purpose, but I am trying to implement it using basic data structures.

From reading about this topic, I understand that I need a HashMap for O(1) lookup of a key and a linked list for management of the "least recently used" eviction policy. I found these references that all use a standard library hashmap but implement their own linked list:

"What data structures are commonly used for LRU caches and quickly locating objects?" (stackoverflow.com)
"What is the best way to Implement a LRU Cache?" (quora.com)
"Implement a LRU Cache in C++" (uml.edu)
"LRU Cache (Java)" (programcreek.com)

The hash table is supposed to directly store a linked list Node as I show below. My cache should store Integer keys and String values.

However, in Java the LinkedList collection does not expose its internal nodes, so I can't store them inside the HashMap. I could instead have the HashMap store indices into the LinkedList, but then getting to an item would require O(N) time. So I tried to store a ListIterator instead.

import java.util.Map;
import java.util.HashMap;
import java.util.List;
import java.util.LinkedList;
import java.util.ListIterator;

public class LRUCache {

    private static final int DEFAULT_MAX_CAPACITY = 10;

    protected Map<Integer, ListIterator> _map = new HashMap<Integer, ListIterator>();
    protected LinkedList<String> _list = new LinkedList<String>();

    protected int _size = 0;
    protected int _maxCapacity = 0;

    public LRUCache(int maxCapacity) {
        _maxCapacity = maxCapacity;
    }

    // Put the key, value pair into the LRU cache.
    // The value is placed at the head of the linked list.
    public void put(int key, String value) {

        // Check to see if the key is already in the cache.
        ListIterator iter = _map.get(key);

        if (iter != null) {
            // Key already exists, so remove it from the list.
            iter.remove(); // Problem 1: ConcurrentModificationException!
        }

        // Add the new value to the front of the list.
        _list.addFirst(value);
        _map.put(key, _list.listIterator(0));

        _size++;

        // Check if we have exceeded the capacity.
        if (_size > _maxCapacity) {
            // Remove the least recently used item from the tail of the list.
            _list.removeLast();
        }
    }

    // Get the value associated with the key.
    // Move value to the head of the linked list.
    public String get(int key) {

        String result = null;
        ListIterator iter = _map.get(key);

        if (iter != null) {

            //result = iter
            // Problem 2: HOW DO I GET THE STRING FROM THE ITERATOR?

        }

        return result;
    }

    public static void main(String argv[]) throws Exception {
        LRUCache lruCache = new LRUCache(10);

        lruCache.put(10, "This");
        lruCache.put(20, "is");
        lruCache.put(30, "a");
        lruCache.put(40, "test");
        lruCache.put(30, "some"); // Causes ConcurrentModificationException
    }
}

So this leads to three problems:

Problem 1: I am getting a ConcurrentModificationException when I update the LinkedList using the iterator that I store in the HashMap.

Exception in thread "main" java.util.ConcurrentModificationException
    at java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
    at java.util.LinkedList$ListItr.remove(LinkedList.java:919)
    at LRUCache.put(LRUCache.java:31)
    at LRUCache.main(LRUCache.java:71)

Problem 2. How do I retrieve the value pointed to by the ListIterator? It seems I can only retrieve the next() value.

Problem 3. Is there any way to implement this LRU cache using the Java collections LinkedList, or do I really have to implement my own linked list?

Beltz answered 2/4, 2016 at 23:10 Comment(1)

Yeah, there's no way you're going to make that work. You will have to manually reimplement at least one of these data structures if you want to reinvent this wheel. – Abednego 2/4, 2016 at 23:13

G

1

I'll deal with problem 3 first:

As you point out in your question, LinkedList (like all well designed generic collections) hides the details of the implementation such as the nodes containing the links. In your case you need your hash map to reference these links directly as the values of the map. To do otherwise (e.g. having indirection through a third class) would defeat the purpose of an LRU cache to allow very low overhead on value access. But this is not possible with standard Java Collections - they don't (and shouldn't) provide direct access to internal structures.

So the logical conclusion of this is that, yes, you need to implement your own way of storing the order in which items in the cache have been used. That doesn't have to be a double-linked list. Those have traditionally been used for LRU caches because the most common operation is to move a node to the top of the list when it is accessed. That is an incredibly cheap operation in a double-linked list requiring just four nodes to be relinked with no memory allocation or free.

Problem 1 & 2:

Essentially the root cause here is that this you are trying to use iterators as a cursor. They are designed to be created, stepped through to perform some operation and then disposed of. Even if you get over the problems you are having I expect there will be further problems behind them. You're putting a square peg in a round hole.

So my conclusion is that you need to implement your own way to hold values in a class that keeps track of order of access. However it can be incredibly simple: only three operations are required: create, get value and remove from tail. Both create and get value must move the node to the head of the list. No inserting or deleting from the middle of the list. No deleting the head. No searching. Honestly dead simple.

Hopefully this will get you started :-)

public class <K,V> LRU_Map implements Map<K,V> {
    private class Node {
        private final V value;
        private Node previous = null;
        private Node next = null;

        public Node(V value) {
            this.value = value;
            touch();
            if (tail == null)
                tail = this;
        }

        public V getValue() {
            touch();
            return value;
        }

        private void touch() {
            if (head != this) {
                unlink();
                moveToHead();
            }
        }

        private void unlink() {
            if (tail == this)
                tail = prev;
            if (prev != null)
                prev.next = next;
            if (next != null)
                next.prev = prev;
        }

        private void moveToHead() {
            prev = null;
            next = head;
            head = this;
        }

        public void remove() {
            assert this == tail;
            assert this != head;
            assert next == null;
            if (prev != null)
                prev.next = null;
            tail = prev;
        }
    }

    private final Map<K,Node> map = new HashMap<>();
    private Node head = null;
    private Node tail = null;

    public void put(K key, V value) {
        if (map.size() >= MAX_SIZE) {
            assert tail != null;
            tail.remove();
        }
        map.put(key, new Node(value));
    }

    public V get(K key) {
        if (map.containsKey(key))
            return map.get(key).getValue();
        else
            return null;
    }

    // and so on for other Map methods
}

Gautea answered 7/4, 2016 at 4:51 Comment(1)

Thanks. Makes sense now. – Beltz 11/4, 2016 at 18:39

S

2

1) This isn't really what Iterators are for.

By contract, if you modify the list without using the iterator -- as you do here

_list.addFirst(value);

then ALL OPEN ITERATORS on that list should throw ConcurrentModificationException. They were open to a version of the list that no longer exists.

2) A LinkedList is not, exactly, a linked list of nodes. It's a java.util.List, whose backing implementation is a doubly linked list of nodes. That List contract is why it does not expose references to the backing implementation -- so operations like "remove this node, as a node, and move it to the head" are no good. This encapsulation is for your own protection (same as the concurrent mod exception) -- it allows your code to rely on the List semantics of a LinkedList (iterability, for example) without worry that some joker two cubes away was hacking away at its innards and broke the contract.

3) What you really need here is NOT a LinkedList. What you need is a Stack that that allows you to move any arbitrary entry to the head and dump the tail. You are implying that you want a fast seek time to an arbitrary entry and also a fast remove and a fast add, AND you want to be able to find the tail at any moment in case you need to remove it.

Fast seek time == HashSomething

Fast add/remove of arbitrary elements == LinkedSomething

Fast addressing of the final element == SomekindaList

4) You're going to need to build your own linking structure...or use a LinkedHashMap.

PS LinkedHashSet is cheating, it's implemented using a LinkedHashMap.

Spada answered 7/4, 2016 at 2:15 Comment(1)

Thanks. Great explanation. – Beltz 11/4, 2016 at 18:38

G

1

I'll deal with problem 3 first:

As you point out in your question, LinkedList (like all well designed generic collections) hides the details of the implementation such as the nodes containing the links. In your case you need your hash map to reference these links directly as the values of the map. To do otherwise (e.g. having indirection through a third class) would defeat the purpose of an LRU cache to allow very low overhead on value access. But this is not possible with standard Java Collections - they don't (and shouldn't) provide direct access to internal structures.

So the logical conclusion of this is that, yes, you need to implement your own way of storing the order in which items in the cache have been used. That doesn't have to be a double-linked list. Those have traditionally been used for LRU caches because the most common operation is to move a node to the top of the list when it is accessed. That is an incredibly cheap operation in a double-linked list requiring just four nodes to be relinked with no memory allocation or free.

Problem 1 & 2:

Essentially the root cause here is that this you are trying to use iterators as a cursor. They are designed to be created, stepped through to perform some operation and then disposed of. Even if you get over the problems you are having I expect there will be further problems behind them. You're putting a square peg in a round hole.

So my conclusion is that you need to implement your own way to hold values in a class that keeps track of order of access. However it can be incredibly simple: only three operations are required: create, get value and remove from tail. Both create and get value must move the node to the head of the list. No inserting or deleting from the middle of the list. No deleting the head. No searching. Honestly dead simple.

Hopefully this will get you started :-)

public class <K,V> LRU_Map implements Map<K,V> {
    private class Node {
        private final V value;
        private Node previous = null;
        private Node next = null;

        public Node(V value) {
            this.value = value;
            touch();
            if (tail == null)
                tail = this;
        }

        public V getValue() {
            touch();
            return value;
        }

        private void touch() {
            if (head != this) {
                unlink();
                moveToHead();
            }
        }

        private void unlink() {
            if (tail == this)
                tail = prev;
            if (prev != null)
                prev.next = next;
            if (next != null)
                next.prev = prev;
        }

        private void moveToHead() {
            prev = null;
            next = head;
            head = this;
        }

        public void remove() {
            assert this == tail;
            assert this != head;
            assert next == null;
            if (prev != null)
                prev.next = null;
            tail = prev;
        }
    }

    private final Map<K,Node> map = new HashMap<>();
    private Node head = null;
    private Node tail = null;

    public void put(K key, V value) {
        if (map.size() >= MAX_SIZE) {
            assert tail != null;
            tail.remove();
        }
        map.put(key, new Node(value));
    }

    public V get(K key) {
        if (map.containsKey(key))
            return map.get(key).getValue();
        else
            return null;
    }

    // and so on for other Map methods
}

Gautea answered 7/4, 2016 at 4:51 Comment(1)

Thanks. Makes sense now. – Beltz 11/4, 2016 at 18:39

K

0

Another way to skin this cat would be to implement a very simple class that extends the LinkedList, but runs any modifications to the list (e.g. add, remove, etc) inside of a "synchronized" block. You'll need to run your HashMap pseudo-pointer through the get() every time, but it should work just fine. e.g.

...
private Object lock = new Object(); //semaphore

//override LinkedList's implementations...
@Override
public <T> remove(int index) { synchronized(lock) { return super.remove(index); } }
...

If you have Eclipse or IntelliJ IDEA, then you should be able to auto-generate the method stubs you need almost instantly, and you can evaluate which ones need to be locked.

Kisor answered 12/4, 2016 at 17:4 Comment(0)

Recommended topics

Hot tags