Why is getting a value from the end of a LinkedList much slower than from the start?
I have a LinkedList of 1,000,000 items. I measured the retrieval of an item first at index 100,000 and then at index 900,000. In both cases, the LinkedList goes through 100,000 operations to get to the desired index. So why is the retrieval from the end so much slower than from the start? Measurements taken with JMH.

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
public class ComparationGet {

    static int val1 = 100_000;
    static int val2 = 500_000;
    static int val3 = 900_000;

    @Benchmark
    public void testGet1LinkedListFromStart(Blackhole blackhole, MyState state) {
        MyDigit res1 = state.linkedList.get(val1);
        blackhole.consume(res1);
    }

    @Benchmark
    public void testGet2LinkedListFromEnd(Blackhole blackhole, MyState state) {
        MyDigit res1 = state.linkedList.get(val3);
        blackhole.consume(res1);
    }
}

Results:

from start:
ComparationGet.testGet1LinkedListFromStart  avgt   10  0,457 ± 0,207  ms/op

from end:
ComparationGet.testGet2LinkedListFromEnd  avgt   10  5,789 ± 3,094  ms/op

State class:

@State(Scope.Thread)
public class MyState {
    public List<MyDigit> linkedList;


    private int iterations = 1_000_000;

    @Setup(Level.Invocation)
    public void setUp() {
        linkedList = new LinkedList<>();

        for (int i = 0; i < iterations; i++) {
            linkedList.add(new MyDigit(i));
        }
    }
}

MyDigit class:

public class MyDigit {
    private int val;

    public MyDigit(int val) {
        this.val = val;
    }
}

LinkedList get method:

public E get(int index) {
    checkElementIndex(index);
    return node(index).item;
}

Node<E> node(int index) {
    // assert isElementIndex(index);

    if (index < (size >> 1)) {
        Node<E> x = first;
        for (int i = 0; i < index; i++)
            x = x.next;
        return x;
    } else {
        Node<E> x = last;
        for (int i = size - 1; i > index; i--)
            x = x.prev;
        return x;
    }
}
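
To spell out the question's premise: a tiny sketch (the hops helper below is hypothetical, it just mirrors the branch in node(int) above) showing that both indices really do require roughly the same number of link hops:

// Hypothetical helper mirroring LinkedList.node(int): how many next/prev
// hops are needed to reach `index` in a list of `size` elements.
static int hops(int size, int index) {
    return (index < (size >> 1))
            ? index              // walk forward from `first`
            : size - 1 - index;  // walk backward from `last`
}

// hops(1_000_000, 100_000) -> 100_000 hops from the head
// hops(1_000_000, 900_000) ->  99_999 hops from the tail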
Mailer answered 10/8, 2020 at 13:2 Comment(11)
Because you start at the beginning and have to walk the links to each element. An ArrayList is much faster for access in this case.Latricialatrina
@Latricialatrina This is wrong. LinkedList has a check which works out whether it's closer to the start or the end and it starts from the optimal side: hg.openjdk.java.net/jdk7/jdk7/jdk/file/9b8c96f96a0f/src/share/…Burnie
@Burnie I don't know what his data looks like, but it looks like it's going from the beginning both times.Latricialatrina
@Latricialatrina That's why he's confused, isn't it? He's aware it starts on the closer end so he expects the times to be close to the same. But his result is showing the end check to be much greater time wise instead of close to the same.Picayune
LinkedList has 1_000_000 elements. testGet2LinkedListFromEnd gets element by index 900_000. In this case LinkedList must go from tail to head.Mailer
I added the LinkedList get method to the question. Look at the if condition.Mailer
I tested it with Java and measured with System.currentTimeMillis(). Both are equal. Maybe your measurement is not accurate.Thunderpeal
One thing that occurs to me is that the beginning of the list may have a more predictable memory layout, and incurs fewer cache misses. Where do you create this list? Can you post the full benchmark?Deadman
@Deadman yes, posted. My thinking goes in the same direction, but I don't fully understand how it works in memory. If there is a link, or if you could explain, I would be grateful.Mailer
In my environment it is faster to get the value from the end than from the beginning.Dual
Great question, and further evidence to support my experience that LinkedList is a useless implementation. It is not as flexible as a hand-rolled linked list because it does not expose direct next/prev pointers between items, and is always slower than ArrayList or ArrayDeque.Godinez

LinkedList is a fine example of the limitations of fundamental, informatics-based reasoning about algorithms. Basic reasoning about the code here, treating the computer as a simple von Neumann machine, would dictate that either benchmark needs 100k steps to get from one 'end' to the desired item, and therefore the benchmarks should report roughly equal times, give or take some statistical noise.

In actual fact, one is an order of magnitude slower than the other.

LinkedList is almost always the loser in such issues. In fact, as a rule of thumb, LinkedList should be banned from all codebases. It's almost always vastly slower than basic reasoning would indicate, and in the rare circumstances where LinkedList would (actually, in real benchmarks, not theoretically!) outperform an ArrayList, there's almost always a different type that's even more suitable, such as, say, ArrayDeque.
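
For example, the queue- and stack-style operations that usually tempt people towards LinkedList are all covered by the array-backed ArrayDeque; a minimal sketch (class name and values chosen just for illustration):

import java.util.ArrayDeque;
import java.util.Deque;

public class DequeSketch {
    public static void main(String[] args) {
        Deque<Integer> deque = new ArrayDeque<>();
        deque.addLast(1);               // enqueue at the tail
        deque.addFirst(0);              // push at the head
        int head = deque.pollFirst();   // take from the head -> 0
        int tail = deque.pollLast();    // take from the tail -> 1
        System.out.println(head + " " + tail);
    }
}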

But, why?

There are many reasons. But usually it has to do with cache paging.

NB: For the CPU design expert: I've oversimplified rather a lot, to try to explain the key aspect (which is that cache misses drown out any algorithmic expectations).

Modern CPUs have hierarchical layers of memory. The slowest, by far, is 'main memory' (that 16GB of RAM or whatnot that you have). The CPU cannot actually read from main memory directly, at all. And yet O(n) analysis assumes that it can.

Then there are layers of caches, generally three (L1 to L3), and, even faster than those, the registers.

When you read some memory, what actually happens is that the system checks whether what you want to read is mapped onto one of the caches. Only entire pages' worth of memory can be, so it first works out which page your data is in, and then checks whether that page is in one of those caches. If yes, great, the operation proceeds.

If not, uh oh. The CPU can't do your job. So instead, the CPU goes and does something else, or just twiddles its thumbs for at least 500 cycles (more on faster CPUs!) whilst it evicts some page from one of the caches and copies the page you wanted from main memory into that cache.

Only then can it continue.
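
To make that concrete, here is a small, self-contained sketch (not from the original answer; sizes and numbers are illustrative) showing how the access pattern alone, on the exact same data, changes wall-clock time once the working set is larger than the caches:

import java.util.Random;

public class CacheMissDemo {
    public static void main(String[] args) {
        int n = 1 << 24;                         // ~16M ints, ~64 MB: larger than a typical L3 cache
        int[] data = new int[n];
        int[] order = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) { data[i] = i; order[i] = i; }
        for (int i = n - 1; i > 0; i--) {        // Fisher-Yates shuffle of the visit order
            int j = rnd.nextInt(i + 1);
            int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
        }

        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += data[i];          // sequential walk: cache/prefetch friendly
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += data[order[i]];   // same reads, random order: mostly cache misses
        long t2 = System.nanoTime();

        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("random:     " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println(sum);                 // keep the JIT from eliminating the loops
    }
}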

Java guarantees that arrays are consecutive. If you declare, say, new int[1000000], Java guarantees that all 1,000,000 4-byte values are right next to each other, so if you iterate through the array, you get the minimum possible number of 'cache miss' events (where you read from some memory that isn't in one of the caches).

An ArrayList is backed by exactly such an array, so that array is guaranteed to be consecutive. However, the objects inside don't have to be: unlike with new int[1000000], with new Object[1000000] only the pointers are consecutive; the actual objects they point at need not be.
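
In code, the distinction looks roughly like this (a sketch; the class and variable names are made up):

public class LayoutSketch {
    public static void main(String[] args) {
        int[] values = new int[1_000_000];      // 1,000,000 int values in one consecutive block
        Object[] refs = new Object[1_000_000];  // 1,000,000 consecutive *references*
        for (int i = 0; i < refs.length; i++) {
            refs[i] = new Object();             // the objects themselves are allocated separately
        }                                       // and can land anywhere on the heap
        System.out.println(values.length + " values, " + refs.length + " references");
    }
}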

However, for the test you've set up, none of that matters: nothing in your code actually 'follows the pointer'.

In LinkedLists, you end up with no array at all, and instead with 2*X (X being the size of the list) objects: Your X objects you are storing, as well as X 'trackers'; each tracker contains a pointer (in java: reference) to the actual object being stored, as well as a 'previous' and 'next' pointer, pointing at its sibling tracker objects.
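
A simplified sketch of such a tracker (the real class is the package-private java.util.LinkedList.Node, whose fields are visible in the node(int) method quoted in the question):

class Node<E> {
    E item;        // reference to the object you stored
    Node<E> next;  // tracker of the following element (towards `last`)
    Node<E> prev;  // tracker of the preceding element (towards `first`)
}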

None of these are guaranteed to be consecutive in memory.

They could be smeared all over the heap. Even just looping through each element in a list of 1,000,000, not following the item pointers at all: if the trackers are scattered all over the place, that is, in the theoretical worst case, 1,000,000 cache misses.

Cache misses are so slow, and CPUs are so fast, that you can safely consider the job of iterating through each tracker (or through each item in a 1000000-sized array) as entirely free, zero CPU time required, as long as you don't run into cache misses: The cache misses tend to dominate the time requirements.

You'd have to investigate further, but here is a plausible explanation for what you're witnessing:

Your code runs in isolation (it is not doing much else), so your init runs unimpeded, and whilst Java makes no guarantees about any of this being consecutive, your actual memory layout likely looks like: a MyDigit object, then a LinkedList tracker, then another MyDigit object, then another LinkedList tracker, and so on.

Nevertheless, walking backwards from the last node evidently incurs a fair number of cache misses (one plausible contributor: walking against the allocation order gets far less help from the hardware prefetcher), whereas walking forwards from the front (which also has the benefit of starting at 'byte 0' of a page) isn't nearly as badly affected.
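
If you want to investigate further, one option (a sketch, assuming a Linux machine with perf_events available and the benchmark above on the classpath) is to re-run the benchmark under JMH's perf-counter profiler, which prints cache references and misses per operation next to the timings:

import org.openjdk.jmh.profile.LinuxPerfNormProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RunWithPerfCounters {
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(ComparationGet.class.getSimpleName())  // the benchmark class from the question
                .addProfiler(LinuxPerfNormProfiler.class)       // reports cache-references / cache-misses per op
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}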

For reference, here is a chart of access times when fetching a certain-sized chunk of data, assuming optimal caching - note the big spike once you get to 4M.

Wagtail answered 10/8, 2020 at 14:5 Comment(0)
