Bjarne Stroustrup says we must avoid linked lists
Asked Answered
M

2

17

I saw this video on YouTube: https://www.youtube.com/watch?v=YQs6IC-vgmo in which Bjarne says it is better to use vectors, rather than linked lists. I am unable to grasp the entire thing, so could anyone explain what he is saying in layman's terms?

P.S: I am an high school student and can easily handle linked lists, but I am struggling to learn vectors on my own. Could you suggest any sources to learn vectors?

Muscid answered 9/12, 2015 at 4:10 Comment(21)
Linked lists can be really bad for cache misses. Vectors will store the underlying memory in a continuous block which helps alleviate this issue. So while there might be a lower big O complexity for some operations with linked lists in practice because of the implementation of them on common architectures such as x86 you will find their performance rather lacking sometimes.Picket
"learn vectors on my own" learn to implement one, or learn to use one?Cartogram
A vector/array normally is easier to understand than a linked list. What exactly are you struggling with? ... About the first comment, if you don't know what cache misses are: Where your data is in physical memory, and the order in which you're accessing it, affect the speed of your program. Arrays are often better than lists here, because the way they are.Gadgetry
@Picket If vectors are a continuous block of memory, it becomes hard to insert or delete elements from the middle right? Isn't it just like an improved version of arrays?Muscid
@Cartogram I want to learn to use one.Muscid
@Muscid Right. A vector is an array, just with an easy possibility for resizing it. (an raw array like int arr[10]; has always 10 int, not less/more, and dynamic-sized raw arrays are a bit more complicated)Gadgetry
Vectors are wrappers around arrays. So when you have a vector the underlying storage is an array. There's just a lot of other convenience you get from using them over raw arrays. This includes having resizing handled for you.Picket
@Gadgetry the c++ book given by my high school prof. doesn't include vectors. I have tried online reference materials but none of them explain the concepts from the basics. I don't want to read from books as some of them might teach 'bad practices'. Could you suggest any intermediate level books?Muscid
#388742Cartogram
(second that, see TCs link)Gadgetry
@ShankRam: amazon.com/Coding-Standards-Rules-Guidelines-Practices/dp/…Crossstaff
@Picket is the ability to traverse much faster to reach the required element the only advantage vectors have over linked lists?Muscid
@ShankRam: there are heap memory fragmentation problems and memory overhead per single element. std::list require separate memory chunk for each element equipped with two pointers.Crossstaff
Also could one of you watch the video and explain it to me please?Muscid
I don't say this to be condescending but you might need to do a bit more study before this fully makes sense. Definitely start by getting used to how to use the std::vector container and then try running a few benchmarks against a linked list. If you aren't familiar with how cache works then a lot of this won't be possible to understand on a deeper level.Picket
@MinorThreat could you explain it in layman's terms? I am only a beginner and do not know much about memory allocation.Muscid
@ShankRam: the only advantage std::list have is that it doesn't invalidate iterators after addition/remove operations.Crossstaff
@Picket yes. I definitely will study more about vectors and how to implement them and come back to this post. Thank you all for the help.Muscid
@ShankRam: The heap memory, roughly speaking, works like file system on HDD. There are free and allocated clusters. After some creation and removal of "files", or chunks of data in our case, there will be lots of holes. std::vector creates and maintains a large "file" which reside in continuous cluster's range. std::list creates lots of small files scattered across the hard drive, each with 16+bytes long header.Crossstaff
@MinorThreat Yes that is much better and easy to understandMuscid
The thing to take away is that no container is perfect, they all have their pluses and minuses and for best performance you should always analyse your requirements closely before picking one.Cuisse
C
32

Advantages of vector vs. linked list

The main advantage of vector versus linked lists is memory locality.

Usually, each element is allocated seperately in a linked list. As a consequence those elements probably are not next to each other in memory. (Gaps between the elements in memory.)

A vector is guaranteed to store all contained elements contiguously. (Items next to each other, no gaps;)

Note: Oversimplifications may occur... ;)

Imo, the simplified key points about the superior performance of a contiguously stored data storage pattern versus non-contiguous storage are

1. cache misses

Modern CPUs do not fetch single bytes from memory but slightly larger chunks. Thus, if your data objects size is less than the size of those chunks and if storage is contiguous, you can get more than one element at a time because multiple elements may be in a single chunk.

Example:

A 64byte block (usual cache line size) fits sixteen 32bit integers at a time. Therefore, a cache miss (data not in cache yet -> load from main memory required) occurs at the earliest after processing 16 elements from the moment first one has been brought to cache. If a linked list is used, the first element might well be the only one within a 64byte block. It might in theory happen that a cache miss occurs for every single element of the list.

Concrete example:

std::vector<std::uint32_t> v;
// somehow fill 64 values into v
std::uint32_t s{};
for(std::size_t i{0}; i<v.size(); ++i)
{
  s += v[i];
}

Imagine the contents of v not being cached yet.

What happens during the processing of the data in the for loop?

1)Check whether element v[0] is in cache. --> Nope

2)Fetch 64 bytes starting at the address of v[0] from main memory into a cache line

3)Load v[0] from cache and process by adding its value to s

4)Is element v1 in cache? --> Yes loaded with previous fetch because neighbouring v[0]

5)Load v1 from cache and process by adding its value to s

6)Is element v[2] in cache? --> Yes ...

7) Load v[2] from cache and process by adding its value to s

... etc...

34)Is element v[16] in cache? --> Nope

35) Fetch 64 bytes starting at the address of v[16] from main memory into a cache line

36)Load v[16] from cache and process by adding its value to s

37)Is element v[17] in cache? --> Yes loaded with previous fetch because neighbouring v[16]

etc...

Fetching data from main memory into the cache takes much more time than loading data from cache into processor registers and perform simple operations. Therefore, the fact that multiple values may reside on a single cache line can boost performance significantly.

Linked lists do not provide a contiguous storage guarantee and you cannot hope to get this performance boost. This is also the reason why random iteration (accessing elements randomly) performs worse than forward iteration (accessing elements in order) for contiguous containers.

2. prefetching

The above effect is amplified by a CPU feature called "prefetcher".

If a chunk has been loaded from main memory, the prefetcher prepares loading the next chunk / already puts it into cache, significantly reducing the penality of loading stuff from that part of the main memory.

This is of course effective if and only if you in fact need data from the next prepared chunk.

How do vectors usually work (internally)?

See: c++ Vector, what happens whenever it expands/reallocate on stack?

Coaction answered 9/12, 2015 at 5:19 Comment(0)
H
1

Stroustrup has written an article https://isocpp.org/blog/2014/06/stroustrup-lists that says he has been misunderstood and isn't saying to avoid linked lists.

I can't comment on C++ implementations of vectors and linked lists.. But, you say you understand linked lists and not vectors. I can say that in Java, people understood vectors but not necessarily linked lists. C# has a List data type, and most people don't really look into whether that's implemented as a linked list or as a vector. Here is a good discussion on the differences in terms of usage. https://www.reddit.com/r/learnprogramming/comments/15mxrt/what_are_the_real_world_applications_of_linked/ One comment says "Data stored in a Linked List, once allocated in memory, will stay in the same spot. That means as your linked list changes in size, the data of any elements does not move (in memory), so it can be safely pointed at."

Straustrup's article says "And, yes, my recomendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default. Also, please don’t make statements about performance without measurements. "

I have seen Stroustrup in an interview say that deleting an element and shifting them all up is really fast, and faster than deleting an element from a linked list. But I suppose one shouldn't conclude from that that he's saying linked lists have no use case.

Hendiadys answered 11/9, 2022 at 23:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.