When is it better to use a List vs a LinkedList?
Edit
Please read the comments to this answer. People claim I did not do proper tests. I agree this should not be an accepted answer. As I was learning I did some tests and felt like sharing them.
Original answer...
I found interesting results:
// Temporary class to show the example
class Temp
{
public decimal A, B, C, D;
public Temp(decimal a, decimal b, decimal c, decimal d)
{
A = a; B = b; C = c; D = d;
}
}
Linked list (3.9 seconds)
LinkedList<Temp> list = new LinkedList<Temp>();
for (var i = 0; i < 12345678; i++)
{
var a = new Temp(i, i, i, i);
list.AddLast(a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
List (2.4 seconds)
List<Temp> list = new List<Temp>(); // 2.4 seconds
for (var i = 0; i < 12345678; i++)
{
var a = new Temp(i, i, i, i);
list.Add(a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
Even if you only access data essentially it is much slower!! I say never use a linkedList.
Here is another comparison performing a lot of inserts (we plan on inserting an item at the middle of the list)
Linked List (51 seconds)
LinkedList<Temp> list = new LinkedList<Temp>();
for (var i = 0; i < 123456; i++)
{
var a = new Temp(i, i, i, i);
list.AddLast(a);
var curNode = list.First;
for (var k = 0; k < i/2; k++) // In order to insert a node at the middle of the list we need to find it
curNode = curNode.Next;
list.AddAfter(curNode, a); // Insert it after
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
List (7.26 seconds)
List<Temp> list = new List<Temp>();
for (var i = 0; i < 123456; i++)
{
var a = new Temp(i, i, i, i);
list.Insert(i / 2, a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
Linked List having reference of location where to insert (.04 seconds)
list.AddLast(new Temp(1,1,1,1));
var referenceNode = list.First;
for (var i = 0; i < 123456; i++)
{
var a = new Temp(i, i, i, i);
list.AddLast(a);
list.AddBefore(referenceNode, a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
So only if you plan on inserting several items and you also somewhere have the reference of where you plan to insert the item then use a linked list. Just because you have to insert a lot of items it does not make it faster because searching the location where you will like to insert it takes time.
list.AddLast(a);
in the last two LinkedList examples? I get doing it once before the loop, as with list.AddLast(new Temp(1,1,1,1));
in the next to last LinkedList, but it looks (to me) like you're adding twice as many Temp objects in the loops themselves. (And when I double-check myself with a test app, sure enough, twice as many in the LinkedList.) –
Trenchant I say never use a linkedList.
is flawed as your later post reveals. You might want to edit it. 2) What are you timing? Instantiation, addition, and enumeration altogether in one step? Mostly, instantiation and enumeration are not what ppl are worried about, those are one time steps. Specifically timing the inserts and additions would give a better idea. 3) Most importantly, you're adding more than required to a linkedlist. This is a wrong comparison. Spreads wrong idea about linkedlist. –
Iotacism In most cases, List<T>
is more useful. LinkedList<T>
will have less cost when adding/removing items in the middle of the list, whereas List<T>
can only cheaply add/remove at the end of the list.
LinkedList<T>
is only at it's most efficient if you are accessing sequential data (either forwards or backwards) - random access is relatively expensive since it must walk the chain each time (hence why it doesn't have an indexer). However, because a List<T>
is essentially just an array (with a wrapper) random access is fine.
List<T>
also offers a lot of support methods - Find
, ToArray
, etc; however, these are also available for LinkedList<T>
with .NET 3.5/C# 3.0 via extension methods - so that is less of a factor.
List<T>
and T[]
will fail for being too chunky (all one slab), LinkedList<T>
will wail for being too granular (slab per element). –
Despotism Array
succeeds in allocating, it will be fully sequential; the allocation done in Resize
is new T[newSize]
. array.cs. List
internally an Array
, so same is true for it. LinkedList
is not what you want when you have a large number of elements: it allocates a LinkedListNode
per element. Each LinkedListNode
is a separate allocation: there is not an Array
of anything. No continuity. A lot of memory used for previous/next pointers. google c# LinkedList source code
. –
Annikaanniken Thinking of a linked list as a list can be a bit misleading. It's more like a chain. In fact, in .NET, LinkedList<T>
does not even implement IList<T>
. There is no real concept of index in a linked list, even though it may seem there is. Certainly none of the methods provided on the class accept indexes.
Linked lists may be singly linked, or doubly linked. This refers to whether each element in the chain has a link only to the next one (singly linked) or to both the prior/next elements (doubly linked). LinkedList<T>
is doubly linked.
Internally, List<T>
is backed by an array. This provides a very compact representation in memory. Conversely, LinkedList<T>
involves additional memory to store the bidirectional links between successive elements. So the memory footprint of a LinkedList<T>
will generally be larger than for List<T>
(with the caveat that List<T>
can have unused internal array elements to improve performance during append operations.)
They have different performance characteristics too:
Append
LinkedList<T>.AddLast(item)
constant timeList<T>.Add(item)
amortized constant time, linear worst case
Prepend
LinkedList<T>.AddFirst(item)
constant timeList<T>.Insert(0, item)
linear time
Insertion
LinkedList<T>.AddBefore(node, item)
constant timeLinkedList<T>.AddAfter(node, item)
constant timeList<T>.Insert(index, item)
linear time
Removal
LinkedList<T>.Remove(item)
linear timeLinkedList<T>.Remove(node)
constant timeList<T>.Remove(item)
linear timeList<T>.RemoveAt(index)
linear time
Count
LinkedList<T>.Count
constant timeList<T>.Count
constant time
Contains
LinkedList<T>.Contains(item)
linear timeList<T>.Contains(item)
linear time
Clear
LinkedList<T>.Clear()
linear timeList<T>.Clear()
linear time
As you can see, they're mostly equivalent. In practice, the API of LinkedList<T>
is more cumbersome to use, and details of its internal needs spill out into your code.
However, if you need to do many insertions/removals from within a list, it offers constant time. List<T>
offers linear time, as extra items in the list must be shuffled around after the insertion/removal.
LinkedList<T>
doesn't have any member for index-based access. You can still do this using an extension method based on IEnumerable<T>
which of course suggests linear time access. –
Antonantone Clear
. Both are O(n). The memory overhead for LinkedList is worth noting. Already upvoted. –
Iotacism Clear
. The third paragraph already discusses memory usage. Would you add to it? –
Antonantone Clear
is linear time, not constant time for both methods. It's documented in msdn. –
Iotacism List
, and write your algorithm as simply as you can, with no concern for performance. After your code works correctly in every test case you can think of, time it. If fast enough, move on to another task. –
Annikaanniken List
(if the underlying array is still large enough) should be fast enough. Inserting at the beginning is of course something else. –
Foltz Linked lists provide very fast insertion or deletion of a list member. Each member in a linked list contains a pointer to the next member in the list so to insert a member at position i:
- update the pointer in member i-1 to point to the new member
- set the pointer in the new member to point to member i
The disadvantage to a linked list is that random access is not possible. Accessing a member requires traversing the list until the desired member is found.
LinkedList
only helps when you have a reference to the element to insert before or after. if you have to search for the element, then the time will be dominated by that linear search. If each element is associated with a unique key, then a Dictionary
mapping key to element will give you the element in O(1) time. (E.g. if each element has an int Id
, and you are passing around those Id
s rather than passing around references to elements, you'll want this Dictionary
.) –
Annikaanniken Edit
Please read the comments to this answer. People claim I did not do proper tests. I agree this should not be an accepted answer. As I was learning I did some tests and felt like sharing them.
Original answer...
I found interesting results:
// Temporary class to show the example
class Temp
{
public decimal A, B, C, D;
public Temp(decimal a, decimal b, decimal c, decimal d)
{
A = a; B = b; C = c; D = d;
}
}
Linked list (3.9 seconds)
LinkedList<Temp> list = new LinkedList<Temp>();
for (var i = 0; i < 12345678; i++)
{
var a = new Temp(i, i, i, i);
list.AddLast(a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
List (2.4 seconds)
List<Temp> list = new List<Temp>(); // 2.4 seconds
for (var i = 0; i < 12345678; i++)
{
var a = new Temp(i, i, i, i);
list.Add(a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
Even if you only access data essentially it is much slower!! I say never use a linkedList.
Here is another comparison performing a lot of inserts (we plan on inserting an item at the middle of the list)
Linked List (51 seconds)
LinkedList<Temp> list = new LinkedList<Temp>();
for (var i = 0; i < 123456; i++)
{
var a = new Temp(i, i, i, i);
list.AddLast(a);
var curNode = list.First;
for (var k = 0; k < i/2; k++) // In order to insert a node at the middle of the list we need to find it
curNode = curNode.Next;
list.AddAfter(curNode, a); // Insert it after
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
List (7.26 seconds)
List<Temp> list = new List<Temp>();
for (var i = 0; i < 123456; i++)
{
var a = new Temp(i, i, i, i);
list.Insert(i / 2, a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
Linked List having reference of location where to insert (.04 seconds)
list.AddLast(new Temp(1,1,1,1));
var referenceNode = list.First;
for (var i = 0; i < 123456; i++)
{
var a = new Temp(i, i, i, i);
list.AddLast(a);
list.AddBefore(referenceNode, a);
}
decimal sum = 0;
foreach (var item in list)
sum += item.A;
So only if you plan on inserting several items and you also somewhere have the reference of where you plan to insert the item then use a linked list. Just because you have to insert a lot of items it does not make it faster because searching the location where you will like to insert it takes time.
list.AddLast(a);
in the last two LinkedList examples? I get doing it once before the loop, as with list.AddLast(new Temp(1,1,1,1));
in the next to last LinkedList, but it looks (to me) like you're adding twice as many Temp objects in the loops themselves. (And when I double-check myself with a test app, sure enough, twice as many in the LinkedList.) –
Trenchant I say never use a linkedList.
is flawed as your later post reveals. You might want to edit it. 2) What are you timing? Instantiation, addition, and enumeration altogether in one step? Mostly, instantiation and enumeration are not what ppl are worried about, those are one time steps. Specifically timing the inserts and additions would give a better idea. 3) Most importantly, you're adding more than required to a linkedlist. This is a wrong comparison. Spreads wrong idea about linkedlist. –
Iotacism My previous answer was not enough accurate. As truly it was horrible :D But now I can post much more useful and correct answer.
I did some additional tests. You can find it's source by the following link and reCheck it on your environment by your own: https://github.com/ukushu/DataStructuresTestsAndOther.git
Short results:
Array need to use:
- So often as possible. It's fast and takes smallest RAM range for same amount information.
- If you know exact count of cells needed
- If data saved in array < 85000 b (85000/32 = 2656 elements for integer data)
- If needed high Random Access speed
List need to use:
- If needed to add cells to the end of list (often)
- If needed to add cells in the beginning/middle of the list (NOT OFTEN)
- If data saved in array < 85000 b (85000/32 = 2656 elements for integer data)
- If needed high Random Access speed
LinkedList need to use:
- If needed to add cells in the beginning/middle/end of the list (often)
- If needed only sequential access (forward/backward)
- If you need to save LARGE items, but items count is low.
- Better do not use for large amount of items, as it's use additional memory for links.
More details:
LinkedList<T>
internally is not a List in .NET. It's even does not implementIList<T>
. And that's why there are absent indexes and methods related to indexes.LinkedList<T>
is node-pointer based collection. In .NET it's in doubly linked implementation. This means that prior/next elements have link to current element. And data is fragmented -- different list objects can be located in different places of RAM. Also there will be more memory used forLinkedList<T>
than forList<T>
or Array.List<T>
in .Net is Java's alternative ofArrayList<T>
. This means that this is array wrapper. So it's allocated in memory as one contiguous block of data. If allocated data size exceeds 85000 bytes, it will be moved to Large Object Heap. Depending on the size, this can lead to heap fragmentation(a mild form of memory leak). But in the same time if size < 85000 bytes -- this provides a very compact and fast-access representation in memory.Single contiguous block is preferred for random access performance and memory consumption but for collections that need to change size regularly a structure such as an Array generally need to be copied to a new location whereas a linked list only needs to manage the memory for the newly inserted/deleted nodes.
The difference between List and LinkedList lies in their underlying implementation. List is array based collection (ArrayList). LinkedList is node-pointer based collection (LinkedListNode). On the API level usage, both of them are pretty much the same since both implement same set of interfaces such as ICollection, IEnumerable, etc.
The key difference comes when performance matter. For example, if you are implementing the list that has heavy "INSERT" operation, LinkedList outperforms List. Since LinkedList can do it in O(1) time, but List may need to expand the size of underlying array. For more information/detail you might want to read up on the algorithmic difference between LinkedList and array data structures. http://en.wikipedia.org/wiki/Linked_list and Array
Hope this help,
Add
is always at end of existing array. List
is "good enough" at that, even if not O(1). The serious problem occurs if you need many Add
s that are not at the end. Marc is pointing out that the need to move existing data every time you insert (not just when resize is needed) is a more substantial performance cost of List
. –
Annikaanniken The primary advantage of linked lists over arrays is that the links provide us with the capability to rearrange the items efficiently. Sedgewick, p. 91
A common circumstance to use LinkedList is like this:
Suppose you want to remove many certain strings from a list of strings with a large size, say 100,000. The strings to remove can be looked up in HashSet dic, and the list of strings is believed to contain between 30,000 to 60,000 such strings to remove.
Then what's the best type of List for storing the 100,000 Strings? The answer is LinkedList. If the they are stored in an ArrayList, then iterating over it and removing matched Strings whould take up to billions of operations, while it takes just around 100,000 operations by using an iterator and the remove() method.
LinkedList<String> strings = readStrings();
HashSet<String> dic = readDic();
Iterator<String> iterator = strings.iterator();
while (iterator.hasNext()){
String string = iterator.next();
if (dic.contains(string))
iterator.remove();
}
RemoveAll
to remove the items from a List
without moving a lot of items around, or use Where
from LINQ to create a second list. Using a LinkedList
here however ends up consuming dramatically more memory than other types of collections and the loss of memory locality means that it will be noticeably slower to iterate, making it quite a bit worse than a List
. –
Alienism RemoveAll
equivalent in Java. –
Trice RemoveAll
is not available for List
, you could do a "compaction" algorithm, which would look like Tom's loop, but with two indices and the need to move items to be kept one at a time down in the list's internal array. Efficiency is O(n), the same as Tom's algorithm for LinkedList
. In both versions, the time to compute the HashSet key for the strings dominates. This is not a good example of when to use LinkedList
. –
Annikaanniken When you need built-in indexed access, sorting (and after this binary searching), and "ToArray()" method, you should use List.
Essentially, a List<>
in .NET is a wrapper over an array. A LinkedList<>
is a linked list. So the question comes down to, what is the difference between an array and a linked list, and when should an array be used instead of a linked list. Probably the two most important factors in your decision of which to use would come down to:
- Linked lists have much better insertion/removal performance, so long as the insertions/removals are not on the last element in the collection. This is because an array must shift all remaining elements that come after the insertion/removal point. If the insertion/removal is at the tail end of the list however, this shift is not needed (although the array may need to be resized, if its capacity is exceeded).
- Arrays have much better accessing capabilities. Arrays can be indexed into directly (in constant time). Linked lists must be traversed (linear time).
This is adapted from Tono Nam's accepted answer correcting a few wrong measurements in it.
The test:
static void Main()
{
LinkedListPerformance.AddFirst_List(); // 12028 ms
LinkedListPerformance.AddFirst_LinkedList(); // 33 ms
LinkedListPerformance.AddLast_List(); // 33 ms
LinkedListPerformance.AddLast_LinkedList(); // 32 ms
LinkedListPerformance.Enumerate_List(); // 1.08 ms
LinkedListPerformance.Enumerate_LinkedList(); // 3.4 ms
//I tried below as fun exercise - not very meaningful, see code
//sort of equivalent to insertion when having the reference to middle node
LinkedListPerformance.AddMiddle_List(); // 5724 ms
LinkedListPerformance.AddMiddle_LinkedList1(); // 36 ms
LinkedListPerformance.AddMiddle_LinkedList2(); // 32 ms
LinkedListPerformance.AddMiddle_LinkedList3(); // 454 ms
Environment.Exit(-1);
}
And the code:
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace stackoverflow
{
static class LinkedListPerformance
{
class Temp
{
public decimal A, B, C, D;
public Temp(decimal a, decimal b, decimal c, decimal d)
{
A = a; B = b; C = c; D = d;
}
}
static readonly int start = 0;
static readonly int end = 123456;
static readonly IEnumerable<Temp> query = Enumerable.Range(start, end - start).Select(temp);
static Temp temp(int i)
{
return new Temp(i, i, i, i);
}
static void StopAndPrint(this Stopwatch watch)
{
watch.Stop();
Console.WriteLine(watch.Elapsed.TotalMilliseconds);
}
public static void AddFirst_List()
{
var list = new List<Temp>();
var watch = Stopwatch.StartNew();
for (var i = start; i < end; i++)
list.Insert(0, temp(i));
watch.StopAndPrint();
}
public static void AddFirst_LinkedList()
{
var list = new LinkedList<Temp>();
var watch = Stopwatch.StartNew();
for (int i = start; i < end; i++)
list.AddFirst(temp(i));
watch.StopAndPrint();
}
public static void AddLast_List()
{
var list = new List<Temp>();
var watch = Stopwatch.StartNew();
for (var i = start; i < end; i++)
list.Add(temp(i));
watch.StopAndPrint();
}
public static void AddLast_LinkedList()
{
var list = new LinkedList<Temp>();
var watch = Stopwatch.StartNew();
for (int i = start; i < end; i++)
list.AddLast(temp(i));
watch.StopAndPrint();
}
public static void Enumerate_List()
{
var list = new List<Temp>(query);
var watch = Stopwatch.StartNew();
foreach (var item in list)
{
}
watch.StopAndPrint();
}
public static void Enumerate_LinkedList()
{
var list = new LinkedList<Temp>(query);
var watch = Stopwatch.StartNew();
foreach (var item in list)
{
}
watch.StopAndPrint();
}
//for the fun of it, I tried to time inserting to the middle of
//linked list - this is by no means a realistic scenario! or may be
//these make sense if you assume you have the reference to middle node
//insertion to the middle of list
public static void AddMiddle_List()
{
var list = new List<Temp>();
var watch = Stopwatch.StartNew();
for (var i = start; i < end; i++)
list.Insert(list.Count / 2, temp(i));
watch.StopAndPrint();
}
//insertion in linked list in such a fashion that
//it has the same effect as inserting into the middle of list
public static void AddMiddle_LinkedList1()
{
var list = new LinkedList<Temp>();
var watch = Stopwatch.StartNew();
LinkedListNode<Temp> evenNode = null, oddNode = null;
for (int i = start; i < end; i++)
{
if (list.Count == 0)
oddNode = evenNode = list.AddLast(temp(i));
else
if (list.Count % 2 == 1)
oddNode = list.AddBefore(evenNode, temp(i));
else
evenNode = list.AddAfter(oddNode, temp(i));
}
watch.StopAndPrint();
}
//another hacky way
public static void AddMiddle_LinkedList2()
{
var list = new LinkedList<Temp>();
var watch = Stopwatch.StartNew();
for (var i = start + 1; i < end; i += 2)
list.AddLast(temp(i));
for (int i = end - 2; i >= 0; i -= 2)
list.AddLast(temp(i));
watch.StopAndPrint();
}
//OP's original more sensible approach, but I tried to filter out
//the intermediate iteration cost in finding the middle node.
public static void AddMiddle_LinkedList3()
{
var list = new LinkedList<Temp>();
var watch = Stopwatch.StartNew();
for (var i = start; i < end; i++)
{
if (list.Count == 0)
list.AddLast(temp(i));
else
{
watch.Stop();
var curNode = list.First;
for (var j = 0; j < list.Count / 2; j++)
curNode = curNode.Next;
watch.Start();
list.AddBefore(curNode, temp(i));
}
}
watch.StopAndPrint();
}
}
}
You can see the results are in accordance with theoretical performance others have documented here. Quite clear - LinkedList<T>
gains big time in case of insertions. I haven't tested for removal from the middle of list, but the result should be the same. Of course List<T>
has other areas where it performs way better like O(1) random access.
Use LinkedList<>
when
- You don't know how many objects are coming through the flood gate. For example,
Token Stream
. - When you ONLY wanted to delete\insert at the ends.
For everything else, it is better to use List<>
.
LinkedListNode<T>
objects in your code. If you can do that, then it's much better than using List<T>
, especially for very long lists where inserts/removals are frequent. –
Antonantone node.Value
whenever you want the original element). So you rewrite algorithm to work with nodes, not raw values. –
Annikaanniken LinkedList
, when there might be many objects, and objects are small, is that each object requires an extra allocation - the LinkedListNode
that is created when you Add it to the LinkedList. The upside is that you don't have to acquire a single contiguous chunk of memory, so you avoid a potential memory fragmentation issue. (I've never found the performance cost of List
's internal Resize to be significant; its how memory is used that matters in practice.) –
Annikaanniken I do agree with most of the point made above. And I also agree that List looks like a more obvious choice in most of the cases.
But, I just want to add that there are many instance where LinkedList are far better choice than List for better efficiency.
- Suppose you are traversing through the elements and you want to perform lot of insertions/deletion; LinkedList does it in linear O(n) time, whereas List does it in quadratic O(n^2) time.
- Suppose you want to access bigger objects again and again, LinkedList become very more useful.
- Deque() and queue() are better implemented using LinkedList.
- Increasing the size of LinkedList is much easier and better once you are dealing with many and bigger objects.
Hope someone would find these comments useful.
Additionally to other answers, I find LinkedList<T>
is better when it comes to large data sets and memory allocation.
While List<T>
needs contiguous space, LinkedList<T>
does not.
This is extremely helpful when you have Data in the size of GBs.
Because the single Node entries can fit wherever there is enough space.
With a regular List<T>
or T[]
you need GBs of contiguous space in this case.
Imagine you have 2,000,000 items in a List<T>
:
When you insert into it and it needs to reallocate (because Capacity
is full), you need contiguous space for 4,000,000 items (2.0x grow factor assumed). Actually even more because the new array and old array need to exist at the same time for the reallocation.
With a LinkedList<T>
the items are scattered around in memory wherever they fit, and adding an item is nothing more than allocating memory for a single T
and the Node's data (+ whatever other things the Runtime needs)
List<int>
with 2^10 elements will consume 1M * sizeof(int) == 1M * 4 bytes -> a single 4 MB chunk of contiguous memory (the array) plus minimal overhead: 8 bytes for the reference to the array + 4 bytes for the internal count
. A LinkedList<int>
will allocate 1M LinkedListNode<int>
objects! Each has 3 references (original list, prev node, next node) plus the int
payload: 3*8 + 4 = 28 bytes per node. Total: 28 * 1M = 28 MB! (vs 4MB for List<>) –
Crocodile In .NET, Lists are represented as Arrays. Therefore using a normal List would be quite faster in comparison to LinkedList.That is why people above see the results they see.
Why should you use the List? I would say it depends. List creates 4 elements if you don't have any specified. The moment you exceed this limit, it copies stuff to a new array, leaving the old one in the hands of the garbage collector. It then doubles the size. In this case, it creates a new array with 8 elements. Imagine having a list with 1 million elements, and you add 1 more. It will essentially create a whole new array with double the size you need. The new array would be with 2Mil capacity however, you only needed 1Mil and 1. Essentially leaving stuff behind in GEN2 for the garbage collector and so on. So it can actually end up being a huge bottleneck. You should be careful about that.
I asked a similar question related to performance of the LinkedList collection, and discovered Steven Cleary's C# implement of Deque was a solution. Unlike the Queue collection, Deque allows moving items on/off front and back. It is similar to linked list, but with improved performance.
Deque
is "similar to linked list, but with improved performance". Please qualify that statement: Deque
is better performance than LinkedList
, for your specific code. Following your link, I see that two days later you learned from Ivan Stoev that this was not an inefficiency of LinkedList, but an inefficiency in your code. (And even if it had been an inefficiency of LinkedList, that would not justify a general statement that Deque is more efficient; only in specific cases.) –
Annikaanniken © 2022 - 2024 — McMap. All rights reserved.