How does C# dynamically allocate memory for a List<T>?
Asked Answered
T

3

18

From LukeH's answer to what is the max limit of data into list<string> in c#?

The maximum number of elements that can be stored in the current implementation of List is, theoretically, Int32.MaxValue - just over 2 billion.

we see that a List can carry a large amount of items. I'm assuming that the compiler doesn't just free up the space for 2 billion times size of T for every new implementation of List<T>, so how does the list dynamically grow? Does it have pointers to noncontiguous spaces in memory?

Transarctic answered 7/7, 2018 at 22:58 Comment(3)
@CamiloTerevinto Thank you Camilo, I don't know why I always fail in thinking about looking at the source.Transarctic
This may be a basic question, but it's not a dupe of the referenced ones. Memory allocation for collections in .NET talks about stack vs heap, and makes no mention of dynamic growth.Imine
Nothing wrong with the question, and you got good answers, but I still want to point out that in principle we don't know, since it's an implementation detail. It can be different in various platforms and versions.Levesque
I
48

The List<T> class is implemented to use an internal T[] array under the hood. If you initialize it using the List<T>(int) constructor, it will allocate an array of the specified size. If you use the default constructor, it will go for the default capacity of 4, but in this case, the array would only get allocated on the first addition.

Each time you add an element to the list, it will first check whether the capacity has been reached (i.e. whether the existing Count equals Capacity). If so, it will create a fresh array of twice the size as the previous one, copy over all existing elements into it, and then proceed with writing the new element. This will keep happening indefinitely on subsequent element additions, until the hard limit you referenced (Int32.MaxValue) is reached.

Performance-wise, this means that the addition of an element is either an O(1) or an O(n) operation, depending on whether the capacity needs to be increased (as discussed under Add). However, since the capacity is doubled whenever it needs to increase, this reallocation happens with exponentially decreasing frequency as the list grows larger. For example, starting from 4, the capacity increases would happen at 4, 8, 16, 32, 64, 128, … elements. Thus, the total cost of the reallocations when calling Add n times would be roughly 4 + 8 + 16 + … + n/8 + n/4 + n/2, which still corresponds to O(n).

Here's an example showing the state of the internal array along a sequence of addition operations:

                               //   ┌┐
var list = new List<char>();   //   ││   Count:    0
                               //   └┘   Capacity: 0
                               //   ┌───┬───┬───┬───┐
list.Add('h');                 //   │ h │ ░ │ ░ │ ░ │   Count:    1
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┐
list.Add('e');                 //   │ h │ e │ ░ │ ░ │   Count:    2
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┐
list.Add('l');                 //   │ h │ e │ l │ ░ │   Count:    3
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┐
list.Add('l');                 //   │ h │ e │ l │ l │   Count:    4
                               //   └───┴───┴───┴───┘   Capacity: 4
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('o');                 //   │ h │ e │ l │ l │ o │ ░ │ ░ │ ░ │   Count:    5
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add(' ');                 //   │ h │ e │ l │ l │ o │   │ ░ │ ░ │   Count:    6
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('w');                 //   │ h │ e │ l │ l │ o │   │ w │ ░ │   Count:    7
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('o');                 //   │ h │ e │ l │ l │ o │   │ w │ o │   Count:    8
                               //   └───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 8
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('r');                 //   │ h │ e │ l │ l │ o │   │ w │ o │ r │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │   Count:    9
                               //   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 16
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('l');                 //   │ h │ e │ l │ l │ o │   │ w │ o │ r │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │ ░ │   Count:    10
                               //   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 16
                               //   ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
list.Add('d');                 //   │ h │ e │ l │ l │ o │   │ w │ o │ r │ l │ d │ ░ │ ░ │ ░ │ ░ │ ░ │   Count:    11
                               //   └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘   Capacity: 16

The symbol represents allocated space that is still unused. Those array locations would contain the default value for T. In the case of char, this will be the null character, \0. However, these values would never be visible to the consumer.

When adding multiple elements together through AddRange, only one reallocation is performed at most. If doubling the previous capacity would be insufficient to accommodate all the new elements, then the internal array is increased immediately to the new count instead.

Unlike addition, removing elements doesn't automatically shrink the list. However, you can cause this to happen manually by calling TrimExcess.

As mentioned in the comments, some aspects of the above (such as the default initial capacity of 4) are implementation details derived from the source code for .NET Framework 4.7.2. However, the core principles are well-entrenched and unlikely to change in other/future frameworks.

Imine answered 7/7, 2018 at 23:32 Comment(1)
Excellent. would you please introduce a book or document about this?Pyxie
G
5

Your assumption is correct, the compiler doesn't allocate anything. The List<T> class internally uses an array to store the elements, and it checks whether the size of the array is enough on every call to Add, as you can see in the source code:

public void Add(T item) {
    if (_size == _items.Length) EnsureCapacity(_size + 1);
    _items[_size++] = item;
    _version++;
}

private void EnsureCapacity(int min) {
    if (_items.Length < min) {
        int newCapacity = _items.Length == 0? _defaultCapacity : _items.Length * 2;
        // Allow the list to grow to maximum possible capacity (~2G elements) before encountering overflow.
        // Note that this check works even when _items.Length overflowed thanks to the (uint) cast
        if ((uint)newCapacity > Array.MaxArrayLength) newCapacity = Array.MaxArrayLength;
        if (newCapacity < min) newCapacity = min;
        Capacity = newCapacity;
    }
}
Granadilla answered 7/7, 2018 at 23:19 Comment(0)
A
2

The source code would be definitive, as in @CamiloTerevinto's answer, as to how this is specifically accomplished, but the documentation covers this, too.

The Remarks section of the List<> class states:

The List class is the generic equivalent of the ArrayList class. It implements the IList generic interface by using an array whose size is dynamically increased as required.

The Remarks section of the Capacity property elaborates:

Capacity is the number of elements that the List can store before resizing is required, whereas Count is the number of elements that are actually in the List.

Capacity is always greater than or equal to Count. If Count exceeds Capacity while adding elements, the capacity is increased by automatically reallocating the internal array before copying the old elements and adding the new elements.

Altimeter answered 7/7, 2018 at 23:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.