Is it worthwhile to initialize the collection size of a List<T> if it's size reasonably known?
Asked Answered
D

3

41

Is it worthwhile to initialize the collection size of a List<T> if it's reasonably known?

Edit: Furthering this question, after reading the first answers this question really boils down to what is the default capacity and how is the growth operation performed, does it double the capacity etc.?

Dialogism answered 11/2, 2010 at 21:18 Comment(4)
Just by the fact that you said "reasonably" and not "definitely", I'd say no.Deletion
On what measure would you determine wortwhileness? If just speed, run some tests and tell us the results please :-) IIRC The list is doubled each time it fills up (a binary progression) so there is some overhead to thisSkill
Reflector can answer the questions in your edit (the answers appear to be 0 and yes (except with 0 -> 4 the first time))Stu
@NateW. I would say initialize it to the lowest limit you are sure about.Guck
C
85

Yes, it gets to be important when your List<T> gets large. The exact numbers depend on the element type and the machine architecture, let's pick a List of reference types on a 32-bit machine. Each element will then take 4 bytes inside an internal array. The list will start out with a Capacity of 0 and an empty array. The first Add() call grows the Capacity to 4, reallocating the internal array to 16 bytes. Four Add() calls later, the array is full and needs to be reallocated again. It doubles the size, Capacity grows to 8, array size to 32 bytes. The previous array is garbage.

This repeats as necessary, several copies of the internal array will become garbage.

Something special happens when the array has grown to 65,536 bytes (16,384 elements). The next Add() doubles the size again to 131,072 bytes. That's a memory allocation that exceeds the threshold for "large objects" (85,000 bytes). The allocation is now no longer made on the generation 0 heap, it is taken from the Large Object Heap.

Objects on the LOH are treated specially. They are only garbage collected during a generation 2 collection. And the heap doesn't get compacted, it takes too much time to move such large chunks.

This repeats as necessary, several LOH objects will become garbage. They can take up memory for quite a while, generation 2 collections do not happen very often. Another problem is that these large blocks tend to fragment the virtual memory address space.

This doesn't repeat endlessly, sooner or later the List class needs to re-allocate the array and it has grown so large that there isn't a hole left in the virtual memory address space to fit the array. Your program will bomb with an OutOfMemoryException. Usually well before all available virtual memory has been consumed.

Long story short, by setting the Capacity early, before you start filling the List, you can reserve that large internal array up front. You won't get all those awkward released blocks in the Large Object Heap and avoid fragmentation. In effect, you'll be able to store many more objects in the list and your program runs leaner since there's so little garbage. Do this only if you have a good idea how large the list will be, using a large Capacity that you'll never fill is wasteful.

Candlelight answered 11/2, 2010 at 22:18 Comment(3)
Very informative answer, I don't think I ever had to work with Lists greater than 10,000 elements but now if I ever think I will need to this knowledge will be very useful to understand the implications of Lists that large.Dialogism
Interesting to see my comment from a decade ago which between now and then having worked with arrays spanning 5 billion indices long.Dialogism
The Large Object Heap issue is the interesting part. Thanks.Nitz
V
19

It is, as per documentation

If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the List(T).

Vodka answered 11/2, 2010 at 21:22 Comment(1)
I think, it is "0". May check any time using List(T).Capacity propVodka
B
9

Well, it will stop you the values in the list (which will be references if the element type is a reference type) from having to be copied occasionally as the list grows.

If it's going to be a particularly large list and you've got a pretty good idea of the size, it won't hurt. However, if estimating the size involves extra calculations or any significant amount of code, I wouldn't worry about it unless you find it becomes a problem - it could distract from the main focus of the code, and the resizing is unlikely to cause performance issues unless it's a really big list or you're doing it a lot.

Bloxberg answered 11/2, 2010 at 21:23 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.