Initial capacity of collection types, e.g. Dictionary, List
Asked Answered
H

5

55

Certain collection types in .Net have an optional "Initial Capacity" constructor parameter. For example:

Dictionary<string, string> something = new Dictionary<string,string>(20);

List<string> anything = new List<string>(50);

I can't seem to find what the default initial capacity is for these objects on MSDN.

If I know I will only be storing 12 or so items in a dictionary, doesn't it make sense to set the initial capacity to something like 20?

My reasoning is, assuming that the capacity grows like it does for a StringBuilder, which doubles each time the capacity is hit, and each reallocation is costly, why not pre-set the size to something you know will hold your data, with some extra room just in case? If the initial capacity is 100, and I know I will only need a dozen or so, it seems as though the rest of that memory is allocated for nothing.

Hambrick answered 3/5, 2010 at 20:18 Comment(0)
T
88

If the default values are not documented, the reason is likely that the optimal initial capacity is an implementation detail and subject to change between framework versions. That is, you shouldn't write code that assumes a certain default value.

The constructor overloads with a capacity are for cases in which you know better than the class what number of items are to be expected. For example, if you create a collection of 50 values and know that this number will never increase, you can initialize the collection with a capacity of 50, so it won't have to resize if the default capacity is lower.

That said, you can determine the default values using Reflector. For example, in .NET 4.0 (and probably previous versions as well),

  • a List<T> is initialized with a capacity of 0. When the first item is added, it is reinitialized to a capacity of 4. Subsequently, whenever the capacity is reached, the capacity is doubled.

  • a Dictionary<T> is intialized with a capacity of 0 as well. But it uses a completely different algorithm to increase the capacity: it increases the capacity always to prime numbers.

Thaw answered 3/5, 2010 at 20:21 Comment(4)
The prime number calculation is likely to deal with hash collisions and probing for input locations. Depending on the internal mechanism if they only store one value at each hash then they need secondary storage locations. If you don't use a prime then you can potentially find a hash that you can fail to insert.Laguna
Dictionary<T> uses chaining. Prime number table sizes compensate for poor hash functions. Good hash functions produce random distributions; power of two table sizes are used in modern hash tables (the .net hash table was based on the Java hash table, which also used prime numbers, because that was an old way of doing it, in the days of poor hash functions). Because Microsoft did not provide built in hash combining methods, many home-built hash functions produce poor distributions, so the prime number choice compensates, sometimes -- until the hash function produces multiples of prime numbers.Boding
@FrankHileman "power of two table sizes are used in modern hash tables". Rust uses powers of two and its hash tables are significantly slower than .NET's because it requires much more complicated hash functions to avoid collisions.Procreate
@John Harrop Collisions are avoided by using high quality hash functions. Table size then becomes an independent variable.Boding
P
13

If you know the size, then tell it; a minor optimisation in most "small" cases, but useful for bigger collections. I would mainly worry about this if I am throwing a "decent" amount of data in, as it can then avoid having to allocate, copy and collect multiple arrays.

Most collections indeed use a doubling strategy.

Prosperous answered 3/5, 2010 at 20:22 Comment(0)
S
9

Checking the source, the default capacity for both List<T> and Dictionary<TKey, TValue> is 0.

Spada answered 3/5, 2010 at 20:22 Comment(1)
In .Net 4.5 the the additional capacity is actually 3. Yes, the default constructor calls an overloaded constructor with a capacity value of 0, but the when the constructor calls the Initialize method the size is set to 3. The actual size of the dictionary is determined from a call to HashHelpers.GetPrime(capacity) which returns the next prime number thats greater than the provided capacity. Thus, in .Net 4.5 the initial capacity for a Dictionary is 3. Lists do have a default capacity of 0, but the capacity goes to 4 after adding the first item to the list.Voltammeter
B
3

Another issue with the ConcurrentDictionary (currently) and using its constructor to set an initial size is that its performance appears to be hindered.

For example, here's some example code and benchmarks I tried.

I ran the code on my machine and got similar results.

That is, when the initial size is specified, it does nothing to increase the ConcurrentDictionary's speed when adding objects. Technically, I think it should because it doesn't have to take time or resources to resize itself.

Yes, it may not run as fast as a normal Dictionary, but I would still expect a ConcurrentDictionary with its initial size set to have consistent, faster performance than a ConcurrentDictionary that doesn't have its initial size set, especially when one knows in advance the number of items that are going to be added to it.

So the moral of the story is setting the initial size doesn't always guarantee a performance improvement.

Boutte answered 29/1, 2015 at 7:33 Comment(0)
B
0

Use this regular expression new List[<].*[>][(\(\))?[ ]+[{] in Visual Studio Ctrl+Shift+F with regular expression option on to search all lists that you might have to add an initial capacity to it ;-)

Barclay answered 13/4, 2022 at 12:55 Comment(1)
Though perhaps a useful tip, this doesn't answer any part of the question.Autogiro

© 2022 - 2024 — McMap. All rights reserved.