Why is ConcurrentDictionary always faster than normal Dictionary?
I was wondering if I should use a List or a Dictionary, knowing that my structure would have very few items. So I did a simple benchmark to test, for 1, 3, 10, 20 and 50 elements, which data structure was fastest at finding an element.

To my surprise, the ConcurrentDictionary was FASTER than the normal Dictionary. I had the feeling that it would be slower because of locks to allow multi-thread access.

  • Is there a problem with my benchmark?
  • Is there a logical reason?
  • Should I always use the ConcurrentDictionary?
BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3803/22H2/2022Update)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
| Method                     |  N | Mean       | Error     | StdDev    | Allocated |
|--------------------------- |---:|-----------:|----------:|----------:|----------:|
| ListLoopSearch             |  1 |   1.416 ns | 0.0153 ns | 0.0143 ns |         - |
| DictionarySearch           |  1 |   4.253 ns | 0.1054 ns | 0.1128 ns |         - |
| ConcurrentDictionarySearch |  1 |   2.891 ns | 0.0144 ns | 0.0127 ns |         - |
| ListLoopSearch             |  3 |   6.237 ns | 0.0281 ns | 0.0234 ns |         - |
| DictionarySearch           |  3 |  10.849 ns | 0.1968 ns | 0.1745 ns |         - |
| ConcurrentDictionarySearch |  3 |   6.225 ns | 0.0760 ns | 0.0711 ns |         - |
| ListLoopSearch             | 10 |  36.174 ns | 0.5519 ns | 0.5162 ns |         - |
| DictionarySearch           | 10 |  33.303 ns | 0.6607 ns | 0.7608 ns |         - |
| ConcurrentDictionarySearch | 10 |  18.787 ns | 0.1348 ns | 0.1261 ns |         - |
| ListLoopSearch             | 20 | 118.554 ns | 1.5678 ns | 1.4665 ns |         - |
| DictionarySearch           | 20 |  67.028 ns | 0.4722 ns | 0.3943 ns |         - |
| ConcurrentDictionarySearch | 20 |  36.108 ns | 0.6998 ns | 0.6546 ns |         - |
| ListLoopSearch             | 50 | 979.931 ns | 6.9250 ns | 5.4066 ns |         - |
| DictionarySearch           | 50 | 161.658 ns | 0.5295 ns | 0.4953 ns |         - |
| ConcurrentDictionarySearch | 50 |  87.737 ns | 1.2800 ns | 1.0689 ns |         - |

Here is the full code for the benchmark.

using BenchmarkDotNet.Attributes;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public class Item {
    public int Id { get; set; }
    public int Value { get; set; }
}

public class SearchBenchmark {
    [Params(1, 3, 10, 20, 50)] // Number of elements in the structure
    public int NumberOfElements;
    private List<Item> _itemList;
    private Dictionary<int, Item> _itemDictionary;
    private ConcurrentDictionary<int, Item> _itemConcurrentDictionary;
    private List<int> _searchIds; // List of IDs to search

    [GlobalSetup]
    public void Setup() {
        _itemList = Enumerable.Range(1, NumberOfElements).Select(i => new Item { Id = i, Value = i }).ToList();
        _itemDictionary = _itemList.ToDictionary(item => item.Id, item => item);
        _itemConcurrentDictionary = new ConcurrentDictionary<int, Item>(_itemList.Select(item => new KeyValuePair<int, Item>(item.Id, item)));

        // Prepare a list of IDs to search for, covering all elements
        _searchIds = Enumerable.Range(1, NumberOfElements).ToList();
    }

    [Benchmark]
    public void ListLoopSearch() {
        foreach (var id in _searchIds) {
            foreach (var item in _itemList) {
                if (item.Id == id)
                    break; // Found the item, move to the next ID
            }
        }
    }

    [Benchmark]
    public void DictionarySearch() {
        foreach (var id in _searchIds) {
            _itemDictionary.TryGetValue(id, out var result);
        }
    }
    [Benchmark]
    public void ConcurrentDictionarySearch() {
        foreach (var id in _searchIds) {
            _itemConcurrentDictionary.TryGetValue(id, out var result);
        }
    }
}
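
(To run it I use BenchmarkDotNet's standard runner; a minimal Program.cs entry point looks something like this, assuming the SearchBenchmark class above is in the same project.)

// Program.cs - minimal BenchmarkDotNet entry point (run in Release configuration)
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<SearchBenchmark>();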

@jonasH asked if the fact that the elements were in consecutive order in the structure and in the search could be related. So I've modified the code to randomize the elements in the structures and the search, and also added a case with 1000 items. I get pretty much the same results.

Results with randomized data:

| Method                     | NumberOfElements | Mean           | Error         | StdDev       |
|--------------------------- |-----------------:|---------------:|--------------:|-------------:|
| ListLoopSearch             |                1 |       1.784 ns |     0.0163 ns |    0.0153 ns |
| DictionarySearch           |                1 |       4.178 ns |     0.0557 ns |    0.0465 ns |
| ConcurrentDictionarySearch |                1 |       3.361 ns |     0.0110 ns |    0.0102 ns |
| ListLoopSearch             |               10 |      42.138 ns |     0.1538 ns |    0.1363 ns |
| DictionarySearch           |               10 |      33.073 ns |     0.0994 ns |    0.0929 ns |
| ConcurrentDictionarySearch |               10 |      18.996 ns |     0.1257 ns |    0.1176 ns |
| ListLoopSearch             |               50 |   1,007.447 ns |     4.6698 ns |    4.1397 ns |
| DictionarySearch           |               50 |     161.110 ns |     0.5328 ns |    0.4984 ns |
| ConcurrentDictionarySearch |               50 |      87.332 ns |     1.4004 ns |    1.3100 ns |
| ListLoopSearch             |             1000 | 324,126.481 ns | 1,032.4962 ns |  965.7976 ns |
| DictionarySearch           |             1000 |   3,233.511 ns |    11.0090 ns |   10.2978 ns |
| ConcurrentDictionarySearch |             1000 |   1,825.452 ns |    19.3568 ns |   18.1064 ns |
Joyous answered 21/12, 2023 at 15:7 Comment(13)
"I had the feeling that it would be slower because of locks to allow multi-thread access." - from the docs: "uses fine-grained locking to ensure thread safety (Read operations on the dictionary are performed in a lock-free manner)". – Polytypic
The benchmark is a simple loop, single thread, nothing fancy. I fail to see how that complex structure involving locks would benefit from it. – Joyous
The number of operations performed in the loops is too small for a good result. – Doornail
You are both adding and searching for items from an ordered list. You also have fairly small collections. This may make the internal lookup strategy the dominating factor. I would shuffle the itemList and searchIds, as well as try with, say, 10, 100, 1000, 10000 sizes, to see if the difference persists. – Quartic
@GuruStron on second thought that was worded too emphatically. The Task library does introduce overhead in order to do what it does, so many simple cases are better off without concurrency involved if they fall under the threshold. This is covered in the advanced programming docs I linked previously – Grandstand
Possibly the out var result is being optimized away, try using a field out _someField – Evocative
@Grandstand what do Tasks have to do with anything here? – Polytypic
If you're only comparing read times, why don't you benchmark a frozen dictionary too? – Summertree
@TheodorZoulias: Your comment breaks your own rule... – Teledu
Na, my main point was to link the MS tutorials on it, which are going to cover a lot of the concerns of the post itself – Grandstand
@Grandstand thought you had deleted your comment so I deleted mine =) But since you didn't - "Of course, Concurrent will always be faster..." - and why exactly is that? Especially in the OP's case where no concurrency is happening. – Polytypic
@GuruStron I have already said that I wrote that too hyperbolically, my apologies. There is going to be overhead introduced from the use of C#'s parallelism features, and there are going to be cases wherein things would be faster without parallelism, even if it's set up properly. It's also easy to just screw up your parallelized implementation, so you should be profiling these things. I am not deleting my comment because OP needs to read the documentation. – Grandstand
Relevant GitHub discussion: ConcurrentDictionary.TryGetValue faster than Dictionary.TryGetValue. – Quality

I had the feeling that it would be slower because of locks to allow multi-thread access.

As written in the ConcurrentDictionary<TKey,TValue> documentation:

For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner)

So since you are testing a read-only operation, locking should not have an effect (though note that the so-called lock-free approaches to synchronization can still incur some performance cost).

As for performance - after quickly scanning through the current implementation I was not able to find anything which would clearly explain the difference, at least at first glance. Dictionary.TryGetValue has a shared method call which may not be inlined, which can have some minor effect. It also uses a slightly different approach to handling collision buckets - it uses 2 arrays (one containing all dictionary records and another storing the index of the first element in each bucket, if I got everything right, see more here), while the concurrent version uses a linked list for every bucket (see more here).
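
To make that concrete, here is a rough sketch of the two layouts - heavily simplified, with made-up names, not the actual runtime code:

// Dictionary<TKey,TValue> (simplified sketch): two parallel arrays.
// _buckets[hash % _buckets.Length] holds the index of the first entry of a
// bucket; collisions are chained via the Next index inside _entries.
struct Entry<TKey, TValue>
{
    public int HashCode;
    public int Next;   // index of the next entry in the same bucket, or -1
    public TKey Key;
    public TValue Value;
}
// Fields: int[] _buckets; Entry<TKey, TValue>[] _entries;

// ConcurrentDictionary<TKey,TValue> (simplified sketch): each bucket is the
// head of a linked list of heap-allocated nodes, which readers can walk
// without taking locks while writers swap nodes in.
class Node<TKey, TValue>
{
    public readonly TKey Key;
    public TValue Value;
    public volatile Node<TKey, TValue>? Next;

    public Node(TKey key, TValue value, Node<TKey, TValue>? next)
    {
        Key = key;
        Value = value;
        Next = next;
    }
}
// Field: Node<TKey, TValue>?[] _buckets;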

Is there a problem with my benchmark?

Probably. It is quite synthetic: it tests only one concrete key type with one fill scenario. Different key types and/or different collision percentages can (in theory) produce different results.

Should I always use the ConcurrentDictionary?

By default - no. I would argue that following the principle of least surprise when developing code outweighs the potential performance benefits (if any) in most scenarios. If you don't expect dictionary usage to be concurrent - do not use a concurrent dictionary. And do not forget about the dangers of premature optimization.

If you consider a path involving the dictionary usage to be a "critical" one, then first of all you need to verify that Dictionary.TryGetValue is actually that influential in it, then thoroughly test your actual scenario (with appropriate data and usage patterns, especially including modification operations if any, and on actual hardware), and set up infrastructure to continually benchmark the corresponding path. Which can be unjustified from a resources/costs point of view.

Also, if you need a read-only dictionary, you should consider FrozenDictionary<TKey,TValue>:

Provides an immutable, read-only dictionary optimized for fast lookup and enumeration.
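
Creating one is a one-liner; a minimal sketch (needs .NET 8+ and System.Collections.Frozen; Item mirrors the class from the question):

using System;
using System.Collections.Frozen;
using System.Linq;

// Build once up front; afterwards the dictionary is read-only and optimized.
var items = Enumerable.Range(1, 50).Select(i => new Item { Id = i, Value = i }).ToList();
FrozenDictionary<int, Item> frozen = items.ToFrozenDictionary(item => item.Id);

// Same TryGetValue shape as Dictionary/ConcurrentDictionary.
if (frozen.TryGetValue(42, out var found))
    Console.WriteLine(found.Value);

// Item mirrors the class from the question.
public class Item { public int Id { get; set; } public int Value { get; set; } }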

I used @ThatGuy's benchmark as a base to compare with FrozenDictionary. "On my machine" it produces the following results (for a 50-item dictionary):

| Method                     | Mean     | Error   | StdDev  |
|--------------------------- |---------:|--------:|--------:|
| DictionarySearch           | 355.7 ns | 2.54 ns | 2.12 ns |
| FrozenDictionarySearch     | 272.3 ns | 1.82 ns | 1.42 ns |
| ConcurrentDictionarySearch | 206.9 ns | 2.60 ns | 2.43 ns |

Note that with string keys in my benchmarks FrozenDictionary showed better performance than both alternatives.

P.S.

Submitted an issue on GitHub for more knowledgeable people to reply to.

Polytypic answered 21/12, 2023 at 18:1 Comment(6)
Thanks Guru, that is indeed a pretty good answer. Bonus for the link to the source code. – Joyous
I am not sure that following the principle of least surprise is enough justification for using a demonstrably less performant collection. It might be, but I am not sure. I am honestly surprised that the ConcurrentDictionary<K,V> outperforms Dictionary<K,V>. This should never happen. The Dictionary<K,V> is supposed to be a highly optimized collection. It has been improved in almost every recent .NET release. The performance should be at least identical, if not better. – Quality
I find this answer slightly misleading. It is true that reads are lock free, but it doesn't mean without synchronization primitives. In particular atomic operations are involved, and these are always slower than non-atomic variants used in non-concurrent Dictionary. That's why it is surprising that Dictionary outperforms ConcurrentDictionary. – Sladen
@TheodorZoulias "I am honestly surprised that the" - TBH me too. As for performance - I come from idea that this particular part will not have a noticable effect "in grand scheme of things" for "average" enterprise application and actually preforming appropriate benchmarking validating performance gains and then maintaining them (so they remain truthful) arguably are not justified in this case. – Polytypic
@Sladen TBH I have not seen any synchronization primitives in the current ConcurrentDictionary.TryGetValue implementation - it copies the current bucket table reference and works with it (here), as far as I can understand. There is volatile involved, but in a read-only scenario with one thread it should not have an effect either. – Polytypic
@GuruStron well, it does at least put up memory fences, and likely prevents (or makes less effective) optimizations like inlining. – Sladen

It's because a concurrent dictionary's TryGetValue is a basic linked-list walk over nodes with a null check, while a base dictionary's is an array lookup that involves multiple checks and explicit type casting. The difference for retrieval is negligible.

Your benchmarks are flawed. You do nothing with the results, such as handing them off to a Consumer object. This means .NET could be doing anything and everything to optimize the code in the release configuration. It is best practice to make sure tests consume what they produce, to prevent .NET from optimizing away what you're trying to test. I fixed it and increased the element count to 10,000,000 elements in both dictionaries. The results are not as drastic as you think. Here is the code:

BenchmarkDotNet v0.13.11, Windows 11 (10.0.22621.2861/22H2/2022Update/SunValley2)
AMD Ryzen 5 Microsoft Surface Edition, 1 CPU, 12 logical and 6 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
| Method                     | Mean     | Error    | StdDev   |
|--------------------------- |---------:|---------:|---------:|
| DictionarySearch           | 67.74 ms | 1.222 ms | 1.083 ms |
| ConcurrentDictionarySearch | 53.68 ms | 0.610 ms | 0.540 ms |

Remember that this is a 14 ms difference over 10,000,000 elements.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines; // Consumer lives here

public class Benchmarks
{
    public class Item {
        public int Id { get; set; }
        public int Value { get; set; }
    }
    
    private const int ItemCount = 10_000_000;
    private Item[] _itemList = new Item[ItemCount];
    private int[] _searchIds = new int[ItemCount]; // List of IDs to search

    private Dictionary<int, Item> _itemDictionary = new();
    private ConcurrentDictionary<int, Item> _itemConcurrentDictionary = new();

    private Consumer _consumer = new();
    
    [GlobalSetup]
    public void Setup() {
        var rand = new Random();
        for (var i = 0; i < _itemList.Length; i++)
        {
            var item = new Item
            {
                Id = i,
                Value = Random.Shared.Next()
            };

            _searchIds[i] = item.Id;
            _itemList[i] = item;
            _itemDictionary.Add(item.Id, item);
            _itemConcurrentDictionary.TryAdd(item.Id, item);
        }

        rand.Shuffle(_itemList);
    }

    [Benchmark]
    public void DictionarySearch() {
        // Modified to search for each ID
        foreach (var id in _searchIds)
        {
            if (!_itemDictionary.TryGetValue(id, out var result))
                continue;
            
            _consumer.Consume(result);
        }
    }
    
    [Benchmark]
    public void ConcurrentDictionarySearch() {
        // Modified to search for each ID
        foreach (var id in _searchIds) {
            if (!_itemConcurrentDictionary.TryGetValue(id, out var result))
                continue;
            
            _consumer.Consume(result);
        }
    }
}

While reading from a concurrent dictionary is a very small amount faster, writing to one is far slower than to a base dictionary on a single thread. There's not much reason to use a concurrent dictionary outside a concurrent environment, because of how slow it will become. I modified the same benchmark to clear both dictionaries after every iteration and reduced the number of elements to 1,000,000. After running:

| Method                    | Mean      | Error    | StdDev   |
|-------------------------- |----------:|---------:|---------:|
| DictionaryWrite           |  98.84 ms | 1.413 ms | 1.252 ms |
| ConcurrentDictionaryWrite | 490.79 ms | 3.616 ms | 3.206 ms |

[GlobalSetup]
public void GlobalSetup() {
    var rand = new Random();
    for (var i = 0; i < _itemList.Length; i++)
    {
        var item = new Item
        {
            Id = i,
            Value = Random.Shared.Next()
        };

        _searchIds[i] = item.Id;
        _itemList[i] = item;
    }

    rand.Shuffle(_searchIds);
    rand.Shuffle(_itemList);
}

[IterationSetup]
public void IterationSetup()
{
    _itemDictionary.Clear();
    _itemConcurrentDictionary.Clear();
}

[Benchmark]
public void DictionaryWrite() { 
    foreach (var i in _itemList)
    {
        var added = _itemDictionary.TryAdd(i.Id, i);
        _consumer.Consume(added);
    }
}

[Benchmark]
public void ConcurrentDictionaryWrite() {
    foreach (var i in _itemList) {
        var added = _itemConcurrentDictionary.TryAdd(i.Id, i);
        _consumer.Consume(added);
    }
}

For the sake of practicality, to be at least a little relevant to a real scenario, here's a two-sum benchmark - because if you're just enumerating a dictionary, you're probably using the wrong data structure:

| Method           | TwoSumValCount | Mean            | Error           | StdDev          |
|----------------- |---------------:|----------------:|----------------:|----------------:|
| TwoSum           |             50 |        867.6 ns |        11.55 ns |        11.34 ns |
| TwoSumConcurrent |             50 |      2,549.8 ns |        25.00 ns |        22.16 ns |
| TwoSum           |       10000000 |  5,949,825.4 ns |    94,161.74 ns |   128,889.65 ns |
| TwoSumConcurrent |       10000000 | 66,325,225.3 ns | 1,321,310.23 ns | 2,577,114.13 ns |

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics; // Trace
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines; // Consumer lives here

public class Benchmarks
{
    [Params(50, 10_000_000)]
    public int TwoSumValCount;
    private long[] _twoSumValues = Array.Empty<long>();
    private long _twoSumTarget = 0;

    private Consumer _consumer = new();
    
    [GlobalSetup]
    public void Setup()
    {
        Trace.Assert(TwoSumValCount > 2);
        var nums = new HashSet<long>();
        while (nums.Count < TwoSumValCount)
        {
            var l = Random.Shared.NextInt64(-1_000_000_000, 5_000_000_000);
            nums.Add(l);
        }

        _twoSumValues = nums.ToArray();
        Random.Shared.Shuffle(_twoSumValues);
        _twoSumTarget = _twoSumValues[^1] + _twoSumValues[^2];
    }

    [Benchmark]
    public void TwoSum()
    {
        var map = new Dictionary<long, long> { { _twoSumValues[0], 0 } };

        long[]? result = null;
        for (var i = 1; i < _twoSumValues.Length; i++)
        {
            var n = _twoSumValues[i];
            if (map.TryGetValue(_twoSumTarget - n, out var neededI))
            {
                result = new[] { neededI, i };
                break;
            }
            
            map.TryAdd(n, i);
        }

        Trace.Assert(result != null);
        _consumer.Consume(result);
    }
    
    [Benchmark]
    public void TwoSumConcurrent()
    {
        var map = new ConcurrentDictionary<long, long>();
        map.TryAdd(_twoSumValues[0], 0);

        long[]? result = null;
        for (var i = 1; i < _twoSumValues.Length; i++)
        {
            var n = _twoSumValues[i];
            if (map.TryGetValue(_twoSumTarget - n, out var neededI))
            {
                result = new[] { neededI, i };
                break;
            }
            
            map.TryAdd(n, i);
        }

        Trace.Assert(result != null);
        _consumer.Consume(result);
    }
}

Like I said in the comments of your question, definitely look into a frozen dictionary if all you need to do is read.

Summertree answered 22/12, 2023 at 1:9 Comment(5)
Where is the definition of the Consumer class? – Quality
@TheodorZoulias it's part of the BenchmarkDotNet.Engines namespace (see the source). – Summertree
JFYI - on a small number of elements (50) your benchmark gives me the concurrent dictionary being 1.5x faster than the ordinary one. Which is arguably still a significant and unexpected difference. – Polytypic
@GuruStron Remember that purely enumerating a dictionary probably means the wrong data structure is used. A list would be better. Went ahead and added a two-sum benchmark for some practicality in the data, and the base dictionary did just fine. – Summertree
@Summertree "Remember that purely enumerating a dictionary probably means the wrong data structure is used. " - while I can agree with the statement at least to some extent, using 2sum as example of "some practicality" is not a very obvious choice =) Again, as I've mentioned in my answer - when "pure" code performance becomes a concern then it should be benchmarked against scenarios as close to real ones as possible. And I'm sure there are people for whom Dictionary.TryGetValue perf can be a concern, though I would guess that they are mainly the runtime developers =) – Polytypic

I was wondering if I should use a List or Dictionary knowing that my structure would have very few items.

I've been developing software for over two decades now and haven't yet encountered a situation where using a different data structure made a critical difference to performance. Don't "wonder" and especially don't waste time micro-optimising before you even know you need to do so: use whatever makes your program easiest to read, and therefore maintain, first; optimise second, and only if you absolutely need to.

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

β€” Donald Knuth, 1974 Turing Award lecture, pg. 671

End answered 22/12, 2023 at 19:21 Comment(1)
Thanks for your input; my concern is about memory usage. The dictionary takes more memory than an array. But before switching to an array, I want to see how performance will be affected. – Joyous
