Why is ConcurrentDictionary always faster than normal Dictionary?
I was wondering if I should use a List or a Dictionary, knowing that my structure would have very few items. So I did a simple benchmark to test, for 1, 3, 10, 20 and 50 elements, which data structure was fastest at finding an element.

To my surprise, the ConcurrentDictionary was FASTER than the normal Dictionary. I had the feeling that it would be slower because of locks to allow multi-thread access.

  • Is there a problem with my benchmark?
  • Is there a logical reason?
  • Should I always use the ConcurrentDictionary?
BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.3803/22H2/2022Update)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
| Method                     |  N | Mean       | Error     | StdDev    | Allocated |
|--------------------------- |---:|-----------:|----------:|----------:|----------:|
| ListLoopSearch             |  1 |   1.416 ns | 0.0153 ns | 0.0143 ns |         - |
| DictionarySearch           |  1 |   4.253 ns | 0.1054 ns | 0.1128 ns |         - |
| ConcurrentDictionarySearch |  1 |   2.891 ns | 0.0144 ns | 0.0127 ns |         - |
| ListLoopSearch             |  3 |   6.237 ns | 0.0281 ns | 0.0234 ns |         - |
| DictionarySearch           |  3 |  10.849 ns | 0.1968 ns | 0.1745 ns |         - |
| ConcurrentDictionarySearch |  3 |   6.225 ns | 0.0760 ns | 0.0711 ns |         - |
| ListLoopSearch             | 10 |  36.174 ns | 0.5519 ns | 0.5162 ns |         - |
| DictionarySearch           | 10 |  33.303 ns | 0.6607 ns | 0.7608 ns |         - |
| ConcurrentDictionarySearch | 10 |  18.787 ns | 0.1348 ns | 0.1261 ns |         - |
| ListLoopSearch             | 20 | 118.554 ns | 1.5678 ns | 1.4665 ns |         - |
| DictionarySearch           | 20 |  67.028 ns | 0.4722 ns | 0.3943 ns |         - |
| ConcurrentDictionarySearch | 20 |  36.108 ns | 0.6998 ns | 0.6546 ns |         - |
| ListLoopSearch             | 50 | 979.931 ns | 6.9250 ns | 5.4066 ns |         - |
| DictionarySearch           | 50 | 161.658 ns | 0.5295 ns | 0.4953 ns |         - |
| ConcurrentDictionarySearch | 50 |  87.737 ns | 1.2800 ns | 1.0689 ns |         - |

Here is the full code for the benchmark.

using BenchmarkDotNet.Attributes;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public class Item {
    public int Id { get; set; }
    public int Value { get; set; }
}

public class SearchBenchmark {
    [Params(1, 3, 10, 20, 50)] // Number of elements in the structure
    public int NumberOfElements;
    private List<Item> _itemList;
    private Dictionary<int, Item> _itemDictionary;
    private ConcurrentDictionary<int, Item> _itemConcurrentDictionary;
    private List<int> _searchIds; // List of IDs to search

    [GlobalSetup]
    public void Setup() {
        _itemList = Enumerable.Range(1, NumberOfElements).Select(i => new Item { Id = i, Value = i }).ToList();
        _itemDictionary = _itemList.ToDictionary(item => item.Id, item => item);
        _itemConcurrentDictionary = new ConcurrentDictionary<int, Item>(_itemList.Select(item => new KeyValuePair<int, Item>(item.Id, item)));

        // Prepare a list of IDs to search for, covering all elements
        _searchIds = Enumerable.Range(1, NumberOfElements).ToList();
    }

    [Benchmark]
    public void ListLoopSearch() {
        foreach (var id in _searchIds) {
            foreach (var item in _itemList) {
                if (item.Id == id)
                    break; // Found the item, move to the next ID
            }
        }
    }

    [Benchmark]
    public void DictionarySearch() {
        foreach (var id in _searchIds) {
            _itemDictionary.TryGetValue(id, out var result);
        }
    }
    [Benchmark]
    public void ConcurrentDictionarySearch() {
        foreach (var id in _searchIds) {
            _itemConcurrentDictionary.TryGetValue(id, out var result);
        }
    }
}
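
(To run it I use BenchmarkDotNet's standard runner; a minimal Program.cs entry point looks something like this, assuming the SearchBenchmark class above is in the same project.)

// Program.cs - minimal BenchmarkDotNet entry point (run in Release configuration)
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<SearchBenchmark>();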

@jonasH asked if the fact that the elements were in consecutive order in the structure and in the search could be related. So I've modified the code to randomize the elements in the structures and the search, and also added a case with 1000 items. I get pretty much the same results.

Results with randomized data:

| Method                     | NumberOfElements | Mean           | Error         | StdDev       |
|--------------------------- |-----------------:|---------------:|--------------:|-------------:|
| ListLoopSearch             |                1 |       1.784 ns |     0.0163 ns |    0.0153 ns |
| DictionarySearch           |                1 |       4.178 ns |     0.0557 ns |    0.0465 ns |
| ConcurrentDictionarySearch |                1 |       3.361 ns |     0.0110 ns |    0.0102 ns |
| ListLoopSearch             |               10 |      42.138 ns |     0.1538 ns |    0.1363 ns |
| DictionarySearch           |               10 |      33.073 ns |     0.0994 ns |    0.0929 ns |
| ConcurrentDictionarySearch |               10 |      18.996 ns |     0.1257 ns |    0.1176 ns |
| ListLoopSearch             |               50 |   1,007.447 ns |     4.6698 ns |    4.1397 ns |
| DictionarySearch           |               50 |     161.110 ns |     0.5328 ns |    0.4984 ns |
| ConcurrentDictionarySearch |               50 |      87.332 ns |     1.4004 ns |    1.3100 ns |
| ListLoopSearch             |             1000 | 324,126.481 ns | 1,032.4962 ns |  965.7976 ns |
| DictionarySearch           |             1000 |   3,233.511 ns |    11.0090 ns |   10.2978 ns |
| ConcurrentDictionarySearch |             1000 |   1,825.452 ns |    19.3568 ns |   18.1064 ns |
Joyous answered 21/12, 2023 at 15:7 Comment(13)
"I had the feeling that it would be slower because of locks to allow multi-thread access." - from the docs: "uses fine-grained locking to ensure thread safety (Read operations on the dictionary are performed in a lock-free manner)". – Polytypic
The benchmark is a simple loop, single thread, nothing fancy. I fail to see how that complex structure involving locks would benefit from it. – Joyous
The number of operations performed in the loops is too small for a good result. – Doornail
You are both adding and searching for items from an ordered list. You also have fairly small collections. This may make the internal lookup strategy the dominating factor. I would shuffle the itemList and searchIds, as well as try with, say, 10, 100, 1000, 10000 sizes, to see if the difference persists. – Quartic
@GuruStron on second thought that was worded too emphatically. The Task library does introduce overhead in order to do what it does, so many simple cases are better off without concurrency involved if they fall under the threshold. This is covered in the advanced programming docs I linked previously – Grandstand
Possibly the out var result is being optimized away, try using a field out _someField – Evocative
@Grandstand what do Tasks have to do with anything here? – Polytypic
If you're only comparing read times, why don't you benchmark a frozen dictionary too? – Summertree
@TheodorZoulias: Your comment breaks your own rule... – Teledu
Na, my main point was to link the MS tutorials on it, which are going to cover a lot of the concerns of the post itself – Grandstand
@Grandstand thought you had deleted your comment so I deleted mine =) But since you didn't - "Of course, Concurrent will always be faster..." - and why exactly is that? Especially in the OP's case where no concurrency is happening. – Polytypic
@GuruStron I have already said that I wrote that too hyperbolically, my apologies. There is going to be overhead introduced from the use of C#'s parallelism features, and there are going to be cases wherein things would be faster without parallelism, even if it's set up properly. It's also easy to just screw up your parallelized implementation, so you should be profiling these things. I am not deleting my comment because OP needs to read the documentation. – Grandstand
Relevant GitHub discussion: ConcurrentDictionary.TryGetValue faster than Dictionary.TryGetValue. – Quality

I had the feeling that it would be slower because of locks to allow multi-thread access.

As written in the ConcurrentDictionary<TKey,TValue> documentation:

For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner)

So since you are testing a read-only operation, locking should not have an effect (though note that the so-called lock-free approaches to synchronization can still incur some performance cost).

As for performance - after quickly scanning through the current implementation I was not able to find anything which would clearly explain the difference, at least at first glance. Dictionary.TryGetValue has a shared method call which may not be inlined, which can have some minor effect. It also uses a slightly different approach to handling collision buckets - it uses 2 arrays (one containing all dictionary records and another storing the index of the first element in each bucket, if I got everything right, see more here), while the concurrent version uses a linked list for every bucket (see more here).
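
To make that concrete, here is a rough sketch of the two layouts - heavily simplified, with made-up names, not the actual runtime code:

// Dictionary<TKey,TValue> (simplified sketch): two parallel arrays.
// _buckets[hash % _buckets.Length] holds the index of the first entry of a
// bucket; collisions are chained via the Next index inside _entries.
struct Entry<TKey, TValue>
{
    public int HashCode;
    public int Next;   // index of the next entry in the same bucket, or -1
    public TKey Key;
    public TValue Value;
}
// Fields: int[] _buckets; Entry<TKey, TValue>[] _entries;

// ConcurrentDictionary<TKey,TValue> (simplified sketch): each bucket is the
// head of a linked list of heap-allocated nodes, which readers can walk
// without taking locks while writers swap nodes in.
class Node<TKey, TValue>
{
    public readonly TKey Key;
    public TValue Value;
    public volatile Node<TKey, TValue>? Next;

    public Node(TKey key, TValue value, Node<TKey, TValue>? next)
    {
        Key = key;
        Value = value;
        Next = next;
    }
}
// Field: Node<TKey, TValue>?[] _buckets;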

Is there a problem with my benchmark?

Probably. It is quite synthetic: it tests only one concrete key type with one fill scenario. Different key types and/or different collision percentages can (in theory) produce different results.

Should I always use the ConcurrentDictionary?

By default - no. I would argue that following the principle of least surprise when developing code outweighs the potential performance benefits (if any) in most scenarios. If you don't expect dictionary usage to be concurrent - do not use a concurrent dictionary. And do not forget about the dangers of premature optimization.

If you consider a path involving the dictionary usage to be a "critical" one, then first of all you need to verify that Dictionary.TryGetValue is actually that influential in it, then thoroughly test your actual scenario (with appropriate data and usage patterns, especially including modification operations if any, and on actual hardware), and set up infrastructure to continually benchmark the corresponding path. Which can be unjustified from a resources/costs point of view.

Also, if you need a read-only dictionary, you should consider FrozenDictionary<TKey,TValue>:

Provides an immutable, read-only dictionary optimized for fast lookup and enumeration.
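
Creating one is a one-liner; a minimal sketch (needs .NET 8+ and System.Collections.Frozen; Item mirrors the class from the question):

using System;
using System.Collections.Frozen;
using System.Linq;

// Build once up front; afterwards the dictionary is read-only and optimized.
var items = Enumerable.Range(1, 50).Select(i => new Item { Id = i, Value = i }).ToList();
FrozenDictionary<int, Item> frozen = items.ToFrozenDictionary(item => item.Id);

// Same TryGetValue shape as Dictionary/ConcurrentDictionary.
if (frozen.TryGetValue(42, out var found))
    Console.WriteLine(found.Value);

// Item mirrors the class from the question.
public class Item { public int Id { get; set; } public int Value { get; set; } }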

I used @ThatGuy's benchmark as a base to compare with FrozenDictionary. "On my machine" it produces the following results (for a 50-item dictionary):

| Method                     | Mean     | Error   | StdDev  |
|--------------------------- |---------:|--------:|--------:|
| DictionarySearch           | 355.7 ns | 2.54 ns | 2.12 ns |
| FrozenDictionarySearch     | 272.3 ns | 1.82 ns | 1.42 ns |
| ConcurrentDictionarySearch | 206.9 ns | 2.60 ns | 2.43 ns |

Note that with string keys in my benchmarks FrozenDictionary showed better performance than both alternatives.

P.S.

Submitted an issue on GitHub for more knowledgeable people to reply to.

Polytypic answered 21/12, 2023 at 18:1 Comment(6)
Thanks Guru, that is indeed a pretty good answer. Bonus for the link to the source code. – Joyous
I am not sure that following the principle of least surprise is enough justification for using a demonstrably less performant collection. It might be, but I am not sure. I am honestly surprised that the ConcurrentDictionary<K,V> outperforms Dictionary<K,V>. This should never happen. The Dictionary<K,V> is supposed to be a highly optimized collection. It has been improved in almost every recent .NET release. The performance should be at least identical, if not better. – Quality
I find this answer slightly misleading. It is true that reads are lock free, but it doesn't mean without synchronization primitives. In particular atomic operations are involved, and these are always slower than non-atomic variants used in non-concurrent Dictionary. That's why it is surprising that Dictionary outperforms ConcurrentDictionary. – Sladen
@TheodorZoulias "I am honestly surprised that the" - TBH me too. As for performance - I come from idea that this particular part will not have a noticable effect "in grand scheme of things" for "average" enterprise application and actually preforming appropriate benchmarking validating performance gains and then maintaining them (so they remain truthful) arguably are not justified in this case. – Polytypic
@Sladen TBH I have not seen any synchronization primitives in the current ConcurrentDictionary.TryGetValue implementation - it copies the current bucket table reference and works with it (here), as far as I can understand. There is volatile involved, but in a read-only scenario with one thread it should not have an effect either. – Polytypic
@GuruStron well, it does at least put up memory fences, and likely prevents (or makes less effective) optimizations like inlining. – Sladen

It's because a concurrent dictionary's TryGetValue is a basic linked-list walk over nodes with a null check, while a base dictionary's is an array lookup that involves multiple checks and explicit type casting. The difference for retrieval is negligible.

Your benchmarks are flawed. You do nothing with the results, such as handing them off to a Consumer object. This means .NET could be doing anything and everything to optimize the code in the release configuration. It is best practice to make sure tests consume what they produce, to prevent .NET from optimizing away what you're trying to test. I fixed it and increased the element count to 10,000,000 elements in both dictionaries. The results are not as drastic as you think. Here is the code:

BenchmarkDotNet v0.13.11, Windows 11 (10.0.22621.2861/22H2/2022Update/SunValley2)
AMD Ryzen 5 Microsoft Surface Edition, 1 CPU, 12 logical and 6 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.53103), X64 RyuJIT AVX2
| Method                     | Mean     | Error    | StdDev   |
|--------------------------- |---------:|---------:|---------:|
| DictionarySearch           | 67.74 ms | 1.222 ms | 1.083 ms |
| ConcurrentDictionarySearch | 53.68 ms | 0.610 ms | 0.540 ms |

Remember that this is a 14 ms difference over 10,000,000 elements.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines; // Consumer lives here

public class Benchmarks
{
    public class Item {
        public int Id { get; set; }
        public int Value { get; set; }
    }
    
    private const int ItemCount = 10_000_000;
    private Item[] _itemList = new Item[ItemCount];
    private int[] _searchIds = new int[ItemCount]; // List of IDs to search

    private Dictionary<int, Item> _itemDictionary = new();
    private ConcurrentDictionary<int, Item> _itemConcurrentDictionary = new();

    private Consumer _consumer = new();
    
    [GlobalSetup]
    public void Setup() {
        var rand = new Random();
        for (var i = 0; i < _itemList.Length; i++)
        {
            var item = new Item
            {
                Id = i,
                Value = Random.Shared.Next()
            };

            _searchIds[i] = item.Id;
            _itemList[i] = item;
            _itemDictionary.Add(item.Id, item);
            _itemConcurrentDictionary.TryAdd(item.Id, item);
        }

        rand.Shuffle(_itemList);
    }

    [Benchmark]
    public void DictionarySearch() {
        // Modified to search for each ID
        foreach (var id in _searchIds)
        {
            if (!_itemDictionary.TryGetValue(id, out var result))
                continue;
            
            _consumer.Consume(result);
        }
    }
    
    [Benchmark]
    public void ConcurrentDictionarySearch() {
        // Modified to search for each ID
        foreach (var id in _searchIds) {
            if (!_itemConcurrentDictionary.TryGetValue(id, out var result))
                continue;
            
            _consumer.Consume(result);
        }
    }
}

While reading from a concurrent dictionary is a very small amount faster, writing to one is far slower than to a base dictionary on a single thread. There's not much reason to use a concurrent dictionary outside a concurrent environment, because of how slow it will become. I modified the same benchmark to clear both dictionaries after every iteration and reduced the number of elements to 1,000,000. After running:

| Method                    | Mean      | Error    | StdDev   |
|-------------------------- |----------:|---------:|---------:|
| DictionaryWrite           |  98.84 ms | 1.413 ms | 1.252 ms |
| ConcurrentDictionaryWrite | 490.79 ms | 3.616 ms | 3.206 ms |

[GlobalSetup]
public void GlobalSetup() {
    var rand = new Random();
    for (var i = 0; i < _itemList.Length; i++)
    {
        var item = new Item
        {
            Id = i,
            Value = Random.Shared.Next()
        };

        _searchIds[i] = item.Id;
        _itemList[i] = item;
    }

    rand.Shuffle(_searchIds);
    rand.Shuffle(_itemList);
}

[IterationSetup]
public void IterationSetup()
{
    _itemDictionary.Clear();
    _itemConcurrentDictionary.Clear();
}

[Benchmark]
public void DictionaryWrite() { 
    foreach (var i in _itemList)
    {
        var added = _itemDictionary.TryAdd(i.Id, i);
        _consumer.Consume(added);
    }
}

[Benchmark]
public void ConcurrentDictionaryWrite() {
    foreach (var i in _itemList) {
        var added = _itemConcurrentDictionary.TryAdd(i.Id, i);
        _consumer.Consume(added);
    }
}

For the sake of practicality, to be at least a little relevant to a real scenario, here's a two-sum benchmark - because if you're just enumerating a dictionary, you're probably using the wrong data structure:

| Method           | TwoSumValCount | Mean            | Error           | StdDev          |
|----------------- |---------------:|----------------:|----------------:|----------------:|
| TwoSum           |             50 |        867.6 ns |        11.55 ns |        11.34 ns |
| TwoSumConcurrent |             50 |      2,549.8 ns |        25.00 ns |        22.16 ns |
| TwoSum           |       10000000 |  5,949,825.4 ns |    94,161.74 ns |   128,889.65 ns |
| TwoSumConcurrent |       10000000 | 66,325,225.3 ns | 1,321,310.23 ns | 2,577,114.13 ns |

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics; // Trace
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines; // Consumer lives here

public class Benchmarks
{
    [Params(50, 10_000_000)]
    public int TwoSumValCount;
    private long[] _twoSumValues = Array.Empty<long>();
    private long _twoSumTarget = 0;

    private Consumer _consumer = new();
    
    [GlobalSetup]
    public void Setup()
    {
        Trace.Assert(TwoSumValCount > 2);
        var nums = new HashSet<long>();
        while (nums.Count < TwoSumValCount)
        {
            var l = Random.Shared.NextInt64(-1_000_000_000, 5_000_000_000);
            nums.Add(l);
        }

        _twoSumValues = nums.ToArray();
        Random.Shared.Shuffle(_twoSumValues);
        _twoSumTarget = _twoSumValues[^1] + _twoSumValues[^2];
    }

    [Benchmark]
    public void TwoSum()
    {
        var map = new Dictionary<long, long> { { _twoSumValues[0], 0 } };

        long[]? result = null;
        for (var i = 1; i < _twoSumValues.Length; i++)
        {
            var n = _twoSumValues[i];
            if (map.TryGetValue(_twoSumTarget - n, out var neededI))
            {
                result = new[] { neededI, i };
                break;
            }
            
            map.TryAdd(n, i);
        }

        Trace.Assert(result != null);
        _consumer.Consume(result);
    }
    
    [Benchmark]
    public void TwoSumConcurrent()
    {
        var map = new ConcurrentDictionary<long, long>();
        map.TryAdd(_twoSumValues[0], 0);

        long[]? result = null;
        for (var i = 1; i < _twoSumValues.Length; i++)
        {
            var n = _twoSumValues[i];
            if (map.TryGetValue(_twoSumTarget - n, out var neededI))
            {
                result = new[] { neededI, i };
                break;
            }
            
            map.TryAdd(n, i);
        }

        Trace.Assert(result != null);
        _consumer.Consume(result);
    }
}

Like I said in the comments of your question, definitely look into a frozen dictionary if all you need to do is read.

Summertree answered 22/12, 2023 at 1:9 Comment(5)
Where is the definition of the Consumer class? – Quality
@TheodorZoulias it's part of the BenchmarkDotNet.Engines namespace (see the source). – Summertree
JFYI - on a small number of elements (50) your benchmark gives me the concurrent dictionary being 1.5x faster than the ordinary one. Which is arguably still a significant and unexpected difference. – Polytypic
@GuruStron Remember that purely enumerating a dictionary probably means the wrong data structure is used. A list would be better. Went ahead and added a two-sum benchmark for some practicality in the data, and the base dictionary did just fine. – Summertree
@Summertree "Remember that purely enumerating a dictionary probably means the wrong data structure is used. " - while I can agree with the statement at least to some extent, using 2sum as example of "some practicality" is not a very obvious choice =) Again, as I've mentioned in my answer - when "pure" code performance becomes a concern then it should be benchmarked against scenarios as close to real ones as possible. And I'm sure there are people for whom Dictionary.TryGetValue perf can be a concern, though I would guess that they are mainly the runtime developers =) – Polytypic

I was wondering if I should use a List or Dictionary knowing that my structure would have very few items.

I've been developing software for over two decades now and haven't yet encountered a situation where using a different data structure made a critical difference to performance. Don't "wonder" and especially don't waste time micro-optimising before you even know you need to do so: use whatever makes your program easiest to read, and therefore maintain, first; optimise second, and only if you absolutely need to.

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

β€” Donald Knuth, 1974 Turing Award lecture, pg. 671

End answered 22/12, 2023 at 19:21 Comment(1)
Thanks for your input; my concern is about memory usage. The dictionary takes more memory than an array. But before switching to an array, I want to see how performance will be affected. – Joyous
