Is there a benefit to Tuple-based or Nested Dictionaries?
Asked Answered
P

5

27

I've been looking for a way to store and retrieve values on more than the single key that C#'s generic Dictionary class provides.

Searching around the web (and on SO itself) has shown me a couple options:

Tuple Based Dictionaries

.NET 4.0 makes it easy to support a generic Tuple<,> class. This means you can make a Dictionary out of any arbitrary Tuple, i.e.,

  • var myDict = new Dictionary<Tuple<Char, Int>, MyClass>();

Nested Dictionaries

I've learned you can also nest Dictionaries within Dictionaries, which makes accessing the stored result similar to accessing an N-Dimensional array. For instance:

Dictionary<int, Dictionary<int, Dictionary<Char, MyClass>>>

which could then be accsessed by: MyClass foo = MyData[8][3]['W'];

Delimited Concatenated Key Dictionaries

But while both work well for complex data and custom classes, I wonder if they're always necessary. For primitive data, at least, it would seem that concatenating the keys with a delimiter is just as effective.

//keys are char + int
Dictionary<string, MyClass> myDict = New Dictionary<string, Myclass>();
String input = myChar + "|" + myInt
MyClass foo = myDict[input]

Are there any scenarios which make one of these methods superior to the other? Will they have similar performance times? Or should the focus be instead on which method provides the cleanest, easiest to maintain, code?

Thoughts?

Prynne answered 10/8, 2012 at 20:44 Comment(6)
You assume that everything that you would use as a key is easily convertable to a short string representation of that object.Tiernan
Always go for the clean, maintainable code first. Then look at performance tweaks if prformance is an issue.Lashay
This question is does not lend itself to a specific answer. stackoverflow.com/faq#dontask That being said I'd use multi-dictionaries as it's easier to pull out the keys if you need them.Southpaw
@Tiernan Hence, "For primitive data, at least,..." :)Prynne
This is very similar to tuples-or-arrays-as-dictionary-keys-in-c-sharpSmuggle
char(31) works well as a delimiter in my experience for anyone reading. It is fairly unusual to occur (ymmv).Epitomize
R
23

Delimited Concatenated Key Dictionaries

There are at least three reasons why I would avoid this approach:

  • It is magic. There is nothing in the type of the key that tells you how to construct it or what it represents.
  • If the delimiter accidentally appears as one of the values, your approach fails.
  • Conversion to strings, and comparison of these strings is likely to be (slightly) slower than using two primitive types.

Nested Dictionaries

This solves the problem with the delimiter, but introduce some new problems:

  • Insertion of new values is difficult because for each nested level you have to check whether that key already exists. If not, you would need to create a new dictionary as the value. This makes using the dictionary more difficult.
  • There will be a further memory and performance overhead.

Tuple Based Dictionaries

Of the approaches you posted, this is probably the best.

But you could take it one step further and create a named immutable struct for your key. This will make your dictionary easier to use because the parts of the key can have useful names.

Roubaix answered 10/8, 2012 at 20:46 Comment(8)
I worked at a company that used pipe separated values. One day, we got a customer called Acme || (yes, they thought using two pipes rather than II would be cool).Arredondo
How is using a struct different than using a class (like the Tuple)?Prynne
@RavenDreamer: The struct is not the important thing in "named immutable struct". There are two important things. a) It has a name. b) It is immutable. The less important thing is c) That it's a struct, which will give you a sensible Equals and GetHashCode for free, but there are also situations where a class is better.Roubaix
@RavenDreamer if you dont require the dictionary to be a class level member, then anonymous type is also an option as key. var dict = new object[0].ToDictionary(x => new { char, int }, x => MyClass)Smuggle
@MarkByers the free Equals and GetHashCode on struct use reflection internally and are the worst performing dictionary keys. When creating a struct for use as a key, always override this methods.Walrath
Note that with the advent of value tuples, it is possible to have meaningful names with the built-in tuple type as well. A named struct is fine as well, but if I were to create a simple named struct in today's .NET, I'd back the data with a value tuple and delegate properties, Equals(), and GetHashCode() to it.Fleming
@PeterDuniho am wondering: what about using a new(ish) readonly record struct? My understanding is that gives you (non reflection based) compiler synthesized Equals() and GetHashCode() for free. Am I missing something that still makes you prefer your "named struct backed by value tuple data and delegation" suggestion?Olivenite
@MemeDeveloper: it depends on the situation. record data type is extremely useful, and a fine alternative to value tuples. But you still have to maintain its declaration somewhere. If the type is to be used only locally, I think tuple is still the way to go. If you expect the type to be used broadly across the code base, a declared type like record is better most of the time. In between those two extremes, you need to decide what your threshold for declaring new types is. Personally, I mainly use tuples only where the type is used in a single method, or perhaps just a few related ones.Fleming
S
9

Or should the focus be instead on which method provides the cleanest, easiest to maintain, code?

Unless your focus is on writing nightmarish, intimidating code you should be avoiding the string delimiting and concatenating approach which is evil which goes without saying.

Choosing between tuple and nested dictionaries based approaches depends on your context. Tweak for performance? Or tweak for readability? I will talk about latter first.

From maintainability point of view,

  • Its much easier to implement a functionality that looks like:

    var myDict = new Dictionary<Tuple<char, int>, MyClass>();
    

    than

    var myDict = new Dictionary<char, Dictionary<int, MyClass>>();
    

    from the callee side. In the second case each addition, lookup, removal etc require action on more than one dictionary.

  • Furthermore, if your composite key require one more (or less) field in future, you will need to change code a significant lot in the second case (nested dictionary) since you have to add further nested dictionaries and subsequent checks.

From performance perspective, the best conclusion you can reach is by measuring it yourself. But there are a few theoretical limitations which you can consider beforehand:

  • In the nested dictionary case, having an additional dictionary for every keys (outer and inner) will have some memory overhead (more than what creating a tuple would have).

  • In the nested dictionary case, every basic action like addition, updation, lookup, removal etc need to be carried out in two dictionaries. Now there is a case where nested dictionary approach can be faster, ie, when the data being looked up is absent, since the intermediate dictionaries can bypass the full hash code computation & comparison, but then again it should be timed to be sure. In presence of data, it should be slower since lookups should be performed twice (or thrice depending on nesting).

  • Regarding tuple approach, .NET tuples are not the most performant when they're meant to be used as keys in sets since its Equals and GetHashCode implementation causes boxing for value types.

On the whole, I find very little need for nested dictionary approach. Odds are one would not want it. I would prefer tuple based approach, but you should write one your own tuple with a better implementation, and in this one case of char and int keys, I prefer making it a (immutable) struct.

A very related question: Tuples( or arrays ) as Dictionary keys in C#

Smuggle answered 13/1, 2014 at 12:13 Comment(1)
Am wondering if today would you recommend record structs as per my suggestion below? Or are there still advantages to a tuple key I am not following?Olivenite
H
8

I wanted to add to the above answers, that there are some scenarios (depends on how the data is distributed) in which a nested dictionary is much better than a composite key dictionary in terms of memory footprint (which in turn might lead to better performance overall). The reason for that is that the nesting can save you the need of saving duplicated values for the keys, which in large dictionaries will make the footprint of the extra dictionaries negligible.

For example, say I need a dictionary with a composite key of (male/female),(baby/young/old),(age).

Let's save some values with a dictionary of composite keys:

(male, baby, 1)
(male, baby, 2)
(male, baby, 3)
(male, young, 21)
(male, young, 22)
(male, young, 23)
(male, old, 91)
(male, old, 92)
(male, old, 93)
(female, baby, 1)
(female, baby, 2)
(female, baby, 3)
(female, young, 21)
(female, young, 22)
(female, young, 23)
(female, old, 91)
(female, old, 92)
(female, old, 93)

Now let's save the same values in a dictionary of dictionaries:

male -> baby ->  1
                 2
                 3
        young -> 21
                 22
                 23
        old  ->  91
                 92
                 93
female -> baby ->1
                 2
                 3
        young -> 21
                 22
                 23
        old  ->  91
                 92
                 93

In the composite key approach I save a copy of "male" and "female" 9 times, as opposed to a single copy in the dictionary of dictionaries. In fact, I saved 54 items vs 26 items, getting twice the memory footprint. The example also helps to visualize the difference, see how much "empty" space there is in the second sample compared to the first, those are all values that we didn't need to save.

And for those which are still not convinced, here's a sample test:

    Dictionary<Tuple<int, int, int>, int> map1 = new Dictionary<Tuple<int, int, int>, int>();
    Dictionary<int, Dictionary<int, Dictionary<int, int>>> map2 = new Dictionary<int, Dictionary<int, Dictionary<int, int>>>();

    public void SizeTest()
    {
        for (int x = 0; x < 30; x++)
        {
            for (int y = 0; y < 100; y++)
            {
                for (int z = 0; z < 600; z++)
                {
                    addToMap1(x, y, z, 0);
                    addToMap2(x, y, z, 0);
                }
            }
        }
        int size1 = GetObjectSize(map1);
        int size2 = GetObjectSize(map2);

        Console.WriteLine(size1);
        Console.WriteLine(size2);
    }

    private void addToMap1(int x, int y, int z, int value)
    {
        map1.Add(new Tuple<int, int, int>(x, y, z), value);
    }

    private void addToMap2(int x, int y, int z, int value)
    {
        map2.GetOrAdd(x, _ => new Dictionary<int, Dictionary<int, int>>())
            .GetOrAdd(y, _ => new Dictionary<int, int>())
            .GetOrAdd(z, _ => value);
    }

    private int GetObjectSize(object TestObject)
    {
        BinaryFormatter bf = new BinaryFormatter();
        MemoryStream ms = new MemoryStream();
        byte[] Array;
        bf.Serialize(ms, TestObject);
        Array = ms.ToArray();
        return Array.Length;
    }

    public static TResult GetOrAdd<TKey, TResult>(this Dictionary<TKey, TResult> map, TKey key, Func<TKey, TResult> addIfMissing)
    {
        TResult result;
        if (!map.TryGetValue(key, out result))
        {
            result = addIfMissing(key);
            map[key] = result;
        }
        return result;
    }

This test returns ~30MB vs ~70MB in favor of the dictionary of dictionaries.

Hands answered 21/7, 2015 at 19:32 Comment(1)
This is a useful comparison, but I wanted to point out that this can depend on how sparse the data you are storing is. In the scenario that I'm looking at, 99% of the keys are going to be unique. Only a very small number of the keys for a given x are going to have more than one value for y. In this case the overhead of the extra dictionary outweighs the extra storage used by duplicate keys.Eyre
B
4

All of the options you have described are fairly similar - as for performance, you would need to test each for your specific usage scenarios, but for small collections they are unlikely to have much of a difference.

They also all suffer from readability - it is difficult to construct them and to tease out meaning out of the types.

Instead, it is better to create a type that directly describes the data - good naming goes a long way.

Bed answered 10/8, 2012 at 20:47 Comment(2)
Many years on. What do you think about my answer... seems to me that should perform well, and as you suggest provides a type describing the data and the usage and the ability to add a useful name. Wonder if any downsides to this approach?Olivenite
Yeah, records are a very light weight option - both in syntax and performance. Definitely a better option these daysBed
O
2

What about a different (newer) option

Readonly Record Struct Based Key

Similar to the tuple idea, but using a record struct e.g.

readonly record struct CharAndIntKey(char CharId, int intId);

var myDict = new Dictionary<CharAndIntKey, MyClass>();

My understanding is that this will give fast lookup (due to compiler synthesized GetHashCode() and Equals(), and avoid some of the other issues discussed.

Olivenite answered 1/4 at 16:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.