Unexpected performance results when comparing dictionary lookup vs multiple is operators in .NET 4.7
Asked Answered
L

1

9

I have the problem where I need to do dynamic dispatch based on an object type. The types based on which I need to dispatch are known at compile time - in my example they are 17.

My initial guess was to use a Dictionary<Type, Action<Object>> for the dispatching and to use obj.GetType() to find out the appropriate action. But then I decided to use BenchmarkDotNet to see if I can do better and exactly how expensive the dispatch lookup would be. Bellow is the code I used for the benchmark.

public class Program
{
    private static readonly Object Value = Guid.NewGuid();
    private static readonly Dictionary<Type, Action<Object>> Dictionary = new Dictionary<Type, Action<Object>>()
    {
        [ typeof( Byte ) ] = x => Empty( (Byte)x ),
        [ typeof( Byte[] ) ] = x => Empty( (Byte[])x ),
        [ typeof( SByte ) ] = x => Empty( (SByte)x ),
        [ typeof( Int16 ) ] = x => Empty( (Int16)x ),
        [ typeof( UInt16 ) ] = x => Empty( (UInt16)x ),
        [ typeof( Int32 ) ] = x => Empty( (Int32)x ),
        [ typeof( UInt32 ) ] = x => Empty( (UInt32)x ),
        [ typeof( Int64 ) ] = x => Empty( (Int64)x ),
        [ typeof( UInt64 ) ] = x => Empty( (UInt64)x ),
        [ typeof( Decimal ) ] = x => Empty( (Decimal)x ),
        [ typeof( Single ) ] = x => Empty( (Single)x ),
        [ typeof( Double ) ] = x => Empty( (Double)x ),
        [ typeof( String ) ] = x => Empty( (String)x ),
        [ typeof( DateTime ) ] = x => Empty( (DateTime)x ),
        [ typeof( TimeSpan ) ] = x => Empty( (TimeSpan)x ),
        [ typeof( Guid ) ] = x => Empty( (Guid)x ),
        [ typeof( Char ) ] = x => Empty( (Char)x ),
    };


    [Benchmark]
    public void Switch() => Switch( Value );


    [Benchmark]
    public void Lookup() => Lookup( Value );


    private static void Switch( Object value )
    {
        if ( value is Byte ) goto L_Byte;
        if ( value is SByte ) goto L_SByte;
        if ( value is Int16 ) goto L_Int16;
        if ( value is UInt16 ) goto L_UInt16;
        if ( value is Int32 ) goto L_Int32;
        if ( value is UInt32 ) goto L_UInt32;
        if ( value is Int64 ) goto L_Int64;
        if ( value is UInt64 ) goto L_UInt64;
        if ( value is Decimal ) goto L_Decimal;
        if ( value is Single ) goto L_Single;
        if ( value is Double ) goto L_Double;
        if ( value is DateTime ) goto L_DateTime;
        if ( value is TimeSpan ) goto L_TimeSpan;
        if ( value is DateTimeOffset ) goto L_DateTimeOffset;
        if ( value is String ) goto L_String;
        if ( value is Byte[] ) goto L_ByteArray;
        if ( value is Char ) goto L_Char;
        if ( value is Guid ) goto L_Guid;

        return;

        L_Byte: Empty( (Byte)value ); return;
        L_SByte: Empty( (SByte)value ); return;
        L_Int16: Empty( (Int16)value ); return;
        L_UInt16: Empty( (UInt16)value ); return;
        L_Int32: Empty( (Int32)value ); return;
        L_UInt32: Empty( (UInt32)value ); return;
        L_Int64: Empty( (Int64)value ); return;
        L_UInt64: Empty( (UInt64)value ); return;
        L_Decimal: Empty( (Decimal)value ); return;
        L_Single: Empty( (Single)value ); return;
        L_Double: Empty( (Double)value ); return;
        L_DateTime: Empty( (DateTime)value ); return;
        L_DateTimeOffset: Empty( (DateTimeOffset)value ); return;
        L_TimeSpan: Empty( (TimeSpan)value ); return;
        L_String: Empty( (String)value ); return;
        L_ByteArray: Empty( (Byte[])value ); return;
        L_Char: Empty( (Char)value ); return;
        L_Guid: Empty( (Guid)value ); return;
    }


    private static void Lookup( Object value )
    {
        if ( Dictionary.TryGetValue( value.GetType(), out var action ) )
        {
            action( value );
        }
    }


    [MethodImpl( MethodImplOptions.NoInlining )]
    private static void Empty<T>( T value ) { }


    static void Main( string[] args )
    {
        BenchmarkRunner.Run( typeof( Program ) );

        Console.ReadLine();
    }
}

In my example I ran the test with a boxed Guid which is the worst case in the handcrafted Switch function. The results were surprising to say the least:

BenchmarkDotNet=v0.10.11, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.125)
    Processor=Intel Core i7-4790K CPU 4.00GHz (Haswell), ProcessorCount=8
    Frequency=3903988 Hz, Resolution=256.1483 ns, Timer=TSC
      [Host]     : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0
      DefaultJob : .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0

     Method |     Mean |     Error |    StdDev |
    ------- |---------:|----------:|----------:|
     Switch | 13.21 ns | 0.1057 ns | 0.0989 ns |
     Lookup | 28.22 ns | 0.1082 ns | 0.1012 ns |

The switch function 2 times faster for it's worst case. If I reorder the ifs so most common types are first then on average I expect it to run 3-5 times faster.

My question is how come 18 checks are so much faster than a single dictionary lookup? Am I missing something obvious?

EDIT:

The original test was x86 (Prefer 32bit) mode on x64 machine. I ran the tests in 64 release build as well:

    Method |      Mean |     Error |    StdDev |
---------- |----------:|----------:|----------:|
    Switch | 12.451 ns | 0.0600 ns | 0.0561 ns |
    Lookup | 22.552 ns | 0.1108 ns | 0.1037 ns |
Lascivious answered 28/12, 2017 at 12:39 Comment(7)
This is on release build?Gentilesse
@Gentilesse Yes. Runtime = .NET Framework 4.7 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.7.2600.0; GC = Concurrent Workstation (Prefer 32bit) on 64 bit machine. I will test in 64 bit mode as well.Lascivious
Would be interesting to see where a List<T> linearly searching for the type would rank compared to these two. I think it would give some clue as to whether the "problem" is related to the lambda or the actual dictionary lookup (which is quite a possibility). Dictionaries don't usually offer much of a speedup when you're dealing with 17 elements.Amazed
@Amazed I ran the test with List<T> and as expected it's 6 times slower than the dictionary lookup.Lascivious
@IvanZlatanov Then I suspect branch prediction as the reason for the switch being so fast. Try to give it a random type each iteration instead of the same one.Amazed
@Amazed Nope, seems not to be the case. I ran a few tests - one with for over 8 different values and one with Rand() % 8 and the overall performance difference grew in both cases - mostly because we are now hitting cases that are not worst case for the switch function.Lascivious
Profiling such very fast code is tricky, I'm getting very different measurements with an old fashioned Stopwatch. I see Lookup about 10% faster than Switch. What isn't visible in the test is where the cost goes, GetType() eats almost half of the execution time. Perhaps intuitively obvious, Type is a large object. It does get cached, equivalent to a Dictionary lookup :) But sure, the is operator does not suck, a simple comparison to the type handle in the object header. The 17 branches are predicted very well in this test since there is never a match.Tardiff
H
6

I'm by no means an IL performance guru, but if you decompile and especially look at the IL, this make sense.

The is operator is only 4 opcodes (ldarg, isinst, ldnull, cgt), and each switch part only 7 in total with the goto added in. The action part of Switch to call Empty() is then another 6, giving 17*7+6 = 125 max.

By contrast Dictionary.TryGetValue may only be one method call, but inside this it is doing a lot of work hashing, looping and comparing values:

http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,2e5bc6d8c0f21e67

public bool TryGetValue(TKey key, out TValue value) {
    int i = FindEntry(key);
    if (i >= 0) {
        value = entries[i].value;
        return true;
    }
    value = default(TValue);
    return false;
}

http://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,bcd13bb775d408f1

private int FindEntry(TKey key) {
    if( key == null) {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
    }

    if (buckets != null) {
        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;
        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {
            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;
        }
    }
    return -1;
}

The for loop inside FindEntry alone is 31 opcodes for each loop, giving a max of 527 opcodes just for this part.

This is very basic analysis but it is easy to see that the switch should be more performant. As is often the case tho, you need to consider performance versus readability and maintainability. If using a Dictionary lookup gives you nicer code, it is rare that the performance loss (15ns) will outweigh that benefit.

Hanyang answered 28/12, 2017 at 13:27 Comment(2)
In general that makes sense (although different opcodes do not take the same amount of cpu time). I just always thought that is operator is slow and one should consider a lookup table instead of a long chain of if then else - but it seems that it really isn't. The readability of the function is not really suffering here, and 2x-3x performance improvement is worthed.Lascivious
You're just making an assumption, which may or may not be true. You should base your answer on profiling instead, it's the only thing that'll guarantee an accurate analysis. And IL code size isn't a good performance metric.Snakemouth

© 2022 - 2024 — McMap. All rights reserved.