What is it that makes Enum.HasFlag so slow?
Asked Answered
C

4

76

I was doing some speed tests and I noticed that Enum.HasFlag is about 16 times slower than using the bitwise operation.

Does anyone know the internals of Enum.HasFlag and why it is so slow? I mean twice as slow wouldn't be too bad but it makes the function unusable when its 16 times slower.

In case anyone is wondering, here is the code I am using to test its speed.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace app
{
    public class Program
    {
        [Flags]
        public enum Test
        {
            Flag1 = 1,
            Flag2 = 2,
            Flag3 = 4,
            Flag4 = 8
        }
        static int num = 0;
        static Random rand;
        static void Main(string[] args)
        {
            int seed = (int)DateTime.UtcNow.Ticks;

            var st1 = new SpeedTest(delegate
            {
                Test t = Test.Flag1;
                t |= (Test)rand.Next(1, 9);
                if (t.HasFlag(Test.Flag4))
                    num++;
            });

            var st2 = new SpeedTest(delegate
            {
                Test t = Test.Flag1;
                t |= (Test)rand.Next(1, 9);
                if (HasFlag(t , Test.Flag4))
                    num++;
            });

            rand = new Random(seed);
            st1.Test();
            rand = new Random(seed);
            st2.Test();

            Console.WriteLine("Random to prevent optimizing out things {0}", num);
            Console.WriteLine("HasFlag: {0}ms {1}ms {2}ms", st1.Min, st1.Average, st1.Max);
            Console.WriteLine("Bitwise: {0}ms {1}ms {2}ms", st2.Min, st2.Average, st2.Max);
            Console.ReadLine();
        }
        static bool HasFlag(Test flags, Test flag)
        {
            return (flags & flag) != 0;
        }
    }
    [DebuggerDisplay("Average = {Average}")]
    class SpeedTest
    {
        public int Iterations { get; set; }

        public int Times { get; set; }

        public List<Stopwatch> Watches { get; set; }

        public Action Function { get; set; }

        public long Min { get { return Watches.Min(s => s.ElapsedMilliseconds); } }

        public long Max { get { return Watches.Max(s => s.ElapsedMilliseconds); } }

        public double Average { get { return Watches.Average(s => s.ElapsedMilliseconds); } }

        public SpeedTest(Action func)
        {
            Times = 10;
            Iterations = 100000;
            Function = func;
            Watches = new List<Stopwatch>();
        }

        public void Test()
        {
            Watches.Clear();
            for (int i = 0; i < Times; i++)
            {
                var sw = Stopwatch.StartNew();
                for (int o = 0; o < Iterations; o++)
                {
                    Function();
                }
                sw.Stop();
                Watches.Add(sw);
            }
        }
    }
}

Results:

HasFlag: 52ms 53.6ms 55ms
Bitwise: 3ms 3ms 3ms
Cute answered 10/9, 2011 at 0:0 Comment(5)
Because an enumerated type can have a different underlying base type. Enum.HasValue cannot make any assumptions about that base type, it has to assume the worst. Which involves working with UInt64 and boxed values. Your HashType function is type safe.Dialogist
You might also want to see this: why-enums-hasflag-method-need-boxingOakley
I just benchmarked this with .NET 4.6: HasFlag: 8ms 8,7ms 11ms, Bitwise: 4ms 4ms 4ms. So it seems they have improved the implementation.Lament
Benchmarked in .NET Fiddle with 4.7.2 HasFlag: 8ms 9.4ms 16ms Bitwise: 5ms 5ms 5ms .NET Core 2.2 is even worse: HasFlag: 17ms 22.7ms 26ms Bitwise: 9ms 10.2ms 15msBias
dotnetfiddle.net/5RRmcu - with equality, 1msHyperkeratosis
B
85

Does anyone know the internals of Enum.HasFlag and why it is so slow?

The actual check is just a simple bit check in Enum.HasFlag - it's not the problem here. That being said, it is slower than your own bit check...

There are a couple of reasons for this slowdown:

First, Enum.HasFlag does an explicit check to make sure that the type of the enum and the type of the flag are both the same type, and from the same Enum. There is some cost in this check.

Secondly, there is an unfortunate box and unbox of the value during a conversion to UInt64 that occurs inside of HasFlag. This is, I believe, due to the requirement that Enum.HasFlag work with all enums, regardless of the underlying storage type.

That being said, there is a huge advantage to Enum.HasFlag - it's reliable, clean, and makes the code very obvious and expressive. For the most part, I feel that this makes it worth the cost - but if you're using this in a very performance critical loop, it may be worth doing your own check.

Bultman answered 10/9, 2011 at 0:5 Comment(6)
It's possible, and not overly difficult, to write a static generic method which will take two arguments of the same type and, if that type derives from enum, perform the same test as Enum.HasFlag; such a method can run 30x as fast as Enum.HasFlag. With a little CIL tweaking, one could make an extension method which would pop up in the IDE with types derived from System.Enum but not with other types. I wonder why Microsoft bothered to write HasFlag, but didn't make it even remotely performant?Roentgenoscope
This is absolutely HORRENDOUS! I just profiled a large app on a dataset that creates millions of objects and does a great deal of algorithmic processing that does little with enums. The #1 hot function? Enum.HasFlag!! I had always assumed this would amount to a single inlined bitwise test in a Release build! If I had responsibility for this area at MS, I wouldn't be able to sleep at night until this was fixed.Philippians
@KenBeckett It's partly, I think, MS's huge C#==.NET focus - since C# doesn't allow generic constraints on enum, you can't write this there without boxing, which causes this to be problematic.Bultman
Mono 4 transforms a.HasFlag(b) where a and b are the exact same type, to an actual bitwise AND and skip all the heavy reflection normally done in the method - or so they say.Frohne
@ReedCopsey which raises the question of why C# can't constrain a generic type parameter to be "any enum" (while allegedly, C++/CLI can).Lauren
Enum.HasFlag is now optimized by the JIT in .NET Core.Earthling
M
32

Decompiled code of Enum.HasFlags() looks like this:

public bool HasFlag(Enum flag)
{
    if (!base.GetType().IsEquivalentTo(flag.GetType()))
    {
        throw new ArgumentException(Environment.GetResourceString("Argument_EnumTypeDoesNotMatch", new object[] { flag.GetType(), base.GetType() }));
    }
    ulong num = ToUInt64(flag.GetValue());
    return ((ToUInt64(this.GetValue()) & num) == num);
}

If I were to guess, I would say that checking the type was what's slowing it down most.

Note that in recent versions of .Net Core, this has been improved and Enum.HasFlag compiles to the same code as using bitwise comparisons.

Marchall answered 10/9, 2011 at 0:6 Comment(3)
I suspect, though I'd have to profile, that the ToUInt64(xxx.GetValue()) is actually the worst part, as it does a box/unbox + Convert.ToUInt64...Bultman
Thanks for the code snippet. For some reason .net reflector is not showing the code for .net 4 assemblies.Cute
@ReedCopsey - In .net 4.0 the implementation is different and uses an extern method insteadHelbonnas
M
5

The performance penalty due to boxing discussed on this page also affects the public .NET functions Enum.GetValues and Enum.GetNames, which both forward to (Runtime)Type.GetEnumValues and (Runtime)Type.GetEnumNames respectively.

All of these functions use a (non-generic) Array as a return type--which is not so bad for the names (since String is a reference type)--but is quite inappropriate for the ulong[] values.

Here's a peek at the offending code (.NET 4.7):

public override Array /* RuntimeType.*/ GetEnumValues()
{
    if (!this.IsEnum)
        throw new ArgumentException();

    ulong[] values = Enum.InternalGetValues(this);
    Array array = Array.UnsafeCreateInstance(this, values.Length);
    for (int i = 0; i < values.Length; i++)
    {
        var obj = Enum.ToObject(this, values[i]);   // ew. boxing.
        array.SetValue(obj, i);                     // yuck
    }
    return array;              // Array of object references, bleh.
}

We can see that prior to doing the copy, RuntimeType goes back again to System.Enum to get an internal array, a singleton which is cached, on demand, for each specific Enum. Notice also that this version of the values array does use the proper strong signature, ulong[].

Here's the .NET function (again we're back in System.Enum now). There's a similar function for getting the names (not shown).

internal static ulong[] InternalGetValues(RuntimeType enumType) => 
    GetCachedValuesAndNames(enumType, false).Values;

See the return type? This looks like a function we'd like to use... But first consider that a second reason that .NET re-copys the array each time (as you saw above) is that .NET must ensure that each caller gets an unaltered copy of the original data, given that a malevolent coder could change her copy of the returned Array, introducing a persistent corruption. Thus, the re-copying precaution is especially intended to protect the cached internal master copy.

If you aren't worried about that risk, perhaps because you feel confident you won't accidentally change the array, or maybe just to eke-out a few cycles of (what's surely premature) optimization, it's simple to fetch the internal cached array copy of the names or values for any Enum:

        → The following two functions comprise the sum contribution of this article ←
        → (but see edit below for improved version) ←

static ulong[] GetEnumValues<T>() where T : struct =>
        (ulong[])typeof(System.Enum)
            .GetMethod("InternalGetValues", BindingFlags.Static | BindingFlags.NonPublic)
            .Invoke(null, new[] { typeof(T) });

static String[] GetEnumNames<T>() where T : struct =>
        (String[])typeof(System.Enum)
            .GetMethod("InternalGetNames", BindingFlags.Static | BindingFlags.NonPublic)
            .Invoke(null, new[] { typeof(T) });

Note that the generic constraint on T isn't fully sufficient for guaranteeing Enum. For simplicity, I left off checking any further beyond struct, but you might want to improve on that. Also for simplicity, this (ref-fetches and) reflects directly off the MethodInfo every time rather than trying to build and cache a Delegate. The reason for this is that creating the proper delegate with a first argument of non-public type RuntimeType is tedious. A bit more on this below.

First, I'll wrap up with usage examples:

var values = GetEnumValues<DayOfWeek>();
var names = GetEnumNames<DayOfWeek>();

and debugger results:

'values'    ulong[7]
[0] 0
[1] 1
[2] 2
[3] 3
[4] 4
[5] 5
[6] 6

'names' string[7]
[0] "Sunday"
[1] "Monday"
[2] "Tuesday"
[3] "Wednesday"
[4] "Thursday"
[5] "Friday"
[6] "Saturday"

So I mentioned that the "first argument" of Func<RuntimeType,ulong[]> is annoying to reflect over. However, because this "problem" arg happens to be first, there's a cute workaround where you can bind each specific Enum type as a Target of its own delegate, where each is then reduced to Func<ulong[]>.)

Clearly, its pointless to make any of those delegates, since each would just be a function that always return the same value... but the same logic seems to apply, perhaps less obviously, to the original situation as well (i.e., Func<RuntimeType,ulong[]>). Although we do get by with a just one delegate here, you'd never really want to call it more than once per Enum type. Anyway, all of this leads to a much better solution, which is included in the edit below.


[edit:]
Here's a slightly more elegant version of the same thing. If you will be calling the functions repeatedly for the same Enum type, the version shown here will only use reflection one time per Enum type. It saves the results in a locally-accessible cache for extremely rapid access subsequently.

static class enum_info_cache<T> where T : struct
{
    static _enum_info_cache()
    {
        values = (ulong[])typeof(System.Enum)
            .GetMethod("InternalGetValues", BindingFlags.Static | BindingFlags.NonPublic)
            .Invoke(null, new[] { typeof(T) });

        names = (String[])typeof(System.Enum)
            .GetMethod("InternalGetNames", BindingFlags.Static | BindingFlags.NonPublic)
            .Invoke(null, new[] { typeof(T) });
    }
    public static readonly ulong[] values;
    public static readonly String[] names;
};

The two functions become trivial:

static ulong[] GetEnumValues<T>() where T : struct => enum_info_cache<T>.values;
static String[] GetEnumNames<T>() where T : struct => enum_info_cache<T>.names;

The code shown here illustrates a pattern of combining three specific tricks that seem to mutually result in an unusualy elegant lazy caching scheme. I've found the particular technique to have surprisingly wide application.

  1. using a generic static class to cache independent copies of the arrays for each distinct Enum. Notably, this happens automatically and on demand;

  2. related to this, the loader lock guarantees unique atomic initialization and does this without the clutter of conditional checking constructs. We can also protect static fields with readonly (which, for obvious reasons, typically can't be used with other lazy/deferred/demand methods);

  3. finally, we can capitalize on C# type inference to automatically map the generic function (entry point) into its respective generic static class, so that the demand caching is ultimately even driven implicitly (viz., the best code is the code that isn't there--since it can never have bugs)

You probably noticed that the particular example shown here doesn't really illustrate point (3) very well. Rather than relying on type inference, the void-taking function has to manually propagate forward the type argument T. I didn't choose to expose these simple functions such that there would be an opportunity to show how C# type inference makes the overall technique shine...

However, you can imagine that when you do combine a static generic function that can infer its type argument(s)--i.e., so you don't even have to provide them at the call site--then it gets quite powerful.

The key insight is that, while generic functions have the full type-inference capability, generic classes do not, that is, the compiler will never infer T if you try to call the first of the following lines. But we can still get fully inferred access to a generic class, and all the benefits that entails, by traversing into them via generic function implicit typing (last line):

int t = 4;
typed_cache<int>.MyTypedCachedFunc(t);  // no inference from 't', explicit type required

MyTypedCacheFunc<int>(t);               // ok, (but redundant)

MyTypedCacheFunc(t);                    // ok, full inference

Designed well, inferred typing can effortlessly launch you into the appropriate automatically demand-cached data and behaviors, customized for each type (recall points 1. and 2). As noted, I find the approach useful, especially considering its simplicity.

Mezzotint answered 2/8, 2017 at 23:46 Comment(1)
I think I broke my brain while trying to understand that all... And learned a few new english words due having to use a dictionary. Heh! But basically the enum stuff boils down to this: don't use HasFlag (or any such) when speed's critical - go bit-ops?Dour
K
3

The JITter ought to be inlining this as a simple bitwise operation. The JITter is aware enough to custom-handle even certain framework methods (via MethodImplOptions.InternalCall I think?) but HasFlag seems to have escaped Microsoft's serious attention.

Keratitis answered 31/12, 2012 at 8:0 Comment(6)
The way the function is written, the JITter can't optimize it. If Enum1 and Enum2 are enumerated types, the code Enum1.HasFlag(Enum2) is required to create new heap-object instances which hold the values that were held by Enum1 and Enum2, and then pass those objects to some routines that are way too big to be analyzed. The JITter really has no choice but to produce code that creates those heap objects, and the creation of those objects is what totally dogs performance.Roentgenoscope
If the JITter knows that Enum1 and Enum2 have the same underlying type, it can ditch the box instruction and inline that particular call. There's no reason the boxing operation has to be done every time.Keratitis
There's no way to avoid boxing without a generic HasFlag method. The problem is that if Enum1 and Enum2 are the same concrete enum type, but Enum3 is System.Enum, Enum1.HasFlag(Enum2) must call the same JITted code as Enum1.HasFlag(Enum3). If a non-generic method overload can accept a heap reference to a boxed value type, there's no way that overload can accept anything other than a heap reference, nor is there any way for a value type to be implicitly converted to a heap reference of a compatible type except via boxing.Roentgenoscope
If the JITter doesn't know, then the compiler should. One or the other has the information it needs to optimize the Enum1.HasFlag(Enum2) special case. The boxing operation can be built separately when boxing is actually being used.Keratitis
I wish that instead of just having one System.Enum there had been a family of System.Int32Enum, System.Int64Enum, etc. each of which had a suitable Value member, or else that the type system had allowed types to be derived from value types provided the derived type didn't add any new fields (in which case enum types could derive from Int32, Int64, etc. as appropriate). As it is, though, the generated code for enum1.HasFlag(enum2) needs to vary depending upon the type in question, and there's no way to do that without either...Roentgenoscope
...explicitly coding the method for each enum type, or else dispatching through a delegate. Personally, what I'd most like would be to have the result of the & operator be a compiler-temporary type which in the absence of type coercion would behave as it does now, but which could if needed be implicitly cast to bool or to integer type appropriate to the smaller unsigned operand (IMHO, in the statement if ((someEnum & EnumModes.FancyMode) != 0) doSomething();, requiring the != 0 part makes the code less clear than it would be in its absence).Roentgenoscope

© 2022 - 2024 — McMap. All rights reserved.