Why is OfType<> faster than Cast<>?
Asked Answered
H

4

30

In answer to the following question: How to convert MatchCollection to string array

Given The two Linq expressions:

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .OfType<Match>() //OfType
    .Select(m => m.Groups[0].Value)
    .ToArray();

and

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .Cast<Match>() //Cast
    .Select(m => m.Groups[0].Value)
    .ToArray();

OfType<> was benchmarked by user Alex to be slightly faster (and confirmed by myself).

This seems counterintuitive to me, as I'd have thought OfType<> would have to do both an 'is' comparison, and a cast (T).

Any enlightenment would be appreciated as to why this is the case :)

Housekeeper answered 11/7, 2012 at 10:23 Comment(10)
I don't think this is a duplicate as the question is not what the difference is, but why one is slower than the other.Slipsheet
@danbystrom In an answer to the question you linked to Ash explains OfType is slower. The OP asks why it's actually faster when all elements of the sequence are in fact of that type.Melisma
@C.Evenhuis: This is not even a real question...Mannes
Why would it do is then a cast, when it can just use as and a null check?Belongings
I'm not sure! Which of those would be faster? - The question still holds, however...Housekeeper
@Belongings I don't know. Maybe the Ash's answer is wrong. I only mention it because danbystrom linked to that question as a duplicate. Which it is not, please stop voting for close.Melisma
this is NOT a duplicate.Housekeeper
Your answer lies here Cast and OfType :yes Jon SkeetSlit
This question may be worth revisiting for C# 7 due to the is operator patterns? Though I suppose that depends on whether the methods have been kept up to date.Hemidemisemiquaver
@Slit that link is broken. This is the new URL: codeblog.jonskeet.uk/2011/01/13/…Roofdeck
O
16

My benchmarking does not agree with your benchmarking.

I ran an identical benchmark to Alex's and got the opposite result. I then tweaked the benchmark somewhat and again observed Cast being faster than OfType.

There's not much in it, but I believe that Cast does have the edge, as it should because its iterator is simpler. (No is check.)

Edit: Actually after some further tweaking I managed to get Cast to be 50x faster than OfType.

Below is the code of the benchmark that gives the biggest discrepancy I've found so far:

Stopwatch sw1 = new Stopwatch();
Stopwatch sw2 = new Stopwatch();

var ma = Enumerable.Range(1, 100000).Select(i => i.ToString()).ToArray();

var x = ma.OfType<string>().ToArray();
var y = ma.Cast<string>().ToArray();

for (int i = 0; i < 1000; i++)
{
    if (i%2 == 0)
    {
        sw1.Start();
        var arr = ma.OfType<string>().ToArray();
        sw1.Stop();
        sw2.Start();
        var arr2 = ma.Cast<string>().ToArray();
        sw2.Stop();
    }
    else
    {
        sw2.Start();
        var arr2 = ma.Cast<string>().ToArray();
        sw2.Stop();
        sw1.Start();
        var arr = ma.OfType<string>().ToArray();
        sw1.Stop();
    }
}
Console.WriteLine("OfType: " + sw1.ElapsedMilliseconds.ToString());
Console.WriteLine("Cast: " + sw2.ElapsedMilliseconds.ToString());
Console.ReadLine();

Tweaks I've made:

  • Perform the "generate a list of strings" work once, at the start, and "crystallize" it.
  • Perform one of each operation before starting timing - I'm not sure if this is necessary but I think it means the JITter generates code beforehand rather than while we're timing?
  • Perform each operation multiple times, not just once.
  • Alternate the order in case this makes a difference.

On my machine this results in ~350ms for Cast and ~18000ms for OfType.

I think the biggest difference is that we're no longer timing how long MatchCollection takes to find the next match. (Or, in my code, how long int.ToString() takes.) This drastically reduces the signal-to-noise ratio.

Edit: As sixlettervariables pointed out, the reason for this massive difference is that Cast will short-circuit and not bother casting individual items if it can cast the whole IEnumerable. When I switched from using Regex.Matches to an array to avoid measuring the regex processing time, I also switched to using something castable to IEnumerable<string> and thus activated this short-circuiting. When I altered my benchmark to disable this short-circuiting, I get a slight advantage to Cast rather than a massive one.

Orthopedic answered 11/7, 2012 at 12:4 Comment(7)
@Alex Done. I don't think there are any gaping holes in my benchmark, but feel free to point them out if there are.Orthopedic
I start to believe the odd result I've seen is caused by the regex... I'm fiddling with some more and my results are now consistent with yoursDandridge
@Rawling: Cast is faster in this case because it simply returns the enumeration unmolested because it implements IEnumerable<string>. Cast special cases.Corelation
@six ...buh. Of course. Now I remember why we were using Regex.Matches in the first place... I'll see if I can come up with a benchmark that will give me a useful result while still cutting out the work the regex is doing.Orthopedic
@Rawling: I've done so in my answer. There is no appreciable difference after random trials.Corelation
@six I've updated my answer to acknowledge and credit you for pointing this out. I still get a slight advantage to Cast but of course nothing like I was seeing before.Orthopedic
@Orthopedic - maybe .ToArray() is dominating the time measurement, as that requires allocating and filling a large array. Actually, since there is no way for it to predict the array size needed [it only knows that it has an IEnumerable, so no size available], it probably reallocates and copies the array numerous times as it grows. Using the IEnumerable results directly might show a greater difference.Dopey
I
11

OfType() should be slower since doing safe type is check before an actual explicit cast operation, in the same time Cast() doing only explicit cast.

Theoretically OfType woudl be faster in case of many elements with "wrong type", so loop enumerates further just after is check, in case of Cast() on the same collection you would end's up with an InvalidCastException on each element of "wrong type" so this would be relatively slower.

Source code extracted using ILSpy:

// System.Linq.Enumerable
private static IEnumerable<TResult> OfType<TResult>(IEnumerable source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }

    foreach (object current in source)
    {
        // **Type check**
        if (current is TResult)
        {
            // **Explicit cast**
            yield return (TResult)current;
        }
    }
    yield break;
}

// System.Linq.Enumerable
public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source)
{
    IEnumerable<TResult> enumerable = source as IEnumerable<TResult>;
    if (enumerable != null)
    {
        return enumerable;
    }
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }

    foreach (object current in source)
    {
        // **Explicit cast only**
        yield return (TResult)current;
    }
    yield break;
}
Iorgo answered 11/7, 2012 at 12:12 Comment(0)
C
7

Just reverse the order of OfType and Cast in your method and you'll note that there is no difference. The first one always runs faster than the second one. This is a case of a bad microbenchmark.

Wrapping your code in a loop to run them in random order:

OfType: 1224
Cast: 2815
Cast: 2961
OfType: 3010
OfType: 3027
Cast: 2987
...

And then again:

Cast: 1207
OfType: 2781
Cast: 2930
OfType: 2964
OfType: 2964
OfType: 2987
...

Lifting out the Regex.Matches, which appears to cause the problem:

Cast: 1247
OfType: 210
OfType: 170
Cast: 171
...

and

OfType: 1225
Cast: 202
OfType: 171
Cast: 192
Cast: 415

So, no. OfType is not faster than Cast. And no, Cast is not faster than OfType.

Corelation answered 11/7, 2012 at 13:17 Comment(1)
Cast is MUCH, MUCH faster than OfType. see: https://mcmap.net/q/110525/-why-is-oftype-lt-gt-faster-than-cast-lt-gtSaloma
E
1

Actually isof() first checks the type and then casts it, where as cast() just does the 2nd part. So obviously isof() will be slower than direct casting

http://codenets.blogspot.in/2010/06/cast-vs-oftype.html

Endorsement answered 11/7, 2012 at 12:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.