Performance difference between C# for-loop and Array.Fill
Asked Answered
C

1

9

I have implemented the following benchmark using BenchmarkDotNet:

public class ForVsFillVsEnumerable
{
    private bool[] data;

    [Params(10, 100, 1000)]
    public int N;

    [GlobalSetup]
    public void Setup()
    {
        data = new bool[N];
    }

    [Benchmark]
    public void Fill()
    {
        Array.Fill(data, true);
    }

    [Benchmark]
    public void For()
    {           
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = true;
        }
    }

    [Benchmark]
    public void EnumerableRepeat()
    {
        data = Enumerable.Repeat(true, N).ToArray();
    }
}

The results are:

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.195 (1809/October2018Update/Redstone5)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.2.200-preview-009648
  [Host] : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
  Core   : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT

Job=Core  Runtime=Core
           Method |    N |       Mean |      Error |      StdDev |     Median | Ratio | Rank |
----------------- |----- |-----------:|-----------:|------------:|-----------:|------:|-----:|
             Fill |   10 |   3.675 ns |  0.2550 ns |   0.7150 ns |   3.331 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
              For |   10 |   6.615 ns |  0.3928 ns |   1.1581 ns |   6.056 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
 EnumerableRepeat |   10 |  25.388 ns |  1.0451 ns |   2.9307 ns |  24.170 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
             Fill |  100 |  50.557 ns |  2.0766 ns |   6.1229 ns |  46.690 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
              For |  100 |  64.330 ns |  4.0058 ns |  11.8111 ns |  59.442 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
 EnumerableRepeat |  100 |  81.784 ns |  4.2407 ns |  12.5039 ns |  75.937 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
             Fill | 1000 | 447.016 ns | 15.4420 ns |  45.5312 ns | 420.239 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
              For | 1000 | 589.243 ns | 51.3450 ns | 151.3917 ns | 495.177 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
 EnumerableRepeat | 1000 | 519.124 ns | 21.3580 ns |  62.9746 ns | 505.573 ns |  1.00 |    1 |

Originally I guessed the Array.Fill does some optimizations which make it perform better than for-loop, but then I checked the .NET Core source code to see that the Array.Fill implementation is pretty straightforward:

public static void Fill<T>(T[] array, T value)
{
    if (array == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
    }

    for (int i = 0; i < array.Length; i++)
    {
        array[i] = value;
    }
}

The performance is close enough, but it stills seems Fill is consistently a bit faster then for even though under the hood it is exactly the same code. Can you explain why? Or am I just reading the results incorrectly?

Cassiani answered 8/1, 2019 at 16:53 Comment(2)
Did you compile your code in release mode?Runaway
The framework binaries have heavily optimized native images generated for them (not sure if it's "just" ngen, or some custom version). So even if you duplicate the code exactly and compile in release mode, changes are the framework version will be faster.Sapienza
S
15

I'm surprised by Enumerable.Repeat(), contrarily to my first thought it scales pretty well. Anyway, to answer your question: when you use For() you repeatedly access a class member while when calling Array.Fill() you obtain its address just once.

I'm even more surprised that compiler does not detect - and optimize - this but to read the value of a class member you need ldarg.0 to get the value of this and then ldfld ForVsFillVsEnumerable.data to obtain its actual address. In ForVsFillVsEnumerable.Fill() this is done just once to call Array.Fill().

You can check this writing your own fill function:

[Benchmark]
public void For2()
{
    ForImpl(data);
}

private static void ForImpl(bool[] data)
{
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = true;
    }
}

Note 1: regardless performance, to use a library function is always better because it can potentially benefit of future optimizations (they may decide, for example, to add specific overloads for Array.Fill() and to implement them with native code where - for some architectures - a plain memset() is extremely fast).

Note 2: if loop code is so small (and fast) I'd avoid to measure anything with small vectors (10 or 100 items) because it's extremely difficult to setup a proper test environment to reliably measure a difference of few nanoseconds. I'd consider 1000 (or even 100,000) the very minimum to begin with (and even in that case so many other things will play a relevant role...) Unless your real-world use case is 10/100...in that case I'd try to measure a bigger algorithm where this difference is more evident (and if it's not then you shouldn't care).

Sevier answered 8/1, 2019 at 17:10 Comment(2)
Don't forget that there is only limited optimization applied at the IL level - it is the JIT that will do the heavy lifting (same reason as using a library function - it means you can have maximum effect by improving the JITter). For your ForImpl method, using SharpLab.io you will see that the code generated for the version using data.Length is essentially the same as for a version that stores the length in a local first. (Well, the compare gets swapped around, so a jump-if-less becomes a jump-if-greater, but that's nothing.)Sapienza
@Sapienza yes, I don't expect that moving data.Length outside the for changes generated code too much in this case (compiler knows that the length of the array does not need to be retrieved each time). The difference with the original code is about member access vs parameter. When dealing with a class field it generates the code to read that field (which will become part of the benchmark, like...using strlen() in a C program inside the loop condition) and for such a simple loop it is a significant impactSevier

© 2022 - 2024 — McMap. All rights reserved.