Why is the use of `Span<T>` more expensive than plain array?

I have a very complex mathematical algorithm that's been in my code for years. Today, I did some high-level performance tests and found that this algorithm is eating a significant bit of the CPU time. I then saw that (due to its age) the algorithm uses double[] array = new double[6] a few times, and then passes the array around to sub-functions. I thought that the use of Span<double> and stackalloc double should improve performance a bit, as it reduces GC load and possibly some range checks.

I was very surprised that the opposite was the case, and the change caused the overall test time to raise from 23 seconds to 28 seconds. I wrote the following test cases that exhibit the behavior (the actual code is much more complicated):

        // This one must be quite big to see the timing difference for these simple test cases
        private const int MeasuremenLoopCount = 1000000;
        private const int ArraySize = 10;

        [Fact]
        public void UseArray()
        {
            for (int i = 0; i < MeasuremenLoopCount; i++)
            {
                double[] array = new double[ArraySize];
                array[0] = 2;

                CalcOnArray(array);
            }
        }

        private void CalcOnArray(double[] array)
        {
            for (int i = 1; i < array.Length; i++)
            {
                array[i] = array[i - 1] * 2;
            }
        }

        [Fact]
        public void UseSpanOnArray()
        {
            for (int i = 0; i < MeasuremenLoopCount; i++)
            {
                Span<double> array = new double[ArraySize];
                array[0] = 2;

                CalcOnSpan(array);
            }
        }

        private void CalcOnSpan(Span<double> array)
        {
            for (int i = 1; i < array.Length; i++)
            {
                array[i] = array[i - 1] * 2;
            }
        }

        [Fact]
        public void UseSpanOnStack()
        {
            for (int i = 0; i < MeasuremenLoopCount; i++)
            {
                StackAllocArray();
            }
        }

        // This method only exists to avoid the "use of stackalloc in a loop" error
        private void StackAllocArray()
        {
            Span<double> array = stackalloc double[ArraySize];
            array[0] = 2;

            CalcOnSpan(array);
        }

On my system,

UseArray runs in 46ms,
UseSpanOnArray runs in 57ms
and UseSpanOnStack runs in 55ms.

The (small) difference between the latter two is likely the improvement of using the stack over the heap, but why is UseArray significantly faster than UseSpanOnArray?

Here are the details of a profiler run, same relation between the test cases, but not much extra insight:

Recommended topics

Hot tags