Why is the use of `Span<T>` more expensive than plain array?
Asked Answered
S

0

7

I have a very complex mathematical algorithm that's been in my code for years. Today, I did some high-level performance tests and found that this algorithm is eating a significant bit of the CPU time. I then saw that (due to its age) the algorithm uses double[] array = new double[6] a few times, and then passes the array around to sub-functions. I thought that the use of Span<double> and stackalloc double should improve performance a bit, as it reduces GC load and possibly some range checks.

I was very surprised that the opposite was the case, and the change caused the overall test time to raise from 23 seconds to 28 seconds. I wrote the following test cases that exhibit the behavior (the actual code is much more complicated):

        // This one must be quite big to see the timing difference for these simple test cases
        private const int MeasuremenLoopCount = 1000000;
        private const int ArraySize = 10;

        [Fact]
        public void UseArray()
        {
            for (int i = 0; i < MeasuremenLoopCount; i++)
            {
                double[] array = new double[ArraySize];
                array[0] = 2;

                CalcOnArray(array);
            }
        }

        private void CalcOnArray(double[] array)
        {
            for (int i = 1; i < array.Length; i++)
            {
                array[i] = array[i - 1] * 2;
            }
        }

        [Fact]
        public void UseSpanOnArray()
        {
            for (int i = 0; i < MeasuremenLoopCount; i++)
            {
                Span<double> array = new double[ArraySize];
                array[0] = 2;

                CalcOnSpan(array);
            }
        }

        private void CalcOnSpan(Span<double> array)
        {
            for (int i = 1; i < array.Length; i++)
            {
                array[i] = array[i - 1] * 2;
            }
        }

        [Fact]
        public void UseSpanOnStack()
        {
            for (int i = 0; i < MeasuremenLoopCount; i++)
            {
                StackAllocArray();
            }
        }

        // This method only exists to avoid the "use of stackalloc in a loop" error
        private void StackAllocArray()
        {
            Span<double> array = stackalloc double[ArraySize];
            array[0] = 2;

            CalcOnSpan(array);
        }

On my system,

  • UseArray runs in 46ms,
  • UseSpanOnArray runs in 57ms
  • and UseSpanOnStack runs in 55ms.

The (small) difference between the latter two is likely the improvement of using the stack over the heap, but why is UseArray significantly faster than UseSpanOnArray?

Here are the details of a profiler run, same relation between the test cases, but not much extra insight: Profiler Result

Savadove answered 24/9, 2022 at 9:12 Comment(10)
Have you tried to perform benchmarking with Benchmark.NET?Firecure
@PeterCsala I just used the Resharper test runner to measure the times of the different test cases above. Since there's not much code here, I'm not sure I can get more detailed information.Savadove
The idea of Span<T> is to avoid making copies of collections. In none of your tests are you making a copy of a collection - and because using indexing in a Span<T> is performing an extra level of indirection compared to accessing an array directly, of course it will be slower.Vanadinite
@MatthewWatson, "Span<T> is to avoid making copies of collections" Can you elaborate on that? How is the purpose of Span<T> avoiding making copies of collections? If for example a type only provides some method Foo(int[] bar) and you want to pass a sub-section of an existing array, how does Span<T> help you here? Of course the type could offer a Foo(Span<int> bar) as well. But for the same reason it could also offer Foo(int[] bar, int offset, int cout), like many types in the BCL already do...Felisha
@Savadove You could get information about memory management as well if you would use the MemoryDiagnoser.Firecure
@PeterCsala I don't care much about the memory usage at this time (it's only a few doubles). It's really the execution time that bothers me.Savadove
@MatthewWatson That could be an answer, if you can somehow source that "Span<T> is performing an extra level of indirection compared to accessing an array directly". Because that's just the opposite of what I would have expected.Savadove
@PMF: If you check the soure for the indexer of Span<T> you can see the code that is getting executed.Gorgeous
@Savadove The implementation of the Span<T> indexing operator is return ref Unsafe.Add(ref _reference, (nint)(uint)index /* force zero-extension */); (from the link Markus posted above)Vanadinite
@MySkullCaveIsADarkPlace I suppose I should have said that ONE of the purposes of Span<T> is to avoid making copies of collections - of course, it also has the purpose of avoiding GC when being copied (being a ref struct). It also allows ref access to the elements of the underlying memory block.Vanadinite

© 2022 - 2024 — McMap. All rights reserved.