I have a very complex mathematical algorithm that's been in my code for years. Today, I did some high-level performance tests and found that this algorithm is eating a significant bit of the CPU time. I then saw that (due to its age) the algorithm uses double[] array = new double[6]
a few times, and then passes the array around to sub-functions. I thought that the use of Span<double>
and stackalloc double
should improve performance a bit, as it reduces GC load and possibly some range checks.
I was very surprised that the opposite was the case, and the change caused the overall test time to raise from 23 seconds to 28 seconds. I wrote the following test cases that exhibit the behavior (the actual code is much more complicated):
// This one must be quite big to see the timing difference for these simple test cases
private const int MeasuremenLoopCount = 1000000;
private const int ArraySize = 10;
[Fact]
public void UseArray()
{
for (int i = 0; i < MeasuremenLoopCount; i++)
{
double[] array = new double[ArraySize];
array[0] = 2;
CalcOnArray(array);
}
}
private void CalcOnArray(double[] array)
{
for (int i = 1; i < array.Length; i++)
{
array[i] = array[i - 1] * 2;
}
}
[Fact]
public void UseSpanOnArray()
{
for (int i = 0; i < MeasuremenLoopCount; i++)
{
Span<double> array = new double[ArraySize];
array[0] = 2;
CalcOnSpan(array);
}
}
private void CalcOnSpan(Span<double> array)
{
for (int i = 1; i < array.Length; i++)
{
array[i] = array[i - 1] * 2;
}
}
[Fact]
public void UseSpanOnStack()
{
for (int i = 0; i < MeasuremenLoopCount; i++)
{
StackAllocArray();
}
}
// This method only exists to avoid the "use of stackalloc in a loop" error
private void StackAllocArray()
{
Span<double> array = stackalloc double[ArraySize];
array[0] = 2;
CalcOnSpan(array);
}
On my system,
UseArray
runs in 46ms,UseSpanOnArray
runs in 57ms- and
UseSpanOnStack
runs in 55ms.
The (small) difference between the latter two is likely the improvement of using the stack over the heap, but why is UseArray
significantly faster than UseSpanOnArray
?
Here are the details of a profiler run, same relation between the test cases, but not much extra insight:
Span<T>
is to avoid making copies of collections. In none of your tests are you making a copy of a collection - and because using indexing in aSpan<T>
is performing an extra level of indirection compared to accessing an array directly, of course it will be slower. – VanadiniteFoo(int[] bar)
and you want to pass a sub-section of an existing array, how does Span<T> help you here? Of course the type could offer aFoo(Span<int> bar)
as well. But for the same reason it could also offerFoo(int[] bar, int offset, int cout)
, like many types in the BCL already do... – FelishaMemoryDiagnoser
. – FirecureSpan<T>
is performing an extra level of indirection compared to accessing an array directly". Because that's just the opposite of what I would have expected. – SavadoveSpan<T>
you can see the code that is getting executed. – GorgeousSpan<T>
indexing operator isreturn ref Unsafe.Add(ref _reference, (nint)(uint)index /* force zero-extension */);
(from the link Markus posted above) – VanadiniteSpan<T>
is to avoid making copies of collections - of course, it also has the purpose of avoiding GC when being copied (being aref struct
). It also allowsref
access to the elements of the underlying memory block. – Vanadinite