Why is allocation from ArrayPool faster than allocation on the stack?

I have the following benchmark, which reads a string from a file using stack allocation, heap allocation, and ArrayPool allocation.

I would expect stack allocation to be the fastest, because it is just a stack pointer increment, but according to the benchmark ArrayPool is slightly faster.

How is that possible?

static void Main(string[] args)
{         
     BenchmarkRunner.Run<BenchmarkRead>();          
}

using BenchmarkDotNet.Attributes;
using System;
using System.Buffers;
using System.IO;
using System.Linq;

namespace RealTime.Benchmark
{
[MemoryDiagnoser]
public class BenchmarkRead
{
    const string TestFile = "TestFiles/animals.txt";

    public BenchmarkRead()
    {
        Directory.CreateDirectory(Path.GetDirectoryName(TestFile));
        // cca 100 KB of text
        string content = string.Concat(Enumerable.Repeat("dog,cat,spider,cat,bird,", 4000));
        File.WriteAllText(TestFile, content);
    }

    [Benchmark]
    public void ReadFileOnPool() => ReadFileOnPool(TestFile);

    [Benchmark]
    public void ReadFileOnHeap() => ReadFileOnHeap(TestFile);

    [Benchmark]
    public void ReadFileOnStack() => ReadFileOnStack(TestFile);

    public void ReadFileOnHeap(string filename)
    {
        string text = File.ReadAllText(filename);
        // ....call parse
    }

    public void ReadFileOnStack(string filename)
    {
        Span<byte> span = stackalloc byte[1024 * 200];
        using (var stream = File.OpenRead(filename))
        {
            int count = stream.Read(span);
            if (count == span.Length)
                throw new Exception($"Buffer size {span.Length} too small, use array pooling.");
            span = span.Slice(0, count);
            // ....call parse
        }
    }

    public void ReadFileOnPool(string filename)
    {
        ArrayPool<byte> pool = ArrayPool<byte>.Shared;
        using (var stream = File.OpenRead(filename))
        {
            long len = stream.Length;
            byte[] buffer = pool.Rent((int)len);
            try
            {
                int count = stream.Read(buffer, 0, (int)len);
                if (count != len)
                    throw new Exception($"{count} != {len}");

                Span<byte> span = new Span<byte>(buffer).Slice(0, count);
                // ....call parse
            }
            finally
            {
                pool.Return(buffer);
            }
        }
    }
}
}

Results:

|          Method |     Mean | Gen 0/1k Op | Gen 2/1k Op | Allocated memory/Op |
|---------------- |---------:|------------:|------------:|--------------------:|
|  ReadFileOnPool | 109.9 us |      0.1221 |           - |               480 B |
|  ReadFileOnHeap | 506.0 us |     87.8906 |     58.5938 |            393440 B |
| ReadFileOnStack | 115.2 us |      0.1221 |           - |               480 B |
Crapulous answered 18/3, 2019 at 20:25 Comment(4)
Is there actually a problem here? Simply comparing things doesn't really help future users, because the result is specific to your performance test. With a real-world example it's much easier to generalize performance answers. - Darwinism
I wouldn't assume the stack is fastest here, because you are definitely allocating memory there, whereas the pool method may have an already-allocated block available. - Assay
I guess we'd have to dig into the generated bytecode. - Hyponitrite
"because it is just stack pointer increment": I suspect the allocation of memory and the actual file I/O account for a larger proportion of the time spent, which is consistent with what you are seeing. - Doha
  • Span<byte> span = stackalloc byte[1024 * 200] is zero-initialized, because the C# compiler emits the .locals init flag (InitLocals) on the method by default.

  • byte[] buffer = pool.Rent((int)len); is not zero-initialized at all; the pool hands back whatever array it has available, stale contents included.

So you have reached the point where zero-initializing the 200 KB stack buffer costs more than the whole Rent() call.
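A minimal sketch of the difference described above, assuming a default build (no SkipLocalsInit) and the default ArrayPool<byte>.Shared implementation. The class and method names are hypothetical and only serve the illustration. The stack buffer should come back all zeros, while the rented array may still hold the bytes written before it was returned; the pool makes no promise either way, so no output is claimed for the pool case.

```csharp
using System;
using System.Buffers;

public static class ZeroInitDemo
{
    // Counts the bytes in the buffer that are not zero.
    public static int CountNonZero(ReadOnlySpan<byte> data)
    {
        int n = 0;
        foreach (byte b in data)
            if (b != 0) n++;
        return n;
    }

    // With the default .locals init flag, the runtime clears all 200 KB
    // before the span is usable; that clearing is the cost being measured.
    public static int NonZeroBytesOnStack()
    {
        Span<byte> span = stackalloc byte[1024 * 200];
        return CountNonZero(span);
    }

    // Rent, dirty, return, rent again. Return() does not clear the array
    // unless clearArray: true is passed, so the second Rent() may hand back
    // the same dirty array (likely, but not guaranteed).
    public static int NonZeroBytesFromPool()
    {
        ArrayPool<byte> pool = ArrayPool<byte>.Shared;
        byte[] first = pool.Rent(1024 * 200);
        Array.Fill(first, (byte)0xFF);
        pool.Return(first);

        byte[] second = pool.Rent(1024 * 200);
        try
        {
            return CountNonZero(second);
        }
        finally
        {
            pool.Return(second);
        }
    }

    public static void Main()
    {
        Console.WriteLine($"stack: {NonZeroBytesOnStack()} non-zero bytes");
        Console.WriteLine($"pool:  {NonZeroBytesFromPool()} non-zero bytes");
    }
}
```

Because the pool skips the clearing step, code that rents a buffer must track how many bytes it actually wrote, as the question's ReadFileOnPool does by slicing to count.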

I actually created a NuGet package exactly for this a few months ago, https://github.com/josetr/InitLocals, but we'll soon have something similar from Microsoft as well: https://github.com/dotnet/corefx/issues/29026.

Pazice answered 16/11, 2019 at 18:51 Comment(2)
The SkipLocalsInit attribute is now available in .NET 5+. It disables the .locals init flag and prevents zero initialization. - Glair
I am wondering whether this attribute would make it possible to corrupt reference types, or whether references would still be zero-initialized? - Cassiecassil
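
For reference, a minimal sketch of how the stack version from the question might look with System.Runtime.CompilerServices.SkipLocalsInitAttribute, assuming .NET 5 or later and a project compiled with <AllowUnsafeBlocks>true</AllowUnsafeBlocks> (the compiler requires it for this attribute). The class and method names are hypothetical. On the question about references: to my understanding, the JIT still zeroes locals that hold object references because the GC depends on it, so the attribute mainly affects stackalloc buffers and unmanaged locals; treat that as an assumption to verify against the official documentation.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;

public static class SkipLocalsInitDemo
{
    // Without .locals init, the 200 KB buffer is not cleared on entry.
    // Its contents are undefined until written, so only the first
    // `count` bytes filled by Read() may be inspected.
    [SkipLocalsInit]
    public static int ReadIntoStack(string filename)
    {
        Span<byte> span = stackalloc byte[1024 * 200];
        using (var stream = File.OpenRead(filename))
        {
            // Number of bytes delivered by a single Read() call.
            int count = stream.Read(span);
            if (count == span.Length)
                throw new Exception($"Buffer size {span.Length} too small, use array pooling.");
            span = span.Slice(0, count);
            // ....call parse
            return span.Length;
        }
    }
}
```

Re-running the benchmark with this variant would show whether zero initialization accounts for the gap measured in the question.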
