Why is this System.IO.Pipelines code much slower than Stream-based code?

Asked 9/10, 2020 at 16:30 Answered 21/10, 2020 at 17:10

Solved c#performance .net-core system.io.pipelines

I've written a little parsing program to compare the older System.IO.Stream and the newer System.IO.Pipelines in .NET Core. I'm expecting the pipelines code to be of equivalent speed or faster. However, it's about 40% slower.

The program is simple: it searches for a keyword in a 100Mb text file, and returns the line number of the keyword. Here is the Stream version:

public static async Task<int> GetLineNumberUsingStreamAsync(
    string file,
    string searchWord)
{
    using var fileStream = File.OpenRead(file);
    using var lines = new StreamReader(fileStream, bufferSize: 4096);

    int lineNumber = 1;
    // ReadLineAsync returns null on stream end, exiting the loop
    while (await lines.ReadLineAsync() is string line)
    {
        if (line.Contains(searchWord))
            return lineNumber;

        lineNumber++;
    }
    return -1;
}

I would expect the above stream code to be slower than the below pipelines code, because the stream code is encoding the bytes to a string in the StreamReader. The pipelines code avoids this by operating on bytes:

public static async Task<int> GetLineNumberUsingPipeAsync(string file, string searchWord)
{
    var searchBytes = Encoding.UTF8.GetBytes(searchWord);
    using var fileStream = File.OpenRead(file);
    var pipe = PipeReader.Create(fileStream, new StreamPipeReaderOptions(bufferSize: 4096));

    var lineNumber = 1;
    while (true)
    {
        var readResult = await pipe.ReadAsync().ConfigureAwait(false);
        var buffer = readResult.Buffer;

        if(TryFindBytesInBuffer(ref buffer, searchBytes, ref lineNumber))
        {
            return lineNumber;
        }

        pipe.AdvanceTo(buffer.End);

        if (readResult.IsCompleted) break;
    }

    await pipe.CompleteAsync();

    return -1;
}

Here are the associated helper methods:

/// <summary>
/// Look for `searchBytes` in `buffer`, incrementing the `lineNumber` every
/// time we find a new line.
/// </summary>
/// <returns>true if we found the searchBytes, false otherwise</returns>
static bool TryFindBytesInBuffer(
    ref ReadOnlySequence<byte> buffer,
    in ReadOnlySpan<byte> searchBytes,
    ref int lineNumber)
{
    var bufferReader = new SequenceReader<byte>(buffer);
    while (TryReadLine(ref bufferReader, out var line))
    {
        if (ContainsBytes(ref line, searchBytes))
            return true;

        lineNumber++;
    }
    return false;
}

static bool TryReadLine(
    ref SequenceReader<byte> bufferReader,
    out ReadOnlySequence<byte> line)
{
    var foundNewLine = bufferReader.TryReadTo(out line, (byte)'\n', advancePastDelimiter: true);
    if (!foundNewLine)
    {
        line = default;
        return false;
    }

    return true;
}

static bool ContainsBytes(
    ref ReadOnlySequence<byte> line,
    in ReadOnlySpan<byte> searchBytes)
{
    return new SequenceReader<byte>(line).TryReadTo(out var _, searchBytes);
}

I'm using SequenceReader<byte> above because my understanding is that it's more intelligent/faster than ReadOnlySequence<byte>; it has a fast path for when it can operate on a single Span<byte>.

Here are the benchmark results (.NET Core 3.1). Full code and BenchmarkDotNet results are available in this repo.

GetLineNumberWithStreamAsync - 435.6 ms while allocating 366.19 MB
GetLineNumberUsingPipeAsync - 619.8 ms while allocating 9.28 MB

Am I doing something wrong in the pipelines code?

Update: Evk has answered the question. After applying his fix, here are the new benchmark numbers:

GetLineNumberWithStreamAsync - 452.2 ms while allocating 366.19 MB
GetLineNumberWithPipeAsync - 203.8 ms while allocated 9.28 MB

Juror answered 9/10, 2020 at 16:30 Comment(4)

It might be instructive to take a memory dump and look at how many objects you're creating. – Chuvash 9/10, 2020 at 16:44

Thanks @IanKemp. I captured several memory dumps over repeated runs, and the Pipe code is running in constant memory. Nothing suspicious, mostly internal details of FileStream and array pooling. I tried reading upfront to a MemoryStream to avoid using FileStream, but the benchmark results are similar. – Juror 10/10, 2020 at 7:51

I guess the problem might be in search algorithm, string.Contains uses advanced search algorithm, while TryReadTo goes with simple o(n*m) solution – Overrate 21/10, 2020 at 10:11

Thanks for updating your question with the new benchmarks! – Lui 23/10, 2020 at 16:19

I believe the reason is implementaiton of SequenceReader.TryReadTo. Here is the source code of this method. It uses pretty straightforward algorithm (read to the match of first byte, then check if all subsequent bytes after that match, if not - advance 1 byte forward and repeat), and note how there are quite some methods in this implementation called "slow" (IsNextSlow, TryReadToSlow and so on), so under at least certain circumstances and in certain cases it falls back to some slow path. It also has to deal with the fact sequence might contain multiple segments, and with maintaining the position.

In your case you can avoid using SequenceReader specifically for searching the match (but leave it for actually reading lines), for example with this minor changes (this overload of TryReadTo is also more efficient in this case):

private static bool TryReadLine(ref SequenceReader<byte> bufferReader, out ReadOnlySpan<byte> line) {
    // note that both `match` and `line` are now `ReadOnlySpan` and not `ReadOnlySequence`
    var foundNewLine = bufferReader.TryReadTo(out ReadOnlySpan<byte> match, (byte) '\n', advancePastDelimiter: true);

    if (!foundNewLine) {
        line = default;
        return false;
    }

    line = match;
    return true;
}

Then:

private static bool ContainsBytes(ref ReadOnlySpan<byte> line, in ReadOnlySpan<byte> searchBytes) {
    // line is now `ReadOnlySpan` so we can use efficient `IndexOf` method
    return line.IndexOf(searchBytes) >= 0;
}

This will make your pipes code run faster than streams one.

Ludicrous answered 21/10, 2020 at 17:10 Comment(1)

Thank you! I will update my post with new benchmark numbers after applying your solution. I did not know there was an overload of ReadOnlySpan.IndexOf that could search for multiple bytes! Very handy. – Juror 23/10, 2020 at 15:45

This is perhaps not exactly the explanation you look for but I hope it give some insight:

Having a glance over the two approaches you have there, it shows in the 2nd solution is computationally more complex than the other, by having two nested loops.

Digging deeper using code profiling showcases that the 2nd one (GetLineNumberUsingPipeAsync) is almost 21.5 % more CPU intensive than the one uses the Stream (please check the screenshots, ) And it is close enough to the benchmark result I got:

Solution#1: 683.7 ms, 365.84 MB
Solution#2: 777.5 ms, 9.08 MB

Sarto answered 21/10, 2020 at 14:33 Comment(1)

Thanks for the insight! I think you're right that there was more looping going on than I realized, especially internal to some of the methods I was calling. After applying @Evk's solution the numbers have reversed and GetLineNumberWithPipeAsync is now faster. – Juror 23/10, 2020 at 15:49

Recommended topics

Hot tags