Most of the answers did not pass my test* (see below of my code) on 16 Mbyte random binary streams.
Only bruno conde's answer did work, but was a bit slower than my code (~1300 ms vs ~600 ms).
The solution code
static long FindPosition(Stream stream, byte[] pattern)
{
long foundPosition = -1;
int i = 0;
int b;
while ((b = stream.ReadByte()) > -1)
{
if (pattern[i++] != b)
{
stream.Position -= i - 1;
i = 0;
continue;
}
if (i == pattern.Length)
{
foundPosition = stream.Position - i;
break;
}
}
return foundPosition;
}
Description
Basically, it iterates the stream
and the pattern
array in parallel byte by byte.
As soon as the current stream byte does not match with the current pattern byte, the stream position will be set to the position where the current comparison started, but one byte further. And the pattern's "position" (i
) is reset to 0, of course.
If i
reaches the end of the pattern, all the stream bytes before were matching the whole pattern, so the position is found and can be returned.
The benchmark tests
Test file preparation
This is the code I used to create the test file:
private static void CreateRandomBinaryFile(string fileName, int sizeInMegabyte)
{
Random rnd = new();
byte[] b = new byte[1024];
int rounds = sizeInMegabyte * 1024;
using FileStream fs = File.Create(fileName);
for (int i = 0; i < rounds; i++)
{
rnd.NextBytes(b);
fs.Write(b, 0, b.Length);
}
}
CreateRandomBinaryFile("fileOne.bin", 16);
For the second test file I moved the rnd.NextBytes(b)
out of and before the for-loop and added this code after the for-loop:
b[0] = 0x12;
b[1] = 0x13;
b[2] = 0x15;
fs.Write(b, 0, b.Length);
So I will get a file where 1024 random bytes are repeated 1024 * 16 times, but only the last 1024-sequence modified.
The benchmark
public delegate long FindPositionDelegate(Stream stream, byte[] pattern);
struct Test
{
public string File;
public byte[] Needle;
}
static void Main(string[] args)
{
Dictionary<string, FindPositionDelegate> codeSoultions = new()
{
{ "This", FindPosition },
{ "Bruno", BrunoConde.FindPosition }
};
List<Test> tests = new()
{
new Test() {
File = @"fileOne.bin",
Needle = new byte[] { 0x4a, 0xaf, 0x6f, 0x34 }
},
new Test()
{
File = @"fileTwo.bin",
Needle = new byte[] { 0xfc, 0x1d, 0x40, 0x38, 0x0b, 0xe4, 0x50, 0x5b,
0x1a, 0xbf, 0x2a, 0xab, 0x9b, 0x1b, 0x0f, 0xc7, 0x12, 0x13, 0x15 }
}
};
for (int i = 0; i < 3; i++)
{
foreach (var solution in codeSoultions)
{
foreach (var test in tests)
{
using FileStream file = File.OpenRead(test.File);
Stopwatch watch = Stopwatch.StartNew();
long pos = solution.Value(file, test.Needle);
watch.Stop();
long elapsedMs = watch.ElapsedMilliseconds;
Console.WriteLine($"Iteration: {i},\t" +
$"Method: {solution.Key},\t" +
$"Result: {pos},\t" +
$"Elapsed ms: {elapsedMs}");
}
}
}
}
Test #1
This code: ~ 598 ms
Bruno code: ~ 1371 ms
Test #2
This code: ~ 1371 ms
Bruno code: ~ 1509 ms
*
dharga's code thows Exception but does work with small byte arrays.
Kind Contributor throws Exception but does work with small byte arrays.
The others, where I have commented, return wrong values.