I have a file that's 21056 bytes.
I've written a program in C that reads the entire file into a buffer, and then uses multiple search algorithms to search the file for a token that's 82 chars.
I've used all the implementations of the algorithms from the “Exact String Matching Algorithms” page. I've used: KMP, BM, TBM, and Horspool. And then I used strstr
and benchmarked each one.
What I'm wondering is, each time the strstr
outperforms all the other algorithms. The only one that is faster sometimes is BM.
Shouldn't strstr
be the slowest?
Here's my benchmark code with an example of benchmarking BM:
double get_time()
{
LARGE_INTEGER t, f;
QueryPerformanceCounter(&t);
QueryPerformanceFrequency(&f);
return (double)t.QuadPart/(double)f.QuadPart;
}
before = get_time();
BM(token, strlen(token), buffer, len);
after = get_time();
printf("Time: %f\n\n", after - before);
Could someone explain to me why strstr
is outperforming the other search algorithms? I'll post more code on request if needed.
strstr
– KMP may actually be slower. But since you didn’t post your code, we can’t help you. That said, you can chose your data deliberately to make the naive search win so the choice of input data is also relevant. – Tristan