I have a very long string (60 MB) in which I need to count how many pairs of '<' and '>' there are.
First I tried my own approach:
char pre = '!';   // '!' = looking for '<', '<' = looking for '>'
int match1 = 0;
for (int j = 0; j < html.Length; j++)
{
    char c = html[j];
    if (pre == '<' && c == '>') // found a matching pair
    {
        pre = '!';
        match1++;
    }
    else if (pre == '!' && c == '<')
        pre = '<';
}
On my string, this code takes roughly 1000 ms.
Then I tried using string.IndexOf:
int match2 = 0;
int index = -1;
do
{
    index = html.IndexOf('<', index + 1);
    if (index != -1) // found a '<', now look for the next '>'
    {
        index = html.IndexOf('>', index + 1);
        if (index != -1)
            match2++;
    }
} while (index != -1);
This version takes only around 150 ms.
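Both pairing versions apply the same rule: a '<' is counted once a later '>' closes it, and any extra '<' seen while waiting is ignored. To compare them on equal terms, it can help to wrap each one in a method and check that they agree. A minimal sketch, assuming the snippets are otherwise unchanged; the class name PairCounter and both method names are hypothetical:

```csharp
public static class PairCounter
{
    // State-machine version: '!' means "looking for '<'",
    // '<' means "inside an open '<', looking for '>'".
    public static int CountPairsLoop(string html)
    {
        char pre = '!';
        int matches = 0;
        for (int j = 0; j < html.Length; j++)
        {
            char c = html[j];
            if (pre == '<' && c == '>')
            {
                pre = '!';
                matches++;
            }
            else if (pre == '!' && c == '<')
            {
                pre = '<';
            }
        }
        return matches;
    }

    // IndexOf version: jump to the next '<', then to the next '>' after it.
    public static int CountPairsIndexOf(string html)
    {
        int matches = 0;
        int index = -1;
        do
        {
            index = html.IndexOf('<', index + 1);
            if (index != -1)
            {
                index = html.IndexOf('>', index + 1);
                if (index != -1)
                    matches++;
            }
        } while (index != -1);
        return matches;
    }
}
```

For example, both methods return 1 for "<<>": the second '<' is ignored because the first one is still waiting for its '>'.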
I am wondering: what is the magic that makes string.IndexOf run so fast?
Can anyone enlighten me?
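A likely part of the explanation: the hand-written loop examines one char per iteration, while an optimized search can compare several chars at once. In .NET Core 3.0 and later, string.IndexOf(char) uses SIMD instructions for this where the hardware supports them; whether that explains the exact timings above depends on the runtime version used. The sketch below is not the framework's code, only an illustration of the same idea applied to counting a single character. It assumes .NET Core 3.0 or later (for the Vector<T>(ReadOnlySpan<T>) constructor), and the name SimdCharCounter is hypothetical:

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class SimdCharCounter
{
    // Counts occurrences of `target` in `s`, comparing Vector<ushort>.Count
    // chars per step instead of one char per step.
    public static int Count(string s, char target)
    {
        // Vector<T> does not support char, so view the UTF-16 code units as ushort.
        ReadOnlySpan<ushort> data = MemoryMarshal.Cast<char, ushort>(s.AsSpan());
        int width = Vector<ushort>.Count;          // e.g. 8 with SSE2, 16 with AVX2
        var needle = new Vector<ushort>((ushort)target);

        int count = 0;
        int i = 0;
        for (; i <= data.Length - width; i += width)
        {
            var block = new Vector<ushort>(data.Slice(i, width));
            // Each lane is 0xFFFF where block == needle, otherwise 0.
            Vector<ushort> eq = Vector.Equals(block, needle);
            if (eq != Vector<ushort>.Zero)         // only inspect blocks with a match
            {
                for (int k = 0; k < width; k++)
                    if (eq[k] != 0) count++;
            }
        }

        // Scalar loop for the leftover chars that do not fill a whole vector.
        for (; i < data.Length; i++)
            if (data[i] == target) count++;

        return count;
    }
}
```

Any speed-up depends on the hardware and runtime, so it is worth timing against the plain loop in a Release build.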
Edit
OK, following @BrokenGlass's answer, I modified both versions so that they no longer check the pairing; instead, they just count how many '<' characters are in the string.
int match1 = 0;
for (int j = 0; j < html.Length; j++)
{
    char c = html[j];
    if (c == '<') // count every '<'
    {
        match1++;
    }
}
This code takes around 760 ms.
Using IndexOf:
int match2 = 0;
int index = -1;
do
{
    index = html.IndexOf('<', index + 1);
    if (index != -1) // found another '<'
    {
        match2++;
    }
} while (index != -1);
This version takes about 132 ms, still very fast.
Edit 2
After reading @Jeffrey Sax's comment, I realised that I had been running in Visual Studio in Debug mode.
When I built and ran in Release mode, IndexOf was still faster, but not by nearly as much.
Here are the results (my own loop vs. IndexOf):
Pairing count: 207 ms vs. 144 ms
Single-character count: 141 ms vs. 111 ms
My own code's performance improved enormously.
Lesson learned: when benchmarking, always build and run in Release mode!
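Since a Debug build, or running under the Visual Studio debugger, can leave JIT optimizations off, a small timing helper that warns about both can make this slip harder to miss. A minimal sketch using System.Diagnostics.Stopwatch; the name MiniBench, the single warm-up run and the best-of-N policy are my own assumptions, and for serious measurements a dedicated library such as BenchmarkDotNet is the usual choice:

```csharp
using System;
using System.Diagnostics;

public static class MiniBench
{
    // Runs `work` once untimed as a warm-up, then `runs` timed times.
    // Returns the last result and the fastest time in milliseconds.
    public static (int Result, double BestMs) Measure(Func<int> work, int runs = 5)
    {
        if (runs < 1)
            throw new ArgumentOutOfRangeException(nameof(runs));

#if DEBUG
        Console.WriteLine("Warning: DEBUG build, timings are not representative.");
#endif
        if (Debugger.IsAttached)
            Console.WriteLine("Warning: debugger attached, JIT optimizations may be suppressed.");

        // Warm-up triggers JIT compilation; with tiered compilation,
        // more warm-up runs may be needed before code is fully optimized.
        int result = work();
        double best = double.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            result = work();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return (result, best);
    }
}
```

Usage: wrap one of the counting snippets in a lambda that returns its count, then compare the reported BestMs values from a Release build run outside the debugger.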
Comments
... string.IndexOf is doing behind the scenes? – Gametocyte
... int len = html.Length; and then use len in the for loop. – Polliwog
... < > pair in the IndexOf version. – Grim