There are a few things about this question that I'd like to point out. First, let's get some basics straight regarding performance and C# in general, because it's hard to explain anything while misconceptions are still floating around. Next, I'll apply all of that to the specific question here.
Performance in C# in general
Big-O notation
At university, you learn that O(n) is always better than O(n^2) and that O(n) is always better than O(n log n). However, the underlying assumption there is that every operation costs roughly the same amount of time.
Now, when I first started programming on an 1802 RISC processor in 1986, this was very much the case: a memory operation was 1 clock tick, and so was an add, subtract, etc. In other words, Big-O works fine there.
In a modern computer, it's more difficult:
- Data is cached at various levels (speeds range from 15 GB/s to 1000 GB/s);
- Operations vary between 0.5 clock tick and dozens of clock ticks;
- Data is usually fetched in bursts - so random access is much worse than sequential;
- Vectorization can process up to 8 integers of aligned data at once;
- Branch misprediction can throw everything off balance because you have to flush the lot.
I've observed that the difference in performance between different implementations of the same algorithm can be as much as a factor of 1000 (!).
Big-O still holds merit though, but you should put things into perspective. For example, say that you have N = 10000; then log2(N) ≈ 13 -- and if benefiting from all the things above buys you more than that factor of 13, a 'stupid' O(n log n) algorithm might just outperform your average O(n) algorithm.
From this you should also deduce that an O(n^2) algorithm still won't ever outperform an O(n) algorithm: the extra factor of n quickly dwarfs even a factor of 1000. So, Big-O still has its uses; you just have to put things into perspective.
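To make point (3) above tangible, here's a small sketch of the kind of micro-benchmark I mean (my illustration, not one of the measurements below, and it assumes the same usings as the other snippets): it sums the same array once in sequential order and once in a shuffled order; on most machines the second run is typically several times slower.
static void Main(string[] args)
{
    const int N = 1 << 24;                  // 16M ints: larger than the CPU caches
    int[] data = new int[N];
    int[] indices = new int[N];
    Random rnd = new Random(12839);
    for (int i = 0; i < N; ++i) { data[i] = i; indices[i] = i; }

    // Shuffle the index array (Fisher-Yates) to force a random access pattern.
    for (int i = N - 1; i > 0; --i)
    {
        int j = rnd.Next(i + 1);
        int tmp = indices[i]; indices[i] = indices[j]; indices[j] = tmp;
    }

    long sum = 0;
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < N; ++i) sum += data[i];             // sequential access
    Console.WriteLine("sequential: " + sw.ElapsedMilliseconds + " ms");

    sw = Stopwatch.StartNew();
    for (int i = 0; i < N; ++i) sum += data[indices[i]];    // random access
    Console.WriteLine("random:     " + sw.ElapsedMilliseconds + " ms (" + sum + ")");
}
Same data, same additions -- only the access pattern differs.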
Some characteristics about C#
A myth about C# is that it's approximately as fast as C++ (which is my golden standard for "as fast as it gets"). In the hands of a skilled developer it isn't, simple as that. For simple sorting, C++ is approximately twice as fast -- but if you have more complicated scenarios where you can really benefit from the "low level stuff", the difference can become quite huge. I usually estimate the difference in performance at a factor of 10. However, writing proper, high-performance C++ code is challenging (to use an understatement), so you might want to stick with C# and accept the performance hit.
One thing of interest is that the C# compiler and JIT compile things pretty fast. In part, that's because they compile everything per-method (so, no inlining, etc.). Also, C# doesn't normally vectorize anything. Don't take my word for it: use Ctrl-Alt-D in Visual Studio and check the assembler output for yourself.
If we look at the list above, we can roughly state that (1), (2) and (3) aren't influenced by the fact that we're using C#; (4) definitely is, and (5) depends.
As for (5), consider this simple example:
void set(int[] array, int index)
{
    array[index] = 0;
}
Remember that in C# code is compiled per-method. This means the compiler cannot assume that index won't be out of bounds. In other words, it has to add two checks -- and one of those has to load memory:
if (index < 0 || index >= array.Length)
{
    throw new IndexOutOfRangeException();
}
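For completeness (my addition, not something the measurements below depend on): the JIT can often drop these checks when the access pattern makes them provably redundant, the classic case being a loop that tests directly against array.Length:
static void Clear(int[] array)
{
    for (int i = 0; i < array.Length; ++i)
    {
        array[i] = 0;   // the JIT can usually prove 0 <= i < array.Length here
                        // and omit the bounds check entirely
    }
}
The lambda comparers further down are in a similar situation: the delegate call itself is the cost, not the comparison inside it.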
Sorting items
The question by the OP is about maintaining a sorted list of size m. Sorting is a well-known operation, which costs at best O(log m) per item you insert. Since you're processing n 'random' items, you'll get a best possible speed of O(n log m).
A binary heap (based on an array) might get you that performance number, but I don't feel like writing down a heap right now, and I think the alternatives below are about the same speed (if not faster) :)
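For the curious, here's a rough sketch of that heap idea anyway (mine, unmeasured): a fixed-size min-heap that keeps the m largest items seen so far by replacing its smallest element whenever a bigger item arrives. Each insert is O(log m), giving the O(n log m) total mentioned above.
class BoundedMinHeap
{
    private readonly int[] heap;  // heap[0] is the smallest of the kept items
    private int count;

    public BoundedMinHeap(int capacity)
    {
        heap = new int[capacity];
    }

    public void Offer(int value)
    {
        if (count < heap.Length)
        {
            // Not full yet: append and sift up.
            int i = count++;
            heap[i] = value;
            while (i > 0)
            {
                int parent = (i - 1) / 2;
                if (heap[parent] <= heap[i]) break;
                int tmp = heap[parent]; heap[parent] = heap[i]; heap[i] = tmp;
                i = parent;
            }
        }
        else if (value > heap[0])
        {
            // Bigger than the smallest kept item: replace the root and sift down.
            heap[0] = value;
            int i = 0;
            while (true)
            {
                int left = 2 * i + 1, right = left + 1, smallest = i;
                if (left < count && heap[left] < heap[smallest]) smallest = left;
                if (right < count && heap[right] < heap[smallest]) smallest = right;
                if (smallest == i) break;
                int tmp = heap[smallest]; heap[smallest] = heap[i]; heap[i] = tmp;
                i = smallest;
            }
        }
    }
}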
Your question
Now that we've established what we're talking about, let's get into the question at hand. I'll explain every step of the way.
First, while doing performance work, I make it a habit to remove using System.Linq, so we know we're just working on the native data structures with the expected performance characteristics.
Let's start with a tree structure
Another simple solution would be to use a red-black tree. We have one at our disposal in .NET, called SortedSet. It uses references, arithmetic, etc. -- basically all the nasty stuff I warned about in (1), (2) and (3). Now, the implementation here isn't entirely correct (it drops duplicates), but the speed is pretty much what you would expect:
static void Main(string[] args)
{
    Random rnd = new Random(12839);
    SortedSet<int> list = new SortedSet<int>();

    // Pre-fill with the initial 5000 items.
    for (int i = 0; i < 5000; ++i)
    {
        list.Add(rnd.Next());
    }

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 10000; ++i)
    {
        // Add a batch of 5000 new items...
        for (int j = 0; j < 5000; ++j)
        {
            list.Add(rnd.Next());
        }

        // ...and remove the first 5000 items the predicate sees (intended: the smaller half).
        int n = 0;
        list.RemoveWhere((a) => n++ < 5000);
    }
    Console.WriteLine(sw.ElapsedMilliseconds);
    Console.ReadLine();
}
Like practically all algorithms here, this one executes in O(n log m).
This is approximately what I'd expect from AVL trees as well: 86220 ms.
Naive implementation
Normally I wouldn't have bothered with red-black trees. Still, since you put a lot of work into the AVL trees, I felt this measurement was necessary.
When I'm doing performance optimization of algorithms, I always start with the easiest-to-implement algorithm that has approximately the right Big-O, and I always prefer the simplest data structure (read: an array). In this case, it's a list combined with a standard sort, which gives O(m log m) per sort, executed n/m times, plus O(n) data operations. The result is O(n + n log m).
So why go for the easiest implementation, you might ask? The answer is simple: easy implementations are also easy to compile and optimize, because they usually don't have a lot of branches, don't use a lot of random memory access, and so on.
The most naive implementation (that I've already suggested in my comment) is to simply put stuff in an array, sort it, and then remove the bottom half of it.
Basically, this can be implemented as a minimal test case like this:
static void Main(string[] args)
{
    Random rnd = new Random(12839);
    List<int> list = new List<int>();
    for (int i = 0; i < 5000; ++i)
    {
        list.Add(rnd.Next());
    }

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 10000; ++i)
    {
        for (int j = 0; j < 5000; ++j)
        {
            list.Add(rnd.Next());
        }
        list.Sort((a, b) => a.CompareTo(b)); // #1
        list.RemoveRange(0, 5000);
    }
    Console.WriteLine(sw.ElapsedMilliseconds);
    Console.ReadLine();
}
Baseline performance: 10047 ms.
Optimization 1: remove method calls and branches
Method calls cost time. So do branches. So, if we don't need a branch, we might as well eliminate it. In other words: this is about (5).
One thing that comes to mind is to replace #1 by:
list.Sort((a, b) => a - b);
In most (!) scenarios this gives the desired result; I bluntly assume this scenario is no exception.
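Why only "most"? Because a - b can overflow when the operands have opposite signs and large magnitudes. A tiny illustration (hypothetical values, not part of the benchmark; it's safe here because rnd.Next() only returns non-negative numbers):
int a = int.MinValue, b = 1;
Console.WriteLine(a.CompareTo(b)); // -1: a really is smaller
Console.WriteLine(a - b);          // 2147483647: the subtraction wraps around and flips the sign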
Measurement: 8768 ms. (Yes people, that's a 15% change!)
For the fun of it, we also do a simple test for (2) with:
list.Sort((a, b) => (int)((float)a - (float)b));
The operands are exactly the same size (32 bits), it's exactly the same data and it gives the same results -- we're just comparing everything with a different CPU operation and adding some casts. Measurement: 10902 ms. This is more than you would expect if every operation were just a single clock tick.
Optimization 2: Arrays or lists?
I could also look at the list itself; we're making quite a few calls to it, so we might substitute an array for it. We could even eliminate the RemoveRange if we invert the sort order. So why don't I focus on that instead? Well, actually I could, but I can tell you it won't make a lot of difference, because relatively speaking it's not called that often. Still, there's no harm in testing, right?
static void Main(string[] args)
{
    Random rnd = new Random(12839);
    int[] list = new int[10000];
    for (int i = 0; i < 5000; ++i)
    {
        list[i] = rnd.Next();
    }

    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < 10000; ++i)
    {
        // Overwrite the second half with the new batch...
        for (int j = 0; j < 5000; ++j)
        {
            list[j + 5000] = rnd.Next();
        }
        // ...and sort descending, so the largest 5000 end up in the first half.
        Array.Sort(list, (a, b) => b - a);
    }
    Console.WriteLine(sw.ElapsedMilliseconds);
    Console.ReadLine();
}
Now, there are two measurements here:
- Changing the List to an array brought it to roughly 8700 ms, which doesn't make much of a difference;
- Reversing the order changed the result to 7456 ms.
The reason this doesn't really make a difference is that the underlying data structure for a List is an array, so when we're sorting, we're working on the same thing -- and that's where our time goes.
The thing to remember here is not that arrays are just as fast as a List. Truth is: I've found that they are actually faster in a lot of cases. However, in this case we're not talking about optimizations in the inner loop, we're not over-allocating much memory (everything is probably kept in cache) and all memory access is aligned. All in all, we can therefore expect the difference to be quite small.
Optimization 3: remove more method calls
Now, you should notice that there's also an alternative here: method calls cost time, and the one called most often here is the comparer. So let's go back to the solution with the List and remove the compare operation. When we do that, we still have to copy. What do you expect?
Change the line to this:
list.Sort();
... and we have a new timing: 4123 ms.
Now, to be entirely fair, what we've actually done here is change our inline delegate to Comparer<int>.Default, which is the default implementation of the integer comparer. The delegate would have been wrapped in a comparer, creating 2 virtual calls; this is just 1 call. This means we could also have reversed the order by implementing our own comparer class, which would have been an even faster solution.
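As a sketch of what that could look like (my illustration, not part of the measured code), a dedicated comparer class that also inverts the order:
class DescendingIntComparer : IComparer<int>
{
    public int Compare(int x, int y)
    {
        return y.CompareTo(x);  // arguments reversed => descending order
    }
}
With a descending sort, the half to discard sits at the end of the list, so the RemoveRange wouldn't have to shift anything down, e.g. list.Sort(new DescendingIntComparer()); followed by removing the last 5000 items.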
Optimization 4: Merge-join
Why sort everything if we only need to sort half of the data? That doesn't make sense, right?
Again, I pick the simplest possible algorithm to accomplish the task. We walk through the list in sequential order and store new items in sequential order, cf. (1) and (3). No swapping; remember that we prefer sequential data access patterns. Then we simply remove all the stuff we don't need anymore.
The algorithm we need is a merge-join, and works like this:
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 10000; ++i)
{
    for (int j = 0; j < 5000; ++j)
    {
        list.Add(rnd.Next());
    }

    // Sort the second half:
    list.Sort(5000, 5000, Comparer<int>.Default);

    // Both the lower and upper half are sorted. Merge-join them by appending
    // the merged result to the end of the list:
    int lhs = 0;
    int rhs = 5000;
    while (list.Count < 15000)
    {
        int l = list[lhs];
        int r = list[rhs];
        if (l < r)
        {
            list.Add(l);
            ++lhs;
        }
        else if (l > r)
        {
            list.Add(r);
            ++rhs;
        }
        else
        {
            // Equal values: copy the duplicates from both sides.
            while (list.Count < 15000 && list[lhs] == l)
            {
                list.Add(l);
                ++lhs;
            }
            while (list.Count < 15000 && list[rhs] == r)
            {
                list.Add(r);
                ++rhs;
            }
        }
    }

    // Drop the two source halves; the 5000 merged items remain.
    list.RemoveRange(0, 10000);
}
We have a new measurement: 3563 ms.
Optimization 5: RemoveRange #2
Remember, processing data in bursts is very fast. The last piece of code is a perfect opportunity to show this in action. We use a RemoveRange here, which processes data in a burst. We could also use two buffers and swap them around: basically, we write into a second list2 during the merge-join and substitute the RemoveRange with:
list.Clear();    // the old contents become the scratch buffer for the next round
var tmp = list;
list = list2;    // the freshly merged data becomes the active list
list2 = tmp;
We now have a new timing: 3542 ms. Practically the same!
From the last two measurements you should conclude that burst operations cost so little time that you usually shouldn't even bother with them.
Conclusion
I started off with a tree that executed everything in 86220 ms and ended up with an algorithm that takes 3542 ms. Put bluntly, this means the last implementation executes in about 4% of the time of the first attempt.
Now, I did use different algorithms throughout this long answer -- but the point was to show you how to do all these optimizations and what effect they really have.