Estimating actual (not theoretical) runtime complexity of an implementation

Anyone in computer science knows that HeapSort is O(n log n) worst case in theory, while QuickSort is O(n^2) worst case. In practice, however, a well-implemented QuickSort (with good heuristics) will outperform HeapSort on every single data set. On the one hand we barely ever observe the worst case, and on the other hand things like CPU cache lines and prefetching make an enormous difference in many simple tasks. And while QuickSort (with a good heuristic) can handle presorted data in O(n), HeapSort will always reorganize the data in O(n log n), as it does not exploit the existing structure.

For my toy project caliper-analyze, I've recently been looking into methods for estimating the actual average complexity of algorithms from benchmarking results. In particular, I've tried Lawson and Hanson's NNLS fitting with different polynomials.

However, it doesn't work too well yet. Sometimes I get usable results, sometimes I don't. I figure that it may help to just run larger benchmarks, in particular to try more parameters.
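
To make this concrete, here is roughly the kind of fit I mean: nonnegative least squares over a few candidate terms (n, n log2 n, n^2). The class and the measurements below are made up for illustration, and the projected-gradient solver is only a simple stand-in for a proper Lawson-Hanson NNLS, not the actual caliper-analyze code:

public class ComplexityFit {

    // Candidate basis functions evaluated at input size n: { n, n*log2(n), n^2 }.
    static double[] basis(double n) {
        return new double[] { n, n * Math.log(n) / Math.log(2), n * n };
    }

    // Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent.
    // This is a simple stand-in for a real Lawson-Hanson NNLS implementation.
    static double[] nnls(double[][] a, double[] b, int iterations) {
        int rows = a.length, cols = a[0].length;
        // Scale each column to unit norm so one step size works for all terms;
        // the scaling is undone on the returned coefficients.
        double[] scale = new double[cols];
        for (int j = 0; j < cols; j++) {
            double s = 0;
            for (int i = 0; i < rows; i++) s += a[i][j] * a[i][j];
            scale[j] = Math.sqrt(s);
        }
        double[][] an = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                an[i][j] = a[i][j] / scale[j];
        double step = 1.0 / cols;      // trace of An^T An is cols, so this step is safe
        double[] x = new double[cols]; // start at the feasible point 0
        for (int it = 0; it < iterations; it++) {
            double[] residual = new double[rows];
            for (int i = 0; i < rows; i++) {
                double dot = 0;
                for (int j = 0; j < cols; j++) dot += an[i][j] * x[j];
                residual[i] = dot - b[i];
            }
            for (int j = 0; j < cols; j++) {
                double grad = 0;
                for (int i = 0; i < rows; i++) grad += an[i][j] * residual[i];
                x[j] = Math.max(0.0, x[j] - step * grad); // gradient step, then project onto x >= 0
            }
        }
        for (int j = 0; j < cols; j++) x[j] /= scale[j]; // undo the column scaling
        return x;
    }

    public static void main(String[] args) {
        // Made-up (size, mean runtime) measurements, only to show the shape of the input.
        double[] sizes = { 50, 100, 200, 500 };
        double[] times = { 1.3e4, 5.1e4, 2.0e5, 1.3e6 };
        double[][] a = new double[sizes.length][];
        for (int i = 0; i < sizes.length; i++) a[i] = basis(sizes[i]);
        double[] coef = nnls(a, times, 200_000);
        System.out.printf("LINEAR: %.4f  NLOG2N: %.4f  QUADRATIC: %.4f%n",
                coef[0], coef[1], coef[2]);
    }
}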

The following results are for sorting Double objects, in a SAW pattern with 10% randomness. n only went up to 500 for this run, so it is not very representative of actual use... The numbers are the estimated runtime dependency on the input size. The output is hand-edited and manually sorted, so it does not reflect what the tool currently produces!

BubbleSortTextbook       LINEAR: 67.59  NLOG2N:  1.89  QUADRATIC: 2.51
BubbleSort               LINEAR: 54.84                 QUADRATIC: 1.68
BidirectionalBubbleSort  LINEAR: 52.20                 QUADRATIC: 1.36
InsertionSort            LINEAR: 17.13  NLOG2N:  2.97  QUADRATIC: 0.86
QuickSortTextbook                       NLOG2N: 18.15
QuickSortBo3             LINEAR: 59.74                 QUADRATIC: 0.12
Java                     LINEAR:  6.81  NLOG2N: 12.33
DualPivotQuickSortBo5                   NLOG2N: 11.28
QuickSortBo5             LINEAR:  3.35  NLOG2N:  9.67

You can tell that in this particular setting (often it does not work satisfactorily at all) the results largely agree with the known behavior: bubble sort is really costly, and a good heuristic on QuickSort is much better. However, QuickSort with the median-of-three heuristic, for example, ends up with an O(n + n^2) estimate, while the other QuickSorts are estimated as O(n + n log n).

So now to my actual questions:

  • Do you know algorithms/approaches/tools that perform runtime complexity analysis from benchmark data, in order to predict which implementation (as you can see above, I'm interested in comparing different implementations of the same algorithm!) performs best on real data?
  • Do you know scientific articles with respect to this (estimating average complexity of implementations)?
  • Do you know robust fitting methods that will help get more accurate estimates here, e.g. a regularized version of NNLS?
  • Do you know rules-of-thumb of how many samples one needs to get a reasonable estimate? (in particular, when should the tool refrain from giving any estimate, as it will likely be inaccurate anyway?)

And let me emphasize once more that I'm not interested in theoretical complexity or formal analysis. I'm interested in seeing how implementations (of theoretically even identical algorithms) perform on benchmark data on real CPUs... the numerical factors for a common range are of key interest to me, more than the asymptotic behaviour. (And no, in the long run it is not just about time complexity and sorting; I'm also interested in index structures and other parameters. And caliper can, if I'm not mistaken, also measure memory consumption.) Plus, I'm working in Java. An approach that just calls a Matlab builtin is of no use to me, as I'm not living in the Matlab world.

If I have time, I will try to re-run some of these benchmarks with a larger variety of sizes, so I get more data points. Maybe it will then just work... but I believe there are more robust regression methods that I could use to get better estimates even from smaller data sets. Plus, I'd like to detect when the sample is just too small to do any prediction at all!

Indigenous answered 5/7, 2013 at 16:52 Comment(7)
@JimGarrison Not necessarily. OP keeps stressing he's less interested in the theoretical components...Sacrarium
More suited to Theoretical Computer Science or Cross ValidatedMonkeypot
True, but this question isn't about programming, it's about algorithm evaluation. Even though the approach is empirical, it's still more about statistics and [applied] theory.Monkeypot
It is not really applicable for TCS, because I'm assuming a black-box algorithm model. And on CV, I have the impression that solutions will not be too practically usable. SO is much more about implementing these things in actual code. A solution that works in theory doesn't help me a lot. In theory, NNLS should do the job, but in practice it doesn't really work well enough; maybe it isn't robust enough wrt. outliers, and doesn't cope well with multiple measurements at the same n?Indigenous
Plus, statisticians often assume that I'm doing this just once. Then I can hand-pick parameters, tune my kernel, etc.; but I'm trying to turn this into a built-in function of my caliper-analyze tool, which should run without me setting any parameters. :-(Indigenous
So, the question boils down to "please recommend some sophisticated benchmarking tools that do some statistical analysis"?Gabrielegabriell
@Gabrielegabriell I think he summarized the questions pretty well in the list at the end: How do you write a program that automatically estimates the scalability of an algorithm, given a set of benchmark results.Lacewing

If you want the actual complexity, you are better off measuring it. Trying to guess how a program will perform without measuring is very unreliable.

The same program can perform very differently on different machines, e.g. one algorithm might be faster on one machine but slower on another.

Your program can also be slower depending on what else the machine is doing: an algorithm which looks good but makes heavy use of resources like caches can be slower, and make other programs slower, when it has to share those resources.

Testing an algorithm on a machine by itself can be up to 2-5x faster than using it in a real program.

Do you know rules-of-thumb of how many samples one needs to get a reasonable estimate? (in particular, when should the tool refrain from giving any estimate, as it will likely be inaccurate anyway?)

For determining a percentile like 90% or 99%, you need on the order of 1/(1-p)^2 samples, i.e. for the 99th percentile you need at least 10,000 samples after warmup, and for the 99.9th percentile you need one million.
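
As a quick sanity check of that rule of thumb, here is a tiny snippet (the class name is made up for illustration):

public class PercentileSamples {
    // Rule-of-thumb sample count for estimating the p-th percentile (p given as a fraction).
    static long samplesForPercentile(double p) {
        return (long) Math.ceil(1.0 / ((1.0 - p) * (1.0 - p)));
    }

    public static void main(String[] args) {
        System.out.println(samplesForPercentile(0.99));  // 10000
        System.out.println(samplesForPercentile(0.999)); // 1000000
    }
}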

Andreandrea answered 5/7, 2013 at 21:40 Comment(3)
But isn't he in fact trying to do this: automate this kind of testing? Estimating the complexity from benchmarking data sounds a lot like it, IMHO.Lacewing
Indeed. I'm already measuring runtime at different parameter settings, now I want to summarize this to the user in form of a function of the input parameters (e.g. data set size).Indigenous
@ErichSchubert You could start with n^p where p is an unknown power. You can do this by taking the log of the sample size n and the log of the time; your estimate of the slope relating these logs is p. This will work even if p is 1, 1.1, 1.5, 2, etc. Data modelling is a very complex subject, and many PhDs do it for a living after years of experience. Yet they usually model theoretical systems. You want to model a real system, which means making some compromises that are difficult to put in any proof.Andreandrea
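
A minimal sketch of the log-log slope estimate suggested in that last comment (the class name and the timings are made up for illustration):

public class LogLogSlope {
    // Estimate the exponent p in time ~ c * n^p by fitting a line to (log n, log time).
    static double estimateExponent(double[] sizes, double[] times) {
        int k = sizes.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < k; i++) {
            meanX += Math.log(sizes[i]);
            meanY += Math.log(times[i]);
        }
        meanX /= k;
        meanY /= k;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < k; i++) {
            double dx = Math.log(sizes[i]) - meanX;
            double dy = Math.log(times[i]) - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
        }
        return sxy / sxx; // slope of the regression line in log-log space
    }

    public static void main(String[] args) {
        // Made-up timings that grow roughly like n^2.
        double[] sizes = { 50, 100, 200, 500 };
        double[] times = { 2.5e3, 1.0e4, 4.1e4, 2.6e5 };
        System.out.printf("estimated exponent p = %.2f%n", estimateExponent(sizes, times));
    }
}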
