Estimating actual (not theoretical) runtime complexity of an implementation

Anyone in computer science knows that HeapSort is O(n log n) worst case in theory, while QuickSort is O(n^2) worst case. In practice, however, a well-implemented QuickSort (with good heuristics) will outperform HeapSort on every single data set. On the one hand we barely ever observe the worst case, and on the other hand things like CPU cache lines and prefetching make an enormous difference in many simple tasks. And while QuickSort (with a good heuristic) can handle presorted data in O(n), HeapSort will always reorganize the data in O(n log n), as it does not exploit the existing structure.

For my toy project caliper-analyze, I've recently been looking into methods for estimating the actual average complexity of algorithms from benchmarking results. In particular, I've tried Lawson and Hanson's NNLS fitting with different polynomials.

However, it doesn't work too well yet. Sometimes I get usable results, sometimes I don't. I figure that it may help to just run larger benchmarks, in particular to try more parameters.
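
To make this concrete, here is roughly the kind of fit I mean: nonnegative least squares over a few candidate terms (n, n log2 n, n^2). The class and the measurements below are made up for illustration, and the projected-gradient solver is only a simple stand-in for a proper Lawson-Hanson NNLS, not the actual caliper-analyze code:

public class ComplexityFit {

    // Candidate basis functions evaluated at input size n: { n, n*log2(n), n^2 }.
    static double[] basis(double n) {
        return new double[] { n, n * Math.log(n) / Math.log(2), n * n };
    }

    // Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent.
    // This is a simple stand-in for a real Lawson-Hanson NNLS implementation.
    static double[] nnls(double[][] a, double[] b, int iterations) {
        int rows = a.length, cols = a[0].length;
        // Scale each column to unit norm so one step size works for all terms;
        // the scaling is undone on the returned coefficients.
        double[] scale = new double[cols];
        for (int j = 0; j < cols; j++) {
            double s = 0;
            for (int i = 0; i < rows; i++) s += a[i][j] * a[i][j];
            scale[j] = Math.sqrt(s);
        }
        double[][] an = new double[rows][cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                an[i][j] = a[i][j] / scale[j];
        double step = 1.0 / cols;      // trace of An^T An is cols, so this step is safe
        double[] x = new double[cols]; // start at the feasible point 0
        for (int it = 0; it < iterations; it++) {
            double[] residual = new double[rows];
            for (int i = 0; i < rows; i++) {
                double dot = 0;
                for (int j = 0; j < cols; j++) dot += an[i][j] * x[j];
                residual[i] = dot - b[i];
            }
            for (int j = 0; j < cols; j++) {
                double grad = 0;
                for (int i = 0; i < rows; i++) grad += an[i][j] * residual[i];
                x[j] = Math.max(0.0, x[j] - step * grad); // gradient step, then project onto x >= 0
            }
        }
        for (int j = 0; j < cols; j++) x[j] /= scale[j]; // undo the column scaling
        return x;
    }

    public static void main(String[] args) {
        // Made-up (size, mean runtime) measurements, only to show the shape of the input.
        double[] sizes = { 50, 100, 200, 500 };
        double[] times = { 1.3e4, 5.1e4, 2.0e5, 1.3e6 };
        double[][] a = new double[sizes.length][];
        for (int i = 0; i < sizes.length; i++) a[i] = basis(sizes[i]);
        double[] coef = nnls(a, times, 200_000);
        System.out.printf("LINEAR: %.4f  NLOG2N: %.4f  QUADRATIC: %.4f%n",
                coef[0], coef[1], coef[2]);
    }
}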

The following results are for sorting Double objects, in a SAW pattern with 10% randomness. n only went up to 500 for this run, so it is not very representative of actual use... The numbers are the estimated runtime dependency on the input size. The output is hand-edited and manually sorted, so it does not reflect what the tool currently produces!

BubbleSortTextbook       LINEAR: 67.59  NLOG2N:  1.89  QUADRATIC: 2.51
BubbleSort               LINEAR: 54.84                 QUADRATIC: 1.68
BidirectionalBubbleSort  LINEAR: 52.20                 QUADRATIC: 1.36
InsertionSort            LINEAR: 17.13  NLOG2N:  2.97  QUADRATIC: 0.86
QuickSortTextbook                       NLOG2N: 18.15
QuickSortBo3             LINEAR: 59.74                 QUADRATIC: 0.12
Java                     LINEAR:  6.81  NLOG2N: 12.33
DualPivotQuickSortBo5                   NLOG2N: 11.28
QuickSortBo5             LINEAR:  3.35  NLOG2N:  9.67

You can tell that in this particular setting (often it does not work satisfactorily at all) the results largely agree with the known behavior: bubble sort is really costly, and a good heuristic on QuickSort is much better. However, QuickSort with the median-of-three heuristic, for example, ends up with an O(n + n^2) estimate, while the other QuickSorts are estimated as O(n + n log n).

So now to my actual questions:

  • Do you know algorithms/approaches/tools that perform runtime complexity analysis from benchmark data, in order to predict which implementation (as you can see above, I'm interested in comparing different implementations of the same algorithm!) performs best on real data?
  • Do you know scientific articles with respect to this (estimating average complexity of implementations)?
  • Do you know robust fitting methods that will help get more accurate estimates here, e.g. a regularized version of NNLS?
  • Do you know rules-of-thumb of how many samples one needs to get a reasonable estimate? (in particular, when should the tool refrain from giving any estimate, as it will likely be inaccurate anyway?)

And let me emphasize once more that I'm not interested in theoretical complexity or formal analysis. I'm interested in seeing how implementations (of theoretically even identical algorithms) perform on benchmark data on real CPUs... the numerical factors for a common range are of key interest to me, more than the asymptotic behaviour. (And no, in the long run it is not just about time complexity and sorting; I'm also interested in index structures and other parameters. And caliper can, if I'm not mistaken, also measure memory consumption.) Plus, I'm working in Java. An approach that just calls a Matlab builtin is of no use to me, as I'm not living in the Matlab world.

If I have time, I will try to re-run some of these benchmarks with a larger variety of sizes, so I get more data points. Maybe it will then just work... but I believe there are more robust regression methods that I could use to get better estimates even from smaller data sets. Plus, I'd like to detect when the sample is just too small to do any prediction at all!

Indigenous answered 5/7, 2013 at 16:52 Comment(7)
@JimGarrison Not necessarily. OP keeps stressing he's less interested in the theoretical components...Sacrarium
More suited to Theoretical Computer Science or Cross ValidatedMonkeypot
True, but this question isn't about programming, it's about algorithm evaluation. Even though the approach is empirical, it's still more about statistics and [applied] theory.Monkeypot
It is not really applicable for TCS, because I'm assuming a black-box algorithm model. And on CV, I have the impression that solutions will not be too practically usable. SO is much more about implementing these things in actual code. A solution that works in theory doesn't help me a lot. In theory, NNLS should do the job, but in practice it doesn't really work well enough; maybe it isn't robust enough wrt. outliers, and doesn't cope well with multiple measurements at the same n?Indigenous
Plus, statisticians often assume that I'm doing this just once. Then I can hand-pick parameters, tune my kernel, etc.; but I'm trying to turn this into a built-in function of my caliper-analyze tool, which should run without me setting any parameters. :-(Indigenous
So, the question boils down to "please recommend some sophisticated benchmarking tools that do some statistical analysis"?Gabrielegabriell
@Gabrielegabriell I think he summarized the questions pretty well in the list at the end: How do you write a program that automatically estimates the scalability of an algorithm, given a set of benchmark results.Lacewing

If you want the actual complexity, you are better off measuring it. Trying to guess how a program will perform without measuring is very unreliable.

The same program can perform very differently on different machines, e.g. one algorithm might be faster on one machine but slower on another.

Your program can also be slower depending on what else the machine is doing: an algorithm which looks good but makes heavy use of resources like caches can be slower, and make other programs slower, when it has to share those resources.

Testing an algorithm on a machine by itself can be up to 2-5x faster than using it in a real program.

Do you know rules-of-thumb of how many samples one needs to get a reasonable estimate? (in particular, when should the tool refrain from giving any estimate, as it will likely be inaccurate anyway?)

For determining a percentile like 90% or 99%, you need on the order of 1/(1-p)^2 samples, i.e. for the 99th percentile you need at least 10,000 samples after warmup, and for the 99.9th percentile you need one million.
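
As a quick sanity check of that rule of thumb, here is a tiny snippet (the class name is made up for illustration):

public class PercentileSamples {
    // Rule-of-thumb sample count for estimating the p-th percentile (p given as a fraction).
    static long samplesForPercentile(double p) {
        return (long) Math.ceil(1.0 / ((1.0 - p) * (1.0 - p)));
    }

    public static void main(String[] args) {
        System.out.println(samplesForPercentile(0.99));  // 10000
        System.out.println(samplesForPercentile(0.999)); // 1000000
    }
}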

Andreandrea answered 5/7, 2013 at 21:40 Comment(3)
But isn't he in fact trying to do this: automate this kind of testing? Estimating the complexity from benchmarking data sounds a lot like it, IMHO.Lacewing
Indeed. I'm already measuring runtime at different parameter settings, now I want to summarize this to the user in form of a function of the input parameters (e.g. data set size).Indigenous
@ErichSchubert You could start with n^p where p is an unknown power. You can do this by taking the log of the sample size n and the log of the time; your estimate of the slope relating these logs is p. This will work even if p is 1, 1.1, 1.5, 2, etc. Data modelling is a very complex subject, and many PhDs do it for a living after years of experience. Yet they usually model theoretical systems. You want to model a real system, which means making some compromises that are difficult to put in any proof.Andreandrea
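
A minimal sketch of the log-log slope estimate suggested in that last comment (the class name and the timings are made up for illustration):

public class LogLogSlope {
    // Estimate the exponent p in time ~ c * n^p by fitting a line to (log n, log time).
    static double estimateExponent(double[] sizes, double[] times) {
        int k = sizes.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < k; i++) {
            meanX += Math.log(sizes[i]);
            meanY += Math.log(times[i]);
        }
        meanX /= k;
        meanY /= k;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < k; i++) {
            double dx = Math.log(sizes[i]) - meanX;
            double dy = Math.log(times[i]) - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
        }
        return sxy / sxx; // slope of the regression line in log-log space
    }

    public static void main(String[] args) {
        // Made-up timings that grow roughly like n^2.
        double[] sizes = { 50, 100, 200, 500 };
        double[] times = { 2.5e3, 1.0e4, 4.1e4, 2.6e5 };
        System.out.printf("estimated exponent p = %.2f%n", estimateExponent(sizes, times));
    }
}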
