You can't be 100% sure any technique that estimates complexity from real-world running times will give you the correct answer. The exact running time can be an arbitrarily complicated function, so the measured times can theoretically follow a completely different curve until the input size exceeds some enormous threshold; the running time only needs to tend towards (some multiple of) the complexity function as the input size tends towards infinity. This assumes you want a tight bound (which exists for many, but not all, algorithms) and not just an upper or lower bound.
But you can come up with some reasonable estimate of the complexity that should generally be quite accurate.
Also note that a number of algorithms have different running times for different inputs of the same size. To mitigate this, you can run the steps below for a few different inputs of each size and average the results; this also helps smooth out system conditions (other processes, caching, etc.) that may affect the timings. However, you may not be able to estimate the best-case or worst-case complexity if you don't know which specific inputs trigger those cases, as they may be too rare to hit while passing in random data.
How to do this:
Record the running time for inputs of some sufficiently large and sufficiently different sizes (e.g. successive powers of 10, like 100, 1000 and 10000; each should be big enough that the run takes at least a few seconds, to make the data less noisy). Let's use 3 input sizes. Strictly speaking you only need 2, but you can use 3 or more as additional verification.
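For example, a minimal timing sketch in Python could look like the following. Here `my_algorithm` and `make_input` are hypothetical placeholders for your own algorithm and input generator, and it averages a few runs per size as suggested above:

```python
import random
import time

def make_input(n):
    # Hypothetical input generator: a list of n random numbers.
    return [random.random() for _ in range(n)]

def average_running_time(algorithm, n, repeats=3):
    # Time `algorithm` on a few different random inputs of size n
    # and average the results to reduce noise.
    total = 0.0
    for _ in range(repeats):
        data = make_input(n)
        start = time.perf_counter()
        algorithm(data)
        total += time.perf_counter() - start
    return total / repeats

# Record the average running time for each input size, e.g.:
# timings = {n: average_running_time(my_algorithm, n) for n in (100, 1000, 10000)}
```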
Now we can try to match these 3 results to one of a set of candidate complexities, like O(1), O(log n), O(sqrt(n)), O(n), O(n log n), O(n²), O(n³), etc.
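For reference, these candidates can be written as functions of n. This is just a sketch (the `CANDIDATES` name is my own, and it uses log base 10 to match the worked example below); it's reused in the automated check at the end:

```python
import math

# Candidate complexity functions, using log base 10 to match the
# worked example below.
CANDIDATES = {
    "O(1)":       lambda n: 1.0,
    "O(log n)":   lambda n: math.log10(n),
    "O(sqrt(n))": lambda n: math.sqrt(n),
    "O(n)":       lambda n: float(n),
    "O(n log n)": lambda n: n * math.log10(n),
    "O(n^2)":     lambda n: float(n ** 2),
    "O(n^3)":     lambda n: float(n ** 3),
}
```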
If you're trying to match it manually, you could plot the running times you got on a graph along with the graphs of each of the above functions (scaled appropriately) and see which one matches best.
If you're trying to automate it, you can fit each of the functions to the measured data and see how closely each one matches.
There are better ways to do this, but one really simple approach is as follows:
Say you have these running times:
input size    running time
100           21 seconds
1000          29 seconds
10000         40 seconds
Now you can try to match one of those (say the biggest one, which is likely to be the most accurate) to a multiple of one of the above functions.
O(n): k x n = k x 10000 = 40, k = 40 / 10000 = 0.004
O(log n): k x log n = k x log 10000 = 40, k = 40 / log 10000 = 40 / 4 = 10 (using log base 10)
O(n²): k x n² = k x 10000² = 40, k = 40 / 10000² = 0.0000004
Now compare what the equation gives you to what the actual running time is for the other input sizes:
For n = 1000, actual running time = 29 seconds
O(n): 0.004 x 1000 = 4 seconds
O(log n): 10 x log 1000 = 30 seconds
O(n²): 0.0000004 x 1000² = 0.4 seconds
For n = 100, actual running time = 21 seconds
O(n): 0.004 x 100 = 0.4 seconds
O(log n): 10 x log 100 = 20 seconds
O(n²): 0.0000004 x 100² = 0.004 seconds
Looking at this, we can clearly see that O(log n) is the closest, with the actual and predicted running times differing by only 1 second in both cases. So that would be our guess for the complexity.
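If you want to automate that simple approach, here is a minimal sketch that calibrates the constant factor k on the largest measurement and then picks the candidate with the smallest total prediction error on the remaining measurements. It assumes the `CANDIDATES` dictionary from the sketch earlier:

```python
def guess_complexity(timings, candidates):
    # `timings` maps input size -> measured running time in seconds.
    largest_n = max(timings)
    best_name, best_error = None, float("inf")
    for name, f in candidates.items():
        # Calibrate the constant factor on the largest (least noisy) measurement.
        k = timings[largest_n] / f(largest_n)
        # Total absolute error of the predictions for the other measurements.
        error = sum(abs(k * f(n) - t) for n, t in timings.items() if n != largest_n)
        if error < best_error:
            best_name, best_error = name, error
    return best_name

# Using the example measurements above (and CANDIDATES as defined earlier):
timings = {100: 21, 1000: 29, 10000: 40}
print(guess_complexity(timings, CANDIDATES))  # prints "O(log n)"
```

A least-squares fit over all the measurements would be a bit more robust, but this mirrors the hand calculation above.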