Your profiling can be much improved. Plus, we can make your code run 200-500x faster.
(1) Rinse and repeat
You can't run just one iteration of a performance test, for two reasons.
- Your time resolution might not be good enough. This is why you sometimes got the same time for two implementations: the time for one run was near the resolution of your timing mechanism, so you recorded only one "tick".
- There are all sorts of factors that affect performance. Your best bet for a meaningful comparison will be a lot of iterations.
You don't need gazillions of runs (though, of course, that doesn't hurt); instead, estimate and adjust the number of iterations until the variance is within a level acceptable for your purpose.
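As a sketch of that estimate-and-adjust idea (the helper name and the 0.2-second threshold here are my own, not from your script): keep doubling the iteration count until one batch takes long enough to sit comfortably above the timer's resolution.
import timeit

def pick_number(stmt, setup, min_time=0.2):
    # Double the iteration count until a batch takes at least min_time
    # seconds, so the measurement is well above the timer's resolution.
    number = 1
    while timeit.timeit(stmt=stmt, setup=setup, number=number) < min_time:
        number *= 2
    return number
This is roughly what python -m timeit does when it picks the loop count for you.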
timeit is a nice little module for profiling Python code.
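(For reference, the three functions under test look roughly like the stand-ins below. These are my own sketches so the timing code is self-contained; your actual definitions are in the question and differ in their details.)
import numpy.random as npr

def Horner(c, x):
    # Horner's rule: fold the coefficients from highest degree down,
    # multiplying the running total by x at each step.
    acc = 0
    for coeff in c[::-1]:
        acc = acc * x + coeff
    return acc

def naive(c, x):
    # Evaluate each term c[i] * x**i separately and add them up.
    return sum(c[i] * x ** i for i in range(len(c)))

def itera(c, x):
    # Keep a running power of x instead of recomputing x**i every term.
    total, power = 0, 1
    for coeff in c:
        total += coeff * power
        power *= x
    return total

c = npr.uniform(size=(500,1))  # coefficients, as in the question
x = 0.6                        # evaluation point (arbitrary stand-in)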
I added this to the bottom of your script:
import timeit
n = 1000
print 'Horner', timeit.timeit(
    number=n,
    setup='from __main__ import Horner, c, x',
    stmt='Horner(c,x)',
)
print 'naive', timeit.timeit(
    number=n,
    setup='from __main__ import naive, c, x',
    stmt='naive(c,x)',
)
print 'itera', timeit.timeit(
    number=n,
    setup='from __main__ import itera, c, x',
    stmt='itera(c,x)',
)
Which produces
Horner 1.8656351566314697
naive 2.2408010959625244
itera 1.9751169681549072
Horner is the fastest, but it's not exactly blowing the doors off the other two.
(2) Look at what is happening...very carefully
Python has operator overloading, so it's easy to miss seeing this.
npr.uniform(size=(500,1)) is giving you a 500 x 1 numpy structure of random numbers.
So what?
Well, c[i] isn't a number. It's a numpy array with one element. Numpy overloads the operators so you can do things like multiply an array by a scalar.
That's fine, but using an array for every element is a lot of overhead, so it's harder to see the difference between the algorithms.
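You can see this directly (a quick check, not part of your script):
import numpy.random as npr
c = npr.uniform(size=(500,1))
print(type(c[0]))     # numpy.ndarray -- a 1-element array, not a plain float
print(c[0].shape)     # (1,)
print(type(c[0, 0]))  # numpy.float64 -- the scalar tucked inside it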
Instead, let's try a simple Python list:
import random
c = [random.random() for _ in range(500)]
And now,
Horner 0.034661054611206055
naive 0.12771987915039062
itera 0.07331395149230957
Whoa! All the times just got faster (by roughly 17-54x). Proportionally, the Horner implementation sped up more than the other two. We removed the overhead on all three, and can now see the "bare bones" difference.
Horner is 4x faster than naive and 2x faster than itera.
(3) Alternate runtimes
You're using Python 2. I assume 2.7.
Let's see how Python 3.4 fares. (Syntax adjustment: you'll need to put parentheses around the argument list to print.)
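For example, the first timing call becomes the following, and the other two change the same way:
print('Horner', timeit.timeit(
    number=n,
    setup='from __main__ import Horner, c, x',
    stmt='Horner(c,x)',
))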
Horner 0.03298933599944576
naive 0.13706714100044337
itera 0.06771054599812487
About the same.
Let's try PyPy, a JIT implementation of Python. (The "normal" Python implementation is called CPython.)
Horner 0.006507158279418945
naive 0.07541298866271973
itera 0.005059003829956055
Nice! Each implementation is now running faster: naive by a little under 2x, Horner by about 5x, and itera by more than 13x. Horner is now 10x the speed of naive, but slightly slower than itera.
JIT runtimes are more difficult to profile than interpreters. Let's increase the number of iterations to 50000, and try it just to make sure.
Horner 0.12749004364013672
naive 3.2823100090026855
itera 0.06546688079833984
(Note that we have 50x the iterations, but Horner and itera took only about 13-20x the time: the JIT hadn't taken full effect for many of the first 1000 runs.) Same conclusions, but the differences are even more pronounced.
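One way to watch the warm-up directly (a sketch using timeit.repeat, not something I ran for the numbers above): time the same statement in several consecutive batches within one process and compare the per-batch times.
import timeit
# Five consecutive batches of 10000 calls each, in the same process.
# Under a JIT, expect the first batch to be noticeably slower than the rest.
for t in timeit.repeat(stmt='Horner(c,x)',
                       setup='from __main__ import Horner, c, x',
                       repeat=5, number=10000):
    print(t)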
Granted, the idea of JIT is to profile, analyze, and rewrite the program at runtime, so if your goal is to compare algorithms, this is going to add a lot of non-obvious implementation detail.
Nonetheless, comparing runtimes can be useful in giving a broader perspective.
There are a few more things. For example, your naive implementation computes a variable it never uses. You use range instead of xrange. You could try iterating backwards with an index rather than a reverse slice. Etc.
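For instance, a backwards-index variant of Horner might look like this (a sketch; whether it helps at all depends on the interpreter):
def Horner_rev_index(c, x):
    # Walk the indices from highest to lowest with xrange (range on
    # Python 3) instead of building a reversed copy with a slice.
    acc = 0
    for i in xrange(len(c) - 1, -1, -1):
        acc = acc * x + c[i]
    return acc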
None of these changed the results much for me, but they were worth considering.