Problems in implementing Horner's method in Python
So I have written code for evaluating a polynomial using three different methods. Horner's method should be the fastest, while the naive method should be the slowest, right? But how come the measured times are not what I expect? Sometimes the time even turns out to be exactly the same for the itera and naive methods. What's wrong?

import numpy.random as npr
import time

def Horner(c,x):
    p=0
    for i in c[-1::-1]:
        p = p*x+i
    return p

def naive(c,x):
    n = len(c)
    p = 0
    for i in range(len(c)):
        p += c[i]*x**i 
    return p

def itera(c,x):
    p = 0
    xi = 1
    for i in range(len(c)):
        p += c[i]*xi
        xi *= x 
    return p

c=npr.uniform(size=(500,1))
x=-1.34

start_time=time.time()
print Horner(c,x)
print time.time()-start_time

start_time=time.time()
print itera(c,x)
print time.time()-start_time

start_time=time.time()
print naive(c,x)
print time.time()-start_time

here are some of the results:

[  2.58646959e+69]
0.00699996948242
[  2.58646959e+69]
0.00600004196167
[  2.58646959e+69]
0.00600004196167

[ -3.30717922e+69]
0.00899982452393
[ -3.30717922e+69]
0.00600004196167
[ -3.30717922e+69]
0.00600004196167

[ -2.83469309e+69]
0.00999999046326
[ -2.83469309e+69]
0.00999999046326
[ -2.83469309e+69]
0.0120000839233
Ouphe answered 31/1, 2015 at 10:5 Comment(2)
thank you for all the answers. Other than the implementation of the timeit function, I wonder are there any specific things I can do to improve each of the three algorithms so that they are more efficient/faster?Ouphe
Yes there are :) To make that more clear, you can also edit the original question.King

Your profiling can be much improved. Plus, we can make your code run 200-500x faster.


(1) Rinse and repeat

You can't run just one iteration of a performance test, for two reasons.

  1. Your time resolution might not be good enough. This is why you sometimes got the same time for two implementations: the time for one run was near the resolution of your timing mechanism, so you recorded only one "tick".
  2. There are all sorts of factors that affect performance. Your best bet for a meaningful comparison will be a lot of iterations.

You don't need gazillions of runs (though, of course, that doesn't hurt), but you should estimate and adjust the number of iterations until the variance is within a level acceptable for your purpose.

timeit is a nice little module for profiling Python code.

I added this to the bottom of your script.

import timeit

n = 1000

print 'Horner', timeit.timeit(
    number = n,
    setup='from __main__ import Horner, c, x',
    stmt='Horner(c,x)'
)
print 'naive', timeit.timeit(
    number = n,
    setup='from __main__ import naive, c, x',
    stmt='naive(c,x)', 
)
print 'itera', timeit.timeit(
    number = n,
    setup='from __main__ import itera, c, x',
    stmt='itera(c,x)', 
)

Which produces

Horner 1.8656351566314697
naive 2.2408010959625244
itera 1.9751169681549072

Horner is the fastest, but it's not exactly blowing the doors off the other two.


(2) Look at what is happening...very carefully

Python has operator overloading, so it's easy to miss seeing this.

npr.uniform(size=(500,1)) is giving you a 500 x 1 numpy structure of random numbers.

So what?

Well, c[i] isn't a number. It's a numpy array with one element. Numpy overloads the operators so you can do things like multiply an array by a scalar.

That's fine, but using an array for every element is a lot of overhead, so it's harder to see the difference between the algorithms.
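A quick way to see this for yourself (assuming numpy is installed):

```python
import numpy.random as npr

c = npr.uniform(size=(500, 1))  # shape (500, 1): 500 rows, each a length-1 array

print(type(c[0]))   # numpy.ndarray -- not a Python float
print(c[0].shape)   # (1,) -- every "coefficient" is itself an array

# So each c[i] * x**i allocates a fresh 1-element array, and that
# allocation overhead swamps the arithmetic the benchmark is comparing.
```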

Instead, let's try a simple Python list:

import random
c = [random.random() for _ in range(500)]

And now,

Horner 0.034661054611206055
naive 0.12771987915039062
itera 0.07331395149230957

Whoa! All the times just got faster (by 10-60x). Proportionally, the Horner implementation got even faster than the other two. We removed the overhead from all three, and can now see the "bare bones" difference.

Horner is 4x faster than naive and 2x faster than itera.


(3) Alternate runtimes

You're using Python 2. I assume 2.7.

Let's see how Python 3.4 fares. (Syntax adjustment: you'll need to put parentheses around the argument list to print.)

Horner 0.03298933599944576
naive 0.13706714100044337
itera 0.06771054599812487

About the same.

Let's try PyPy, a JIT implementation of Python. (The "normal" Python implementation is called CPython.)

Horner 0.006507158279418945
naive 0.07541298866271973
itera 0.005059003829956055

Nice! Each implementation is now running 2-5x faster. Horner is now 10x the speed of naive, but slightly slower than itera.

JIT runtimes are more difficult to profile than interpreters. Let's increase the number of iterations to 50000, and try it just to make sure.

Horner 0.12749004364013672
naive 3.2823100090026855
itera 0.06546688079833984

(Note that we have 50x the iterations, but only 20x the time...the JIT hadn't taken full effect for many of the first 1000 runs.) Same conclusions, but the differences are even more pronounced.

Granted, the idea of JIT is to profile, analyze, and rewrite the program at runtime, so if your goal is to compare algorithms, this is going to add a lot of non-obvious implementation detail.

Nonetheless, comparing runtimes can be useful in giving a broader perspective.


There are a few more things. For example, your naive implementation computes a variable it never uses. You use range instead of xrange. You could try iterating backwards with an index rather than a reverse slice. Etc.

None of these changed the results much for me, but they were worth considering.
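As a sketch, those tweaks might look like this (Python 3 syntax; under Python 2 you would use xrange in place of range to avoid building a list):

```python
def naive2(c, x):
    # the unused n = len(c) is dropped
    p = 0
    for i in range(len(c)):
        p += c[i] * x**i
    return p

def Horner2(c, x):
    # index backwards instead of materializing the reversed slice c[-1::-1]
    p = 0
    for i in range(len(c) - 1, -1, -1):
        p = p * x + c[i]
    return p

# both agree: 1 + 2*2 + 3*2**2 == 17
print(Horner2([1, 2, 3], 2), naive2([1, 2, 3], 2))
```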

King answered 31/1, 2015 at 11:53 Comment(2)
"Let's try PyPy, a JIT implementation of Python 2." → And Python 3 for a while now.Borzoi
@Veedrac, awesome! The last I had heard was pypy.org/py3donate.html, but that was like a year ago.King

You cannot obtain accurate results by measuring things like that:

start_time=time.time()
print Horner(c,x)
print time.time()-start_time

Presumably most of the time is spent in the I/O performed by the print statement. In addition, to get something significant, you should perform the measurement over a large number of iterations in order to smooth out errors. In the general case, you might want to run your test on various input data as well -- depending on your algorithm, some cases might coincidentally be solved more efficiently than others.
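For instance, a minimal restructuring that keeps the print out of the measured region (same Horner as in the question; the c and x values here are just stand-ins):

```python
import time

def Horner(c, x):
    p = 0
    for i in c[-1::-1]:
        p = p * x + i
    return p

c = [0.1] * 500
x = -1.34

start_time = time.time()
result = Horner(c, x)             # only the computation is timed
elapsed = time.time() - start_time

print(result)                     # I/O happens after the clock has stopped
print(elapsed)
```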

You should definitely take a look at the timeit module. Something like this, maybe:

import timeit
print 'Horner',timeit.timeit(stmt='Horner(c,x)', 
                  setup='from __main__ import Horner, c, x',
                  number = 10000)
#                          ^^^^^
#                probably not enough. Increase that once you will
#                be confident

print 'naive',timeit.timeit(stmt='naive(c,x)', 
                  setup='from __main__ import naive, c, x',
                  number = 10000)
print 'itera',timeit.timeit(stmt='itera(c,x)', 
                  setup='from __main__ import itera, c, x',
                  number = 10000)

Producing this on my system:

Horner 23.3317809105
naive 28.305519104
itera 24.385917902

But the results still vary from one run to the next:

Horner 21.1151690483
naive 23.4374330044
itera 21.305426836

As I said before, to obtain more meaningful results, you should definitely increase the number of tests, and run them on several test cases in order to smooth out the results.
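One convenient way to do both is timeit.repeat, which runs several independent batches so you can see the spread between runs (this is a sketch; the batch counts are arbitrary):

```python
import timeit

# build the function and data inside timeit's own namespace
setup = """
def Horner(c, x):
    p = 0
    for i in c[-1::-1]:
        p = p * x + i
    return p

import random
c = [random.random() for _ in range(500)]
x = -1.34
"""

# repeat=5 independent batches of number=1000 calls each
times = timeit.repeat(stmt='Horner(c, x)', setup=setup, repeat=5, number=1000)
print(min(times))  # the minimum is the batch least disturbed by background load
```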

Junkie answered 31/1, 2015 at 10:26 Comment(3)
This is an excellent answer. I had favorited this question because I wanted to pursue the issue of proper benchmarking, which it raises. Should the wiki contain information on good benchmarking, or is there a master post on this topic?Orton
The absolute value of the results might be variable, but the relative results are rather consistent between your two runs.King
You are correct, @Paul, that the naive algorithm was consistently the worst. Between Horner and iterative, things are less obvious (at least on those few runs -- using the OP's original implementation): in the first run Horner is something like 5% better than iterative, but in the second run there is only a 1% difference -- and given the variance between those two runs, it is hard to have a definitive opinion without further investigation, I think.Junkie

If you are doing a lot of benchmarking, scientific computing, numpy-related work, and much more, ipython is an extremely useful tool.

To benchmark, you can time the code with the timeit ipython magic, which gives more consistent results on each run; it is simply a matter of writing timeit followed by the function or code to time:

In [28]: timeit Horner(c,x)
1000 loops, best of 3: 670 µs per loop

In [29]: timeit naive(c,x)
1000 loops, best of 3: 983 µs per loop

In [30]: timeit itera(c,x)
1000 loops, best of 3: 804 µs per loop

To time code spanning more than one line you simply use %%timeit:

In [35]: %%timeit
   ....: for i in range(100):
   ....:     i ** i
   ....: 
10000 loops, best of 3: 110 µs per loop

ipython can compile cython code, f2py code and do numerous other very helpful tasks using different plugins and ipython magic commands.

builtin magic commands

Using cython and some very basic improvements we can make Horner roughly 1.8x faster:

In [166]: %%cython
import numpy as np
cimport numpy as np
cimport cython
ctypedef np.float_t DTYPE_t
def C_Horner(c, DTYPE_t x):
    cdef DTYPE_t p = 0  # must be initialized; an uninitialized cdef double is undefined
    for i in reversed(c):
        p = p * x + i
    return p   

In [28]: c=npr.uniform(size=(2000,1))

In [29]: timeit Horner(c,-1.34)
100 loops, best of 3: 3.93 ms per loop
In [30]: timeit C_Horner(c,-1.34)
100 loops, best of 3: 2.21 ms per loop

In [31]: timeit itera(c,x)
100 loops, best of 3: 4.10 ms per loop
In [32]: timeit naive(c,x)
100 loops, best of 3: 4.95 ms per loop

Using the list from @Paul Draper's answer, our cythonised version runs more than twice as fast as the original function, and much faster than itera and naive:

In [214]: import random

In [215]: c = [random.random() for _ in range(500)]

In [44]: timeit C_Horner(c, -1.34)
10000 loops, best of 3: 18.9 µs per loop    
In [45]: timeit Horner(c, -1.34)
10000 loops, best of 3: 44.6 µs per loop
In [46]: timeit naive(c, -1.34)
10000 loops, best of 3: 167 µs per loop
In [47]: timeit itera(c,-1.34)
10000 loops, best of 3: 75.8 µs per loop
Thais answered 31/1, 2015 at 11:32 Comment(0)
