Speed of calculating powers (in python)
Asked Answered
B

6

44

I'm curious as to why it's so much faster to multiply than to take powers in python (though from what I've read this may well be true in many other languages too). For example it's much faster to do

x*x

than

x**2

I suppose the ** operator is more general and can also deal with fractional powers. But if that's why it's so much slower, why doesn't it perform a check for an int exponent and then just do the multiplication?

Edit: Here's some example code I tried...

def pow1(r, n):
  for i in range(r):
    p = i**n

def pow2(r, n):
  for i in range(r):
    p = 1
    for j in range(n):
      p *= i

Now, pow2 is just a quick example and is clearly not optimised!
But even so I find that using n = 2 and r = 1,000,000, then pow1 takes ~ 2500ms and pow2 takes ~ 1700ms.
I admit that for large values of n, then pow1 does get much quicker than pow2. But that's not too surprising.

Bittner answered 19/6, 2009 at 19:46 Comment(5)
What happens if you change your loop to go by 1/1000s instead of by 1s?Eadmund
Yes, in that case ** is faster than my crazy loop! Ok I guess I'll just have to deal with the fact that for small exponents it's faster to multiply and not use **.Bittner
I've blogged about this (numericalrecipes.wordpress.com/2009/06/05/binary-exponentiation), but a search of exponentiation by squaring (en.wikipedia.org/wiki/Exponentiation_by_squaring) may also give you a better idea of how powers with integer exponents are calculated efficiently, and why they may be marginally slower for small exponents (<4).Subtropical
I've done some small experiments with python and numpy: If you have lots of numbers and you want to calculate their powers, put those numbers into an array and use numpy. Otherwise, if you want to calculate power of a number in a for loop, using the ** python operator is way faster that numpy.power.Goop
@Bittner According to your timing, in the most extreme case, finding a power of 2, 1700 vs 2500 isn't even a factor of 2 difference. And that is the best case. Indeed, the only case where it matters to hand-optimize. Because as soon as you get beyond that, you have to do more than one multiplication, so you are now slower. Moral: If you know you are doing a square, than do the multiply by hand. But +1 for asking the question I had, and for doing the timing tests so I don't need to bother doing so :)Cocoon
S
27

Basically naive multiplication is O(n) with a very low constant factor. Taking the power is O(log n) with a higher constant factor (There are special cases that need to be tested... fractional exponents, negative exponents, etc) . Edit: just to be clear, that's O(n) where n is the exponent.

Of course the naive approach will be faster for small n, you're only really implementing a small subset of exponential math so your constant factor is negligible.

Suborbital answered 19/6, 2009 at 20:33 Comment(0)
E
7

Adding a check is an expense, too. Do you always want that check there? A compiled language could make the check for a constant exponent to see if it's a relatively small integer because there's no run-time cost, just a compile-time cost. An interpreted language might not make that check.

It's up to the particular implementation unless that kind of detail is specified by the language.

Python doesn't know what distribution of exponents you're going to feed it. If it's going to be 99% non-integer values, do you want the code to check for an integer every time, making runtime even slower?

Eadmund answered 19/6, 2009 at 20:7 Comment(7)
A compiler could not make that check, since it depends on the run-time value of the exponent.Kurdistan
I clarified my answer regarding constant exponents. Thanks David.Eadmund
I'd agree with you if there were no significant speed difference between the two, but x*x is a few times faster than x**2 for all values of x I've tried. Surely adding a tiny check for an int exponent wouldn't cost much, and would bring some great performance improvements in so many cases!Bittner
@Christwo. I would never use x**2 when I could use x*x, so it would always harm me. The check can be done by you manually at no cost to me.Eadmund
Between 0 and 10, how many integers are there? How many floats? Random floats will rarely be integers. So it's really down to your particular use for exponentiation. Without evidence that people are feeding some relatively large percentage of ints into the exponent, it's just extra code and extra time.Eadmund
@Eadmund You're right, I don't have any evidence that people plug in integers more often than ints, but I'm pretty sure the input isn't equally distributed over the floats! Anyway I guess I'm just a bit disturbed somehow that **n (for small n) should almost never be used as an operation.Bittner
Why are you disturbed? How often do you do exponentiation by a small integer? In my 30 years of programming, I don't think I've done it many times. If you know you're doing it, use a multiplication. If you don't know you're doing it, your code is probably assuming it's a float anyway.Eadmund
E
4

Doing this in the exponent check will slow down the cases where it isn't a simple power of two very slightly, so isn't necessarily a win. However, in cases where the exponent is known in advance( eg. literal 2 is used), the bytecode generated could be optimised with a simple peephole optimisation. Presumably this simply hasn't been considered worth doing (it's a fairly specific case).

Here's a quick proof of concept that does such an optimisation (usable as a decorator). Note: you'll need the byteplay module to run it.

import byteplay, timeit

def optimise(func):
    c = byteplay.Code.from_code(func.func_code)
    prev=None
    for i, (op, arg) in enumerate(c.code):
        if op == byteplay.BINARY_POWER:
            if c.code[i-1] == (byteplay.LOAD_CONST, 2):
                c.code[i-1] = (byteplay.DUP_TOP, None)
                c.code[i] = (byteplay.BINARY_MULTIPLY, None)
    func.func_code = c.to_code()
    return func

def square(x):
    return x**2

print "Unoptimised :", timeit.Timer('square(10)','from __main__ import square').timeit(10000000)
square = optimise(square)
print "Optimised   :", timeit.Timer('square(10)','from __main__ import square').timeit(10000000)

Which gives the timings:

Unoptimised : 6.42024898529
Optimised   : 4.52667593956

[Edit] Actually, thinking about it a bit more, there's a very good reason why this optimisaton isn't done. There's no guarantee that someone won't create a user defined class that overrides the __mul__ and __pow__ methods and do something different for each. The only way to do it safely is if you can guarantee that the object on the top of the stack is one that has the same result "x**2" and "x*x", but working that out is much harder. Eg. in my example it's impossible, as any object could be passed to the square function.

Exploit answered 19/6, 2009 at 22:5 Comment(2)
Looks fascinating. I'll have to read up on byteplay, oh and peephole optimisation! So I suppose it requires knowing about how Python byte code works.Bittner
IMHO, A user defined class that didn't preserve the mathematical relation between mul and pow would be a travesty. Especially for small integer powers, such as pow = 2. Such a class deserves whatever happens to it.Cocoon
P
3

An implementation of b^p with binary exponentiation

def power(b, p):
    """
    Calculates b^p
    Complexity O(log p)
    b -> double
    p -> integer
    res -> double
    """
    res = 1

    while p:
        if p & 0x1: res *= b
        b *= b
        p >>= 1

    return res
Pernik answered 25/7, 2014 at 8:25 Comment(5)
I tried this and it took 3x the time of b**p when b and p were decently large integers. Actual code I'm optimizing was: s = sum((-1)**(k-j)*scipy.special.comb(k, j, exact=True)*j**n for j in range(k+1))) with n ~ 500 and k being between 5120 and 8192 for the test.Kendrick
Note that even for x**2 this means that at least one multiplication is performed, which is what you perform for x * x too. But you have that overhead. So x * x is always faster than x ** 2.Protamine
@Eponymous: Binary exponentiation is the fastest way for integers. And if you really do (-1)**x naively, you waste a lot of time. The only answers can be 1 or -1, depending on the sign of the exponent. No need for exponentiation at all. I would simplify such expressions, as looks as if the next value is dependent on the former. You can probably save a lot of time by optimizing the algorithm.Protamine
@RudyVelthuis I know that the (-1)**x looks like no optimization was done. I'm no longer working on the above problem. But I remember when I was, removing (-1)**(k-j) (not rewriting the expression to sum the positive and negative terms separately, removing the term) did not change the speed noticeably (it's already special-cased in the ** operator). My comment above is that implementing j**n in Python was 3x slower than the built-in operator for numbers in the range I tried. I don't know why. Maybe the built-in code uses binary exponentiation too but does not have interpreter overhead.Kendrick
@Eponymous: I guess that the built-in is written in a language like C and uses binary exponentiation.Protamine
K
1

I'd suspect that nobody was expecting this to be all that important. Typically, if you want to do serious calculations, you do them in Fortran or C or C++ or something like that (and perhaps call them from Python).

Treating everything as exp(n * log(x)) works well in cases where n isn't integral or is pretty large, but is relatively inefficient for small integers. Checking to see if n is a small enough integer does take time, and adds complication.

Whether the check is worth it depends on the expected exponents, how important it is to get best performance here, and the cost of the extra complexity. Apparently, Guido and the rest of the Python gang decided the check wasn't worth doing.

If you like, you could write your own repeated-multiplication function.

Kurdistan answered 19/6, 2009 at 20:20 Comment(1)
Actually, for the case of x**2 with x being integral or not, nothing, i.e. no algorithm whatsoever, beats the simple x*x.Protamine
T
0

how about xxxxx? is it still faster than x**5?

as int exponents gets larger, taking powers might be faster than multiplication. but the number where actual crossover occurs depends on various conditions, so in my opinion, that's why the optimization was not done(or couldn't be done) in language/library level. But users can still optimize for some special cases :)

Tracietracing answered 19/6, 2009 at 20:6 Comment(3)
Actually, x2 = x * x; x3 = x2 * x; x5 = x2 * x3; (using C syntax rather than Python) uses one less multiplication.Kurdistan
just a note. if speed really matters, developers could build up a pow-table in startup or any math-functions for that matter... that's what game developers do..Tracietracing
@David, however even that hand unrolled multiplication is actually slower than calling ** 5 (probably mostly because it's also paying the python bytecode overhead rather than doing all the work in C). Compare "timeit.Timer('x**5','x=10').timeit()" with "timeit.Timer('a=x*x;a*=a;a*=x','x=10').timeit()". I get ~18% slower for the second one.Exploit

© 2022 - 2024 — McMap. All rights reserved.