As per this answer, it's because the implementation of exponentiation has some overhead that multiplication does not. However, naive multiplication will get slower and slower as the exponent increases. An empirical demonstration:
In [3]: x = np.random.rand(1e6)
In [15]: %timeit x**2
100 loops, best of 3: 11.9 ms per loop
In [16]: %timeit x*x
100 loops, best of 3: 12.7 ms per loop
In [17]: %timeit x**3
10 loops, best of 3: 132 ms per loop
In [18]: %timeit x*x*x
10 loops, best of 3: 27.2 ms per loop
In [19]: %timeit x**4
10 loops, best of 3: 132 ms per loop
In [20]: %timeit x*x*x*x
10 loops, best of 3: 42.4 ms per loop
In [21]: %timeit x**10
10 loops, best of 3: 132 ms per loop
In [22]: %timeit x*x*x*x*x*x*x*x*x*x
10 loops, best of 3: 137 ms per loop
In [24]: %timeit x**15
10 loops, best of 3: 132 ms per loop
In [25]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x
1 loops, best of 3: 212 ms per loop
Note that the exponentiation time stays more or less constant (except for the x**2 case, which I suspect is special-cased), while multiplication gets slower and slower. It seems you could exploit this to get faster integer exponentiation... for example:
In [26]: %timeit x**16
10 loops, best of 3: 132 ms per loop
In [27]: %timeit x*x*x*x*x*x*x*x*x*x*x*x*x*x*x*x
1 loops, best of 3: 225 ms per loop
In [28]: def tosixteenth(x):
   ....:     x2 = x*x
   ....:     x4 = x2*x2
   ....:     x8 = x4*x4
   ....:     x16 = x8*x8
   ....:     return x16
   ....:
In [29]: %timeit tosixteenth(x)
10 loops, best of 3: 49.5 ms per loop
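As a quick sanity check (not part of the timing session above; np.allclose is used because repeated squaring and np.power can differ by a few ULPs):

assert np.allclose(tosixteenth(x), x**16)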
It seems you could apply this technique generically by splitting any integer exponent into a sum of powers of two, computing each power-of-two power by repeated squaring as above, and multiplying the pieces together:
In [93]: %paste
def smartintexp(x, exp):
    result = np.ones(len(x))
    curexp = np.array(x)
    while True:
        if exp % 2 == 1:
            result *= curexp
        exp >>= 1
        if not exp:
            break
        curexp *= curexp
    return result
## -- End pasted text --
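To see why this works, here is a hand-unrolled trace for exp=21 (binary 10101, i.e. 21 = 16 + 4 + 1); the x1/x4/x16 names are just for illustration and are not part of smartintexp:

x1 = np.array(x)    # bit 0 of 21 is set   -> multiplied into the result
x2 = x1 * x1        # bit 1 is clear       -> squaring only
x4 = x2 * x2        # bit 2 is set         -> multiplied into the result
x8 = x4 * x4        # bit 3 is clear       -> squaring only
x16 = x8 * x8       # bit 4 is set         -> multiplied into the result
assert np.allclose(x1 * x4 * x16, x**21)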
In [94]: x
Out[94]:
array([ 0.0163407 , 0.57694587, 0.47336487, ..., 0.70255032,
0.62043303, 0.0796748 ])
In [99]: x**21
Out[99]:
array([ 3.01080670e-38, 9.63466181e-06, 1.51048544e-07, ...,
6.02873388e-04, 4.43193256e-05, 8.46721060e-24])
In [100]: smartintexp(x, 21)
Out[100]:
array([ 3.01080670e-38, 9.63466181e-06, 1.51048544e-07, ...,
6.02873388e-04, 4.43193256e-05, 8.46721060e-24])
In [101]: %timeit x**21
10 loops, best of 3: 132 ms per loop
In [102]: %timeit smartintexp(x, 21)
10 loops, best of 3: 70.7 ms per loop
It's fast for small exponents that are exact powers of two:
In [106]: %timeit x**32
10 loops, best of 3: 131 ms per loop
In [107]: %timeit smartintexp(x, 32)
10 loops, best of 3: 57.4 ms per loop
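That is the best case: for exp = 2**k the loop does k squarings plus a single multiply into result, so x**32 needs only 6 array multiplications instead of the 31 of the naive chain.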
But it gets slower as the exponent gets larger:
In [97]: %timeit x**63
10 loops, best of 3: 133 ms per loop
In [98]: %timeit smartintexp(x, 63)
10 loops, best of 3: 110 ms per loop
And it's not faster for large worst-case exponents:
In [115]: %timeit x**511
10 loops, best of 3: 135 ms per loop
In [114]: %timeit smartintexp(x, 511)
10 loops, best of 3: 192 ms per loop
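This is consistent with counting operations: 63 is binary 111111, so smartintexp does 5 squarings plus 6 multiplies into result (11 array multiplications), and 511 is binary 111111111, costing 8 + 9 = 17. At roughly 12 ms per multiplication of a million-element array, that puts both in the same ballpark as, or above, the ~132 ms that x**exp costs regardless of exponent, so the constant-time exponentiation wins once the exponent has many set bits.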
x**y is rewritten as 2**(y*log x). Both 2**a and log a are single floating-point instructions on modern processors. – Embryology