Fortran (designed for scientific computing) has a built-in power operator, and as far as I know Fortran compilers will commonly optimize raising to integer powers in a similar fashion to what you describe. C/C++ unfortunately don't have a power operator, only the library function pow()
. This doesn't prevent smart compilers from treating pow
specially and computing it in a faster way for special cases, but it seems they do it less commonly ...
Some years ago I was trying to make it more convenient to calculate integer powers in an optimal way, and came up with the following. It's C++, not C though, and still depends on the compiler being somewhat smart about how to optimize/inline things. Anyway, hope you might find it useful in practice:
template<unsigned N> struct power_impl;
template<unsigned N> struct power_impl {
template<typename T>
static T calc(const T &x) {
if (N%2 == 0)
return power_impl<N/2>::calc(x*x);
else if (N%3 == 0)
return power_impl<N/3>::calc(x*x*x);
return power_impl<N-1>::calc(x)*x;
}
};
template<> struct power_impl<0> {
template<typename T>
static T calc(const T &) { return 1; }
};
template<unsigned N, typename T>
inline T power(const T &x) {
return power_impl<N>::calc(x);
}
Clarification for the curious: this does not find the optimal way to compute powers, but since finding the optimal solution is an NP-complete problem and this is only worth doing for small powers anyway (as opposed to using pow
), there's no reason to fuss with the detail.
Then just use it as power<6>(a)
.
This makes it easy to type powers (no need to spell out 6 a
s with parens), and lets you have this kind of optimization without -ffast-math
in case you have something precision dependent such as compensated summation (an example where the order of operations is essential).
You can probably also forget that this is C++ and just use it in the C program (if it compiles with a C++ compiler).
Hope this can be useful.
EDIT:
This is what I get from my compiler:
For a*a*a*a*a*a
,
movapd %xmm1, %xmm0
mulsd %xmm1, %xmm0
mulsd %xmm1, %xmm0
mulsd %xmm1, %xmm0
mulsd %xmm1, %xmm0
mulsd %xmm1, %xmm0
For (a*a*a)*(a*a*a)
,
movapd %xmm1, %xmm0
mulsd %xmm1, %xmm0
mulsd %xmm1, %xmm0
mulsd %xmm0, %xmm0
For power<6>(a)
,
mulsd %xmm0, %xmm0
movapd %xmm0, %xmm1
mulsd %xmm0, %xmm1
mulsd %xmm0, %xmm1
gcc
does not optimize this. The 1970s FORTRAN compiler I used on CDC Cyber did this sort of transformation, even without selecting optimization. I think the Unix V6 (c. 1978)C
compiler did it when optimization was enabled, though many of the optimizations it did were to save code space, a precious commodity in those days. – Serviette(a*a)*(a*a)*(a*a)
into the mix, too. Same number of multiplications, but probably more accurate. – Commonplacea
has... – Primaveria(a*a*a)*(a*a*a)
, this is what I have suggested is as an alternative. What are you trying to say? – Commonplace