The "problem" with acos
is that it has singularities (branch points) at ±1, and for that reason Taylor or Padé won't approximate well. The same applies to MiniMax approximations that try to minimize the max-norm when the domain contains a singularity.
The singularities are basically square-roots, so they can be factored out and the remaining function is smooth and approximates well. The remaining function is asinQ
in the C99 code that follows, it's a MiniMax rational approximation computed with the Remez algorithm. (The code is GNU-C99 actually so M_PI
is available).
Notes on the code:
The approximation minimizes the relative error because it's about floating point.
It uses sqrt
(square root), fma
(float multiply-add), ldexp
(scale by a power of 2)
It uses if-else
and conditional assignment. I am not familiar with shaders and whether that ruins the performance. If that's the case, maybe the code can be reorganized. Each path eveluates asinQ
for some value 0 ≤ x ≤ 1/2.
#include <math.h>
#include <stdio.h>
static const double P[] =
{
-5254.7508920534192751630665,
+6236.4128689522053163220058,
-2597.2384260994230261835544,
+445.05537923135581032176275,
-27.342673770163272135013850,
+0.30325426599090297962612977
};
static const double Q[] =
{
-5254.7508920534192747096309,
+6674.3087766233233966852037,
-3054.9042449253514128522165,
+603.81083008747677276795370,
-47.647718147243950851054008
// 1.0
};
// Approximate arcsin(sqrt(x/2)) / sqrt(x/2) as MiniMax [5/5] over [0, 1/2].
static inline double asinQ (double x)
{
double p = fma (P[5], x, P[4]);
p = fma (p, x, P[3]);
p = fma (p, x, P[2]);
p = fma (p, x, P[1]);
p = fma (p, x, P[0]);
double q = Q[4] + x; // Q[5] = 1.0
q = fma (q, x, Q[3]);
q = fma (q, x, Q[2]);
q = fma (q, x, Q[1]);
q = fma (q, x, Q[0]);
return p / q;
}
double my_acos (double x)
{
double x_abs = x > 0.0 ? x : -x;
if (x_abs <= 0.5)
{
return M_PI/2 - x * asinQ (ldexp (x*x, 1));
}
else
{
double x1 = 1.0 - x_abs;
double ret = sqrt (ldexp (x1, 1)) * asinQ (x1);
return x > 0.0 ? ret : M_PI - ret;
}
}
int main (int argc, char *argv[])
{
double max_err = -1.0;
double min_bits = 0;
double max_x = 0;
int ex = 8;
if (argc >= 2)
sscanf (argv[1], "%i", &ex);
for (double x = -1.0; x <= 1.0; x += ldexp (1, -ex))
{
double err = (my_acos (x) - acos(x)) / acos (x);
double bits = - log2 (fabs (err));
//printf ("%+.6f: % 6e (%.2f bits)\n", x, err, bits);
if (fabs (err) > max_err)
max_err = fabs (err), min_bits = bits, max_x = x;
}
printf ("worst (%ld values): x=%+.6f, err=%6e, bits=%.2f\n",
1 + (2L << ex), max_x, max_err, min_bits);
return 0;
}
Running this for 16 Mio values prints
worst (16777217 values): x=+0.839781, err=5.803408e-16, bits=50.61
So the maximal relative error is around 6·10−16 which is a precision of around 50.5 bits. (The absolute error for that value is 3.4·10−16.) Assuming acos
and sqrt
introduce an error of 0.5 ULPs, the precision of my_acos
is around 1.5 ULPs.
<cmath>
– Edgardosqrt
s. – Pressmark