For solving cubic equations with simple C code, I have found the QBC
solver by noted numerics expert professor William Kahan to be sufficiently robust, reasonably fast and reasonably accurate:
William Kahan, "To solve a real cubic equation." PAM-352, Center for Pure and Applied Mathematics, University of California, Berkely. November 10, 1986. (online, online)
This uses a derivative-based iterative method to find the real root, reduces to a quadratic equation based on that, finally uses a numerically robust quadratic equation solver to find the two remaining roots. Typically, the iterative solver requires about five to ten iterations to converge to the result. Both solvers can be enhanced for accuracy and performance by judicious use of fused multiply-add (FMA) operations, available in ISO C99 via the fma()
standard math function.
Crucial to the accuracy of the quadratic solver is the computation of the discriminant. For this, I use the following code based on recent research:
/* Compute B*B - A*C, accurately
Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller,
"Further Analysis of Kahan's Algorithm for the Accurate Computation
of 2x2 Determinants". Mathematics of Computation, Vol. 82, No. 284,
Oct. 2013, pp. 2245-2264
https://www.ams.org/journals/mcom/2013-82-284/S0025-5718-2013-02679-8/S0025-5718-2013-02679-8.pdf
*/
double DISC (double A, double B, double C)
{
double w = C * A;
double e = fma (-C, A, w);
double f = fma (B, B, -w);
double r = f + e;
return r;
}
Using double-precision arithmetic, Kahan's solver cannot always produce result accurate to double precision. One of the test cases provided in Kahan's paper illustrates why this is the case:
658x³ - 190125x² + 18311811x - 587898164
Using an arbitrary precision math library, we find that the roots of this cubic equation are as follows:
96.229639346592182_18...
96.357064825184152_07... ± i * 0.069749752043689625_43...
QBC
using double-precision arithmetic computes the roots as
96.2296393 50445893
96.35706482 3257289 ± i * 0.0697497 48521837268
The reason for this is that the function evaluation around the real root suffers from errors as large as 60% in the computed function value, preventing the iterative solver from getting closer to the root. By changing the function and derivative evaluation to use double-double computation for intermediate computation (at hefty computational cost), we can address that issue.
/* Data type for double-double computation */
typedef struct {
double l; // low / tail
double h; // high / head
} dbldbl;
dbldbl make_dbldbl (double head, double tail);
double get_dbldbl_head (dbldbl a);
double get_dbldbl_tail (dbldbl a);
dbldbl add_dbldbl (dbldbl a, dbldbl b);
dbldbl mul_dbldbl (dbldbl a, dbldbl b);
void EVAL (double X, double A, double B, double C, double D,
double * restrict Q, double * restrict Qprime,
double * restrict B1, double * restrict C2)
{
#if USE_DBLDBL_EVAL
dbldbl AA, BB, CC, DD, XX, AX, TT, UU;
AA = make_dbldbl (A, 0);
BB = make_dbldbl (B, 0);
CC = make_dbldbl (C, 0);
DD = make_dbldbl (D, 0);
XX = make_dbldbl (X, 0);
AX = mul_dbldbl (AA, XX);
TT = add_dbldbl (AX, BB);
*B1 = get_dbldbl_head (TT) + get_dbldbl_tail(TT);
UU = add_dbldbl (mul_dbldbl (TT, XX), CC);
*C2 = get_dbldbl_head (UU) + get_dbldbl_tail(UU);
TT = add_dbldbl (mul_dbldbl (add_dbldbl (AX, TT), XX), UU);
*Qprime = get_dbldbl_head (TT) + get_dbldbl_tail(TT);
UU = add_dbldbl (mul_dbldbl (UU, XX), DD);
*Q = get_dbldbl_head (UU) + get_dbldbl_tail(UU);
#else // USE_DBLDBL_EVAL
*B1 = fma (A, X, B);
*C2 = fma (*B1, X, C);
*Qprime = fma (fma (A, X, *B1), X, *C2);
*Q = fma (*C2, X, D);
#endif // USE_DBLDBL_EVAL
}
/* Construct new dbldbl number. |tail| must be <= 0.5 ulp of |head| */
dbldbl make_dbldbl (double head, double tail)
{
dbldbl z;
z.l = tail;
z.h = head;
return z;
}
/* Return the head of a double-double number */
double get_dbldbl_head (dbldbl a)
{
return a.h;
}
/* Return the tail of a double-double number */
double get_dbldbl_tail (dbldbl a)
{
return a.l;
}
/* Add two dbldbl numbers */
dbldbl add_dbldbl (dbldbl a, dbldbl b)
{
dbldbl z;
double e, q, r, s, t, u;
/* Andrew Thall, "Extended-Precision Floating-Point Numbers for GPU
Computation." 2006. http://andrewthall.org/papers/df64_qf128.pdf
*/
q = a.h + b.h;
r = q - a.h;
t = (a.h + (r - q)) + (b.h - r);
s = a.l + b.l;
r = s - a.l;
u = (a.l + (r - s)) + (b.l - r);
t = t + s;
s = q + t;
t = (q - s) + t;
t = t + u;
z.h = e = s + t;
z.l = (s - e) + t;
/* For result of zero or infinity, ensure that tail equals head */
if (isinf (s)) {
z.h = s;
z.l = s;
}
if (z.h == 0) {
z.l = z.h;
}
return z;
}
/* Multiply two dbldbl numbers */
dbldbl mul_dbldbl (dbldbl a, dbldbl b)
{
dbldbl z;
double e, s, t;
s = a.h * b.h;
t = fma (a.h, b.h, -s);
t = fma (a.l, b.l, t);
t = fma (a.h, b.l, t);
t = fma (a.l, b.h, t);
z.h = e = s + t;
z.l = (s - e) + t;
/* For result of zero or infinity, ensure that tail equals head */
if (isinf (s)) {
z.h = s;
z.l = s;
}
if (z.h == 0) {
z.l = z.h;
}
return z;
}
The roots computed with the more accurate function and derivative evaluation are:
96.22963934659218 0
96.35706482518415 3 ± i * 0.06974975204 5672006
While the real parts are now accurate to within the limits of double precision, the imaginary parts are still off. The reason for this is is that in this case the quadratic equation is sensitive to minute differences in the coefficients. A one ulp error in either of them can cause differences of around 10-11 in the imaginary part. This could be worked around by representing the coefficients to higher than double precision and using higher-precision computation in the quadratic solver.