Accurate computation of principal branch of the Lambert W function with standard C math library

The Lambert W function is the inverse function of f(w) = w*exp(w). It is a multi-valued function with infinitely many branches over the complex numbers, but only two branches over the real numbers, denoted W0 and W-1. W0 is considered the principal branch, with input domain [-1/e, ∞), and W-1 has input domain [-1/e, 0). Corresponding implementations are often called lambert_w0() and lambert_wm1().

A close relative of the function was first identified by Leonhard Euler [1] when following up on work by Johann Heinrich Lambert [2]. Euler examined the solution of the transcendental equation x^α - x^β = (α - β) v x^(α+β) and in the process considered the simplified case ln x = v x^α. In the course of this he introduced a helper function with the following series expansion around zero:

y = 1 + (2^1/(1·2))u + (3^2/(1·2·3))u^2 + (4^3/(1·2·3·4))u^3 + (5^4/(1·2·3·4·5))u^4 + …

In modern terms this function (which Euler left unnamed) represents W(-x)/x, and the solution of ln x = v x^α is x = (W(-αv)/(-αv))^(1/α).

While the Lambert W function made occasional appearances in the literature, e.g. [3], it was not named and recognized as an important building block until the seminal work of Robert Corless in the 1990s, e.g. [4]. Subsequently the applicability of the Lambert W function to both mathematics and the physical sciences has been expanded through ongoing research, with some examples given in [5].

The Lambert W function is not currently part of the standard math library of ISO-C, nor do there seem to be any immediate plans to add it. How can the principal branch of the Lambert W function, W0, be implemented accurately using the ISO-C standard math library?

A faithfully-rounded implementation is probably overly ambitious, but maintaining a 4 ulp error bound (as chosen by the LA profile of the Intel math library) seems achievable and desirable. Support for IEEE-754 (2008) binary floating-point arithmetic and support for fused multiply-add (FMA) operations accessible via the fma() and fmaf() standard library functions can be assumed.


[1] Leonhard Euler, “De serie Lambertina plurimisque eius insignibus proprietatibus,” (On Lambert’s series and its many distinctive properties) Acta Academiae Scientiarum Imperialis Petropolitanae pro Anno MDCCLXXIX, Tomus III, Pars II, (Proceedings of the Imperial Academy of Sciences of St. Petersburg for the Year 1779, volume 3, part 2, Jul. - Dec.), St. Petersburg: Academy of Sciences 1783, pp. 29-51 (scan online at Bavarian State Library, Munich)

[2] Johann Heinrich Lambert, "Observationes variae in mathesin puram" (Various observations on pure mathematics) Acta Helveticae physico-mathematico-anatomico-botanico-medica, Vol. 3, Basel: J. R. Imhof 1758, pp. 128-168 (scan online at Biodiversity Heritage Library)

[3] F. N. Fritsch, R. E. Shafer, and W. P. Crowley, "Algorithm 443: Solution of the Transcendental Equation w e^w = x", Communications of the ACM, Vol. 16, No. 2, February 1973, pp. 123-124

[4] R. M. Corless et al., "On the Lambert W function," Advances in Computational Mathematics, Vol. 5, No. 1, December 1996, pp. 329-359

[5] Iordanis Kesisoglou, Garima Singh, and Michael Nikolaou, "The Lambert Function Should Be in the Engineering Mathematical Toolbox", Computers & Chemical Engineering, Vol. 148, May 2021

Zubkoff answered 18/9, 2022 at 4:29

From the literature it is clear that the most common method of computing the Lambert W function over the real numbers is via functional iteration. The second-order Newton method is the simplest of these schemes:

w_{i+1} = w_i - (w_i exp(w_i) - x) / (exp(w_i) + w_i exp(w_i))

Much of the literature prefers higher-order methods, such as those by Halley, Fritsch, and Schroeder. During exploratory work I found that, when performed in finite-precision floating-point arithmetic, the numerical properties of these higher-order iterations are not as favorable as their order may suggest. As a particular example, three Newton iterations consistently outperformed two Halley iterations in terms of accuracy. For this reason I settled on Newton iteration as the main building block, and used a custom implementation of exp() that only needs to deliver results representable as positive normals (in IEEE-754 terminology) to gain back some performance.

I further found that the Newton iteration converges only slowly for large operands, in particular when the starting approximation is not very accurate. Since a high iteration count is not conducive to good performance, I looked around for an alternative and found a superior candidate in the logarithm-based iteration scheme by Iacono and Boyd [*], which also has second-order convergence:

w_{i+1} = (w_i / (1 + w_i)) * (1 + log(x / w_i))

Many implementations of the Lambert W function appear to be using different starting approximations for different portions of the input domain. My preference was for a single starting approximation across the entire input domain with a view to future vectorized implementations.

Luckily Iacono and Boyd also provide a universal starting approximation that works across the entire input domain of W0, which, while not entirely living up to its promises, performs very well. I fine-tuned this for the single-precision implementation which deals with a much narrower input domain, using an optimizing search heuristic to achieve the best possible accuracy. I also employed custom implementations of log() that only have to deal with inputs that are positive normals.

Some care must be taken in both starting approximation and the Newton iteration to avoid overflow and underflow in intermediate computation. This is easily and cheaply accomplished by scaling by suitable powers of two.

While the resulting iteration schemes deliver accurate results in general, errors of many ulps occur for arguments near zero and for arguments near -1/e ≈ -0.367879. I addressed the former issue by using the first few terms of the Taylor series expansion around zero: x - x^2 + (3/2)x^3. The fact that W0(x) ≈ √(1 + e·x) - 1 on [-1/e, 0] suggests the use of a minimax polynomial approximation p(t) with t = √(x + 1/e), which turns out to work reasonably well near -1/e. I generated this approximation with the Remez algorithm.

The accuracy achieved for both IEEE-754 binary32 (mapped to float) and IEEE-754 binary64 (mapped to double) is well within the specified error bound. Maximum error in the positive half-plane is less than 1.5 ulps, and maximum error in the negative half-plane is below 2.7 ulps. The single-precision code was tested exhaustively, while the double-precision code was tested with one billion random arguments.


[*] Roberto Iacono and John P. Boyd, "New approximations to the principal real-valued branch of the Lambert W-function," Advances in Computational Mathematics, Vol. 43, No. 6, December 2017, pp. 1403-1436

The single-precision implementation of the Lambert W0 function is as follows:

#include <math.h>    // isnan, fabsf, sqrtf, fmaf
#include <stdint.h>  // uint32_t
#include <string.h>  // memcpy

float expf_scale_pos_normal (float a, int scale);
float logf_pos_normal (float a);

/*
  Compute the principal branch of the Lambert W function, W_0. The maximum
  error in the positive half-plane is 1.49874 ulps and the maximum error in 
  the negative half-plane is 2.56002 ulps
*/
float lambert_w0f (float z) 
{
    const float em1_fact_0 = 0.625529587f; // exp(-1)_factor_0
    const float em1_fact_1 = 0.588108778f; // exp(-1)_factor_1
    const float qe1 = 2.71828183f / 4.0f; //  0x1.5bf0a8p-01 // exp(1)/4
    float e, w, num, den, rden, redz, y, r;

    if (isnan (z) || (z == INFINITY) || (z == 0.0f)) return z + z;
    if (fabsf (z) < 1.220703125e-4f) return fmaf (-z, z, z); // 0x1.0p-13
    redz = fmaf (em1_fact_0, em1_fact_1, z); // z + exp(-1)
    if (redz < 0.0625f) { // expansion at -(exp(-1))
        r = sqrtf (redz);
        w =             -1.23046875f;  // -0x1.3b0000p+0
        w = fmaf (w, r,  2.17185670f); //  0x1.15ff66p+1
        w = fmaf (w, r, -2.19554094f); // -0x1.19077cp+1
        w = fmaf (w, r,  1.92107077f); //  0x1.ebcb4cp+0
        w = fmaf (w, r, -1.81141856f); // -0x1.cfb920p+0
        w = fmaf (w, r,  2.33162979f); //  0x1.2a72d8p+1
        w = fmaf (w, r, -1.00000000f); // -0x1.000000p+0
    } else {
        /* Compute initial approximation. Based on: Roberto Iacono and John 
           Philip Boyd, "New approximations to the principal real-valued branch
           of the Lambert W function", Advances in Computational Mathematics, 
           Vol. 43, No. 6, December 2017, pp. 1403-1436
        */
        y = fmaf (2.0f, sqrtf (fmaf (qe1, z, 0.25f)), 1.0f);
        y = logf_pos_normal (fmaf (1.15262585f, y, -0.15262585f) / 
                             fmaf (0.45906518f, logf_pos_normal (y), 1.0f));
        w = fmaf (2.0390625f, y, -1.0f);

        /* perform Newton iterations to refine approximation to full accuracy */
        for (int i = 0; i < 3; i++) {
            e = expf_scale_pos_normal (w, -3); // 0.125f * expf (w);
            num = fmaf (w, e, -0.125f * z);
            den = fmaf (w, e, e);
            rden = 1.0f / den;
            w = fmaf (-num, rden, w);
        }
    }
    return w;
}

float uint32_as_float (uint32_t a)
{
    float r;
    memcpy (&r, &a, sizeof r);
    return r;
}

uint32_t float_as_uint32 (float a)
{
    uint32_t r;
    memcpy (&r, &a, sizeof r);
    return r;
}

/* exp(a) * 2**scale; positive normal results only! Maximum error 0.86565 ulp */
float expf_scale_pos_normal (float a, int scale)
{
    const float flt_int_cvt = 12582912.0f; // 0x1.8p23 
    float f, r, j, t;
    uint32_t i;

    /* exp(a) = 2**i * exp(f); i = rintf (a / log(2)) */
    j = fmaf (1.442695f, a, flt_int_cvt); // 0x1.715476p0 // log2(e)
    t = j - flt_int_cvt; 
    f = fmaf (t, -6.93145752e-1f, a); // -0x1.62e400p-1  // log_2_hi 
    f = fmaf (t, -1.42860677e-6f, f); // -0x1.7f7d1cp-20 // log_2_lo 
    i = float_as_uint32 (j);

    /* approximate r = exp(f) on interval [-log(2)/2, +log(2)/2] */
    r =             1.37805939e-3f;  // 0x1.694000p-10
    r = fmaf (r, f, 8.37312452e-3f); // 0x1.125edcp-7
    r = fmaf (r, f, 4.16695364e-2f); // 0x1.555b5ap-5
    r = fmaf (r, f, 1.66664720e-1f); // 0x1.555450p-3
    r = fmaf (r, f, 4.99999851e-1f); // 0x1.fffff6p-2
    r = fmaf (r, f, 1.00000000e+0f); // 0x1.000000p+0
    r = fmaf (r, f, 1.00000000e+0f); // 0x1.000000p+0

    /* exp(a) = 2**(i+scale) * r; */
    r = uint32_as_float (((i + scale) << 23) + float_as_uint32 (r));
    return r;
}

/* compute natural logarithm of positive normals; maximum error: 0.85089 ulp */
float logf_pos_normal (float a)
{
    const float ln2 = 0.693147182f; // 0x1.62e430p-1 // log(2)
    const float two_to_m23 = 1.19209290e-7f; // 0x1.0p-23
    float m, r, s, t, i, f;
    int32_t e;

    /* log(a) = log(m * 2**i) = i * log(2) + log(m) */
    e = (float_as_uint32 (a) - float_as_uint32 (0.666666667f)) & 0xff800000;
    m = uint32_as_float (float_as_uint32 (a) - e);
    i = (float)e * two_to_m23;

    /* log(m) = log1p(f) */
    f = m - 1.0f;
    s = f * f;

    /* compute log1p(f) for f in [-1/3, 1/3] */
    r =             -0.130310059f;  // -0x1.0ae000p-3
    t =              0.140869141f;  //  0x1.208000p-3
    r = fmaf (r, s, -0.121483363f); // -0x1.f1988ap-4
    t = fmaf (t, s,  0.139814854f); //  0x1.1e5740p-3
    r = fmaf (r, s, -0.166846141f); // -0x1.55b36ep-3
    t = fmaf (t, s,  0.200120345f); //  0x1.99d8b2p-3
    r = fmaf (r, s, -0.249996200f); // -0x1.fffe02p-3
    r = fmaf (t, f, r);
    r = fmaf (r, f,  0.333331972f); //  0x1.5554fap-2
    r = fmaf (r, f, -0.500000000f); // -0x1.000000p-1
    r = fmaf (r, s, f);

    /* log(a) = i * log(2) + log(m) */
    r = fmaf (i, ln2, r);
    return r;
}

The double-precision implementation is structurally equivalent to the single-precision implementation, except that it makes use of the Iacono-Boyd iteration scheme:

#include <math.h>    // isnan, fabs, sqrt, fma
#include <string.h>  // memcpy

double exp_scale_pos_normal (double a, int scale);
double log_pos_normal (double a);

/* Compute the principal branch of the Lambert W function, W_0. Maximum error:
   positive half-plane: 1.49210 ulp
   negative half-plane: 2.67824 ulp
*/
double lambert_w0 (double z) 
{
    const double em1_fact_0 = 0.57086272525975246; // 0x1.24481e7efdfcep-1 // exp(-1)_factor_0
    const double em1_fact_1 = 0.64442715366299452; // 0x1.49f25b1b461b7p-1 // exp(-1)_factor_1
    const double qe1 = 2.7182818284590452 * 0.25;  // 0x1.5bf0a8b145769p-1 // exp(1)/4
    double e, r, t, w, y, num, den, rden, redz;
    int i;
    
    if (isnan (z) || (z == INFINITY) || (z == 0.0)) return z + z;
    if (fabs (z) < 1.9073486328125e-6) return fma (fma (1.5, z, -1.) * z, z, z);
    redz = fma (em1_fact_0, em1_fact_1, z); // z + exp(-1)
    if (redz < 0.01025390625) { // expansion at -(exp(-1))
        r = sqrt (redz);
        w =            -7.8466654751155138;  // -0x1.f62fc463917ffp+2
        w = fma (w, r, 10.0241581340373877); //  0x1.40c5e74773ef5p+3
        w = fma (w, r, -8.1029379749359691); // -0x1.034b44947bba0p+3
        w = fma (w, r,  5.8322883145113726); //  0x1.75443634ead5fp+2
        w = fma (w, r, -4.1738796362609882); // -0x1.0b20d80dcb9acp+2
        w = fma (w, r,  3.0668053943936471); //  0x1.888d14440efd0p+1
        w = fma (w, r, -2.3535499689514934); // -0x1.2d41201913016p+1
        w = fma (w, r,  1.9366310979331112); //  0x1.efc70e3e0a0eap+0
        w = fma (w, r, -1.8121878855270763); // -0x1.cfeb8b968bd2cp+0
        w = fma (w, r,  2.3316439815968506); //  0x1.2a734f5b6fd56p+1
        w = fma (w, r, -1.0000000000000000); // -0x1.0000000000000p+0
        return w;
    }
    /* Roberto Iacono and John Philip Boyd, "New approximations to the 
       principal real-valued branch of the Lambert W function", Advances 
       in Computational Mathematics, Vol. 43, No. 6, December 2017, 
       pp. 1403-1436
     */
    y = fma (2.0, sqrt (fma (qe1, z, 0.25)), 1.0);
    y = log_pos_normal (fma (1.14956131, y, -0.14956131) / 
                        fma (0.4549574, log_pos_normal (y), 1.0));
    w = fma (2.036, y, -1.0);

    /* Use iteration scheme w = (w / (1 + w)) * (1 + log (z / w) from
       Roberto Iacono and John Philip Boyd, "New approximations to the 
       principal real-valued branch of the Lambert W function", Advances
       in Computational Mathematics, Vol. 43, No. 6, December 2017, pp. 
       1403-1436
    */
    for (i = 0; i < 3; i++) {
        t = w / (1.0 + w);
        w = fma (log_pos_normal (z / w), t, t);
    }

    /* Fine tune approximation with a single Newton iteration */
    e = exp_scale_pos_normal (w, -3); // 0.125 * exp (w)
    num = fma (w, e, -0.125 * z);
    den = fma (w, e, e);
    rden = 1.0 / den;
    w = fma (-num, rden, w);

    return w;
}

int double2hiint (double a)
{
    unsigned long long int t;
    memcpy (&t, &a, sizeof t);
    return (int)(t >> 32);
}

int double2loint (double a)
{
    unsigned long long int t;
    memcpy (&t, &a, sizeof t);
    return (int)(unsigned int)t;
}

double hiloint2double (int hi, int lo) 
{
    double r;
    unsigned long long int t;
    t = ((unsigned long long int)(unsigned int)hi << 32) | (unsigned int)lo;
    memcpy (&r, &t, sizeof r);
    return r;
}

/* exp(a) * 2**scale; pos. normal results only! Max. err. found: 0.89028 ulp */
double exp_scale_pos_normal (double a, int scale)
{
    const double ln2_hi = 6.9314718055829871e-01; // 0x1.62e42fefa00000p-01
    const double ln2_lo = 1.6465949582897082e-12; // 0x1.cf79abc9e3b3a0p-40
    const double l2e = 1.4426950408889634; // 0x1.71547652b82fe0p+00 // log2(e)
    const double cvt = 6755399441055744.0; // 0x1.80000000000000p+52 // 3*2**51
    double f, r;
    int i;

    /* exp(a) = exp(i + f); i = rint (a / log(2)) */
    r = fma (l2e, a, cvt);
    i = double2loint (r);
    r = r - cvt;
    f = fma (r, -ln2_hi, a);
    f = fma (r, -ln2_lo, f);

    /* approximate r = exp(f) on interval [-log(2)/2,+log(2)/2] */
    r =            2.5022018235176802e-8;  // 0x1.ade0000000000p-26
    r = fma (r, f, 2.7630903497145818e-7); // 0x1.28af3fcbbf09bp-22
    r = fma (r, f, 2.7557514543490574e-6); // 0x1.71dee623774fap-19
    r = fma (r, f, 2.4801491039409158e-5); // 0x1.a01997c8b50d7p-16
    r = fma (r, f, 1.9841269589068419e-4); // 0x1.a01a01475db8cp-13
    r = fma (r, f, 1.3888888945916566e-3); // 0x1.6c16c1852b805p-10
    r = fma (r, f, 8.3333333334557735e-3); // 0x1.11111111224c7p-7
    r = fma (r, f, 4.1666666666519782e-2); // 0x1.55555555502a5p-5
    r = fma (r, f, 1.6666666666666477e-1); // 0x1.5555555555511p-3
    r = fma (r, f, 5.0000000000000122e-1); // 0x1.000000000000bp-1
    r = fma (r, f, 1.0000000000000000e+0); // 0x1.0000000000000p+0
    r = fma (r, f, 1.0000000000000000e+0); // 0x1.0000000000000p+0

    /* exp(a) = 2**(i+scale) * r */
    r = hiloint2double (double2hiint (r) + ((i + scale) << 20), 
                        double2loint (r));
    return r;
}

/* compute natural logarithm of positive normals; max. err. found: 0.86902 ulp*/
double log_pos_normal (double a)
{
    const double ln2_hi = 6.9314718055994529e-01; // 0x1.62e42fefa39efp-01
    const double ln2_lo = 2.3190468138462996e-17; // 0x1.abc9e3b39803fp-56
    double m, r, i, s, t, p, q;
    int e;

    /* log(a) = log(m * 2**i) = i * log(2) + log(m) */
    e = (double2hiint (a) - double2hiint (0.70703125)) & 0xfff00000;
    m = hiloint2double (double2hiint (a) - e, double2loint (a));
    t = hiloint2double (0x41f00000, 0x80000000 ^ e);
    i = t - (hiloint2double (0x41f00000, 0x80000000));

    /* m now in [181/256, 362/256]. Compute q = (m-1) / (m+1) */
    p = m + 1.0;
    r = 1.0 / p;
    q = fma (m, r, -r);
    m = m - 1.0;

    /* compute (2*atanh(q)/q-2*q) as p(q**2), q in [-75/437, 53/309] */
    s = q * q;
    r =            1.4794533702196025e-1;  // 0x1.2efdf700d7135p-3
    r = fma (r, s, 1.5314187748152339e-1); // 0x1.39a272db730f7p-3
    r = fma (r, s, 1.8183559141306990e-1); // 0x1.746637f2f191bp-3
    r = fma (r, s, 2.2222198669309609e-1); // 0x1.c71c522a64577p-3
    r = fma (r, s, 2.8571428741489319e-1); // 0x1.24924941c9a2fp-2
    r = fma (r, s, 3.9999999999418523e-1); // 0x1.999999998006cp-2
    r = fma (r, s, 6.6666666666667340e-1); // 0x1.5555555555592p-1
    r = r * s;

    /* log(a) = 2*atanh(q) + i*log(2) = ln2_lo*i + p(q**2)*q + 2q + ln2_hi * i.
       Use K.C. Ng's trick to improve the accuracy of the computation, like so:
       p(q**2)*q + 2q = p(q**2)*q + q*t - t + m, where t = m**2/2.
    */
    t = m * m * 0.5;
    r = fma (q, t, fma (q, r, ln2_lo * i)) - t + m;
    r = fma (ln2_hi, i, r);

    return r;
}
Zubkoff answered 18/9, 2022 at 4:29

Comments:
- I am not sure if z == INFINITY is well defined. – Mckellar
- Why not use int32_t or uint32_t instead of int for the double internal bit hacking? – Mckellar
- Of course, and such posts are welcome IMHO. My take is I am not sure whether comparing infinities for equality is well defined in the C Standard. It most likely works in most current implementations, but I am not sure whether that would hold for non-IEEE-based floating-point implementations. I shall ask a separate question after checking for potential duplicates. – Mckellar
- Impressive work! – Epergne
