Log2 approximation in fixed-point
Asked Answered
S

1

8

I'v already implemented fixed-point log2 function using lookup table and low-order polynomial approximation but not quite happy with accuracy across the entire 32-bit fixed-point range [-1,+1). The input format is s0.31 and the output format is s15.16.

I'm posting this question here so that another user can post his answer (some comments were exchanged in another thread but they prefer to provide comprehensive answer in a separate thread: Fixed point approximation of 2^x, with input range of s5.26).

So the question is: what is a good way to implement fixed-point log2 function with approximation error better than 1e-5?

Thanks.

Scrummage answered 13/2, 2019 at 1:24 Comment(6)
What kind of accuracy are you achieving right now and what accuracy do you hope to achieve?Warble
Originally speed was very importantly to me so I only used first order approximation. I don't recall the average error across the entire range but the max relative errr is about 5%. The assembly implementation is very fast now so I'm willing to improve the accuracy at the cost of more clock cycles. I was just curious to learn your method and understand where I can draw accuracy vs speed line for my application.Scrummage
What exactly is the question? (I can guess, but phrasing it as such would be helpful, thnx.)Mince
@JohnMcFarlane, sorry for any confusion. I asked njuffa in another thread if he has done minmax polynomial approximation for fixed-point implementation of log2 function and he suggested to open up a new question so he can offer his solution.Scrummage
@Scrummage I don't see a link to the other thread and you have not yet asked any question in this one. Perhaps: "what is a good way to implement log2 approximation using fixed-point?"Mince
@JohnMcFarlane, sorry it's been almost 5 years that you posted your comment and I never noticed that, my apologies. Added the link to the description.Scrummage
W
8

By simply counting the leading zero bits in a fixed-point number x, one can determine log2(x) to the closest strictly smaller integer. On many processor architectures, there is a "count leading zeros" machine instruction or intrinsic. Where this is not available, a fairly efficient implementation of clz() can be constructed in a variety of ways, one of which is included in the code below.

To compute the fractional part of the logarithm, the two main obvious contenders are interpolation in a table and minimax polynomial approximation. In this specific case, quadratic interpolation in a fairly small table seems to be the more attractive option. x = 2i * (1+f), with 0 ≤ f < 1. We determine i as described above and use the leading bits of f to index into the table. A parabola is fit through this and two following table entries, computing the parameters of the parabola on the fly. The result is rounded, and a heuristic adjustment is applied to partially compensate for the truncating nature of fixed-point arithmetic. Finally, the integer portion is added, yielding the final result.

It should be noted that the computation involves right shifts of signed integers which may be negative. We need those right shifts to map to arithmetic right shifts at machine code level, something which is not guaranteed by the ISO-C standard. However, in practice most compilers do what is desired. In this case I used the Intel compiler on an x64 platform running Windows.

With a 66-entry table of 32-bit words, the maximum absolute error can be reduced to 8.18251e-6, so full s15.16 accuracy is achieved.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

#define FRAC_BITS_OUT (16)
#define INT_BITS_OUT  (15)
#define FRAC_BITS_IN  (31)
#define INT_BITS_IN   ( 0)

/* count leading zeros: intrinsic or machine instruction on many architectures */
int32_t clz (uint32_t x)
{
    uint32_t n, y;

    n = 31 + (!x);
    if ((y = (x & 0xffff0000U))) { n -= 16;  x = y; }
    if ((y = (x & 0xff00ff00U))) { n -=  8;  x = y; }
    if ((y = (x & 0xf0f0f0f0U))) { n -=  4;  x = y; }
    if ((y = (x & 0xccccccccU))) { n -=  2;  x = y; }
    if ((    (x & 0xaaaaaaaaU))) { n -=  1;         }
    return n;
}

#define LOG2_TBL_SIZE (6)
#define TBL_SIZE      ((1 << LOG2_TBL_SIZE) + 2)

/* for i = [0,65]: log2(1 + i/64) * (1 << 31) */
const uint32_t log2Tab [TBL_SIZE] =
{
    0x00000000, 0x02dcf2d1, 0x05aeb4dd, 0x08759c50, 
    0x0b31fb7d, 0x0de42120, 0x108c588d, 0x132ae9e2, 
    0x15c01a3a, 0x184c2bd0, 0x1acf5e2e, 0x1d49ee4c, 
    0x1fbc16b9, 0x22260fb6, 0x24880f56, 0x26e2499d, 
    0x2934f098, 0x2b803474, 0x2dc4439b, 0x30014ac6, 
    0x32377512, 0x3466ec15, 0x368fd7ee, 0x38b25f5a, 
    0x3acea7c0, 0x3ce4d544, 0x3ef50ad2, 0x40ff6a2e, 
    0x43041403, 0x450327eb, 0x46fcc47a, 0x48f10751, 
    0x4ae00d1d, 0x4cc9f1ab, 0x4eaecfeb, 0x508ec1fa, 
    0x5269e12f, 0x5440461c, 0x5612089a, 0x57df3fd0, 
    0x59a80239, 0x5b6c65aa, 0x5d2c7f59, 0x5ee863e5, 
    0x60a02757, 0x6253dd2c, 0x64039858, 0x65af6b4b, 
    0x675767f5, 0x68fb9fce, 0x6a9c23d6, 0x6c39049b, 
    0x6dd2523d, 0x6f681c73, 0x70fa728c, 0x72896373, 
    0x7414fdb5, 0x759d4f81, 0x772266ad, 0x78a450b8, 
    0x7a231ace, 0x7b9ed1c7, 0x7d17822f, 0x7e8d3846, 
    0x80000000, 0x816fe50b
};

#define RND_SHIFT     (31 - FRAC_BITS_OUT)
#define RND_CONST     ((1 << RND_SHIFT) / 2)
#define RND_ADJUST    (0x10d) /* established heuristically */

/* 
   compute log2(x) in s15.16 format, where x is in s0.31 format
   maximum absolute error 8.18251e-6 @ 0x20352845 (0.251622232)
*/   
int32_t fixed_log2 (int32_t x)
{
    int32_t f1, f2, dx, a, b, approx, lz, i, idx;
    uint32_t t;

    /* x = 2**i * (1 + f), 0 <= f < 1. Find i */
    lz = clz (x);
    i = INT_BITS_IN - lz;
    /* normalize f */
    t = (uint32_t)x << (lz + 1);
    /* index table of log2 values using LOG2_TBL_SIZE msbs of fraction */
    idx = t >> (32 - LOG2_TBL_SIZE);
    /* difference between argument and smallest sampling point */
    dx = t - (idx << (32 - LOG2_TBL_SIZE));
    /* fit parabola through closest three sampling points; find coeffs a, b */
    f1 = (log2Tab[idx+1] - log2Tab[idx]);
    f2 = (log2Tab[idx+2] - log2Tab[idx]);
    a = f2 - (f1 << 1);
    b = (f1 << 1) - a;
    /* find function value for argument by computing ((a*dx+b)*dx) */
    approx = (int32_t)((((int64_t)a)*dx) >> (32 - LOG2_TBL_SIZE)) + b;
    approx = (int32_t)((((int64_t)approx)*dx) >> (32 - LOG2_TBL_SIZE + 1));
    approx = log2Tab[idx] + approx;
    /* round fractional part of result */
    approx = (((uint32_t)approx) + RND_CONST + RND_ADJUST) >> RND_SHIFT;
    /* combine integer and fractional parts of result */
    return (i << FRAC_BITS_OUT) + approx;
}

/* convert from s15.16 fixed point to double-precision floating point */
double fixed_to_float_s15_16 (int32_t a)
{
    return a / 65536.0;
}

/* convert from s0.31 fixed point to double-precision floating point */
double fixed_to_float_s0_31 (int32_t a)
{
    return a / (65536.0 * 32768.0);
}

int main (void)
{
    double a, res, ref, err, maxerr = 0.0;
    int32_t x, start, end;

    start = 0x00000001;
    end =   0x7fffffff;
    printf ("testing fixed_log2 with inputs in [%17.10e, %17.10e)\n",  
            fixed_to_float_s0_31 (start), fixed_to_float_s0_31 (end));

    for (x = start; x < end; x++) {
        a = fixed_to_float_s0_31 (x);
        ref = log2 (a);
        res = fixed_to_float_s15_16 (fixed_log2 (x));
        err = fabs (res - ref);
        if (err > maxerr) {
            maxerr = err;
        }
    }

    printf ("max. err = %g\n", maxerr);
    return EXIT_SUCCESS;
}

For completeness, I am showing the minimax polynomial approximation below. The coefficients for such approximations can be generated by several tools such as Maple, Mathematica, Sollya or with homebrew code using the Remez algorithm, which is what I used here. The code below shows the original floating-point coefficients, the dynamic scaling used to maximize accuracy in intermediate computation, and the heuristic adjustments applied to mitigate the impact of non-rounding fixed-point arithmetic.

A typical approach for computation of log2(x) is to use x = 2i * (1+f) and use approximation of log2(1+f) for (1+f) in [√½, √2], which means that we use a polynomial p(f) on the primary approximation interval [√½-1, √2-1].

The intermediate computation scales up operands as far as feasible for improved accuracy under the restriction that we want to use a 32-bit mulhi operation as its basic building block, as this is a native instruction on many 32-bit architectures, accessible either via inline machine code or as an intrinsic. As in the table-based code, there are right shifts of signed data which may be negative, and such right shifts must map to arithmetic right shifts, something that ISO-C doesn't guarantee but most C compilers do.

I managed to get the maximum absolute error for this variant down to 1.11288e-5, so almost full s15.16 accuracy but slightly worse than for the table-based variant. I suspect I should have added one additional term to the polynomial.

/* on 32-bit architectures, there is often an instruction/intrinsic for this */
int32_t mulhi (int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

#define RND_SHIFT  (25 - FRAC_BITS_OUT)
#define RND_CONST  ((1 << RND_SHIFT) / 2)
#define RND_ADJUST (-2) /* established heuristically */

/* 
    compute log2(x) in s15.16 format, where x is in s0.31 format
    maximum absolute error 1.11288e-5 @ 0x5a82689f (0.707104757)
*/   
int32_t fixed_log2 (int32_t x)
{
    int32_t lz, i, f, p, approx;
    uint32_t t;
    /* x = 2**i * (1 + f), 0 <= f < 1. Find i */
    lz = clz (x);
    i = INT_BITS_IN - lz;
    /* force (1+f) into range [sqrt(0.5), sqrt(2)] */
    t = (uint32_t)x << lz;    
    if (t > (uint32_t)(1.414213562 * (1U << 31))) {
        i++;
        t = t >> 1;
    }
    /* compute log2(1+f) for f in [-0.2929, 0.4142] */
    f = t - (1U << 31);
    p =              + (int32_t)(-0.206191055 * (1U << 31) -  1);
    p = mulhi (p, f) + (int32_t)( 0.318199910 * (1U << 30) - 18);
    p = mulhi (p, f) + (int32_t)(-0.366491705 * (1U << 29) + 22);
    p = mulhi (p, f) + (int32_t)( 0.479811855 * (1U << 28) -  2);
    p = mulhi (p, f) + (int32_t)(-0.721206390 * (1U << 27) + 37);
    p = mulhi (p, f) + (int32_t)( 0.442701618 * (1U << 26) + 35);
    p = mulhi (p, f) + (f >> (31 - 25));
    /* round fractional part of the result */
    approx = (p + RND_CONST + RND_ADJUST) >> RND_SHIFT;
    /* combine integer and fractional parts of result */
    return (i << FRAC_BITS_OUT) + approx;
}
Warble answered 13/2, 2019 at 6:25 Comment(5)
Such an awesome answer! In case anyone's wondering what is going on with the parabola fitting, op uses Lagrange Interpolation Formula, but with a neat trick of translating the function to (0,0) and treating (0,0) as the first point. There's also a sneaky division by two (in the second approx statement), in case your numbers don't check out.Patchwork
@gwiazdorr I have been carrying forward this approach since 1987 or so, and I am pretty sure I had no idea about Lagrange interpolation at the time. After placing the origin of the interpolation coordinate system at the first point, and defining the x-axis distance between the equidistant nodes as unity, by plugging the other two points into ax²+bx we get a system of two simultaneous equations: a+b = f1 and 4*a+2*b = f2. This gives us 2*a = f2 - 2*f1 and 2*b = 4*f1 - f2 = 2*f1 - 2*a.Warble
@Warble : aren't expressions like these (32 - LOG2_TBL_SIZE), (int32_t)( 0.318199910 * (1U << 30) - 18), (uint32_t)(1.414213562 * (1U << 31)) etc be of constant value ? why are they being re-calculated every time ?Tautologize
@RAREKpopManifesto The compiler will take care in converting them into magic constants when optimizing, while the source code shows how these magic constants were actually constructed (i.e. it improves self-documentation).Warble
@RAREKpopManifesto Here is a worked example at Compiler Explorer, with fixed_log2() compiled for a 32-bit ARM platform: godbolt.org/z/8aMEqqvK8Warble

© 2022 - 2025 — McMap. All rights reserved.