Avoid Overflow when Calculating π by Evaluating a Series Using 16-bit Arithmetic?

Asked 7/5, 2019 at 12:25 Answered 8/5, 2019 at 7:7

c algorithm fixed-point mathematical-expressions numerical-computing

I'm trying to write a program that calculates decimal digits of π to 1000 digits or more.

To practice low-level programming for fun, the final program will be written in assembly, on a 8-bit CPU that has no multiplication or division, and only performs 16-bit additions. To ease the implementation, it's desirable to be able to use only 16-bit unsigned integer operations, and use an iterative algorithm. Speed is not a major concern. And fast multiplication and division is beyond the scope of this question, so don't consider those issues as well.

Before implementing it in assembly, I'm still trying to figure out an usable algorithm in C on my desktop computer. So far, I found the following series is reasonably efficient and relatively easy to implement.

The formula is derived from the Leibniz Series using a convergence acceleration technique, To derive it, see Computing the Digits in π, by Carl D. Offner (https://cs.umb.edu/~offner/files/pi.pdf), page 19-26. The final formula is shown in page 26. The initial formula I've written had some typos, please refresh the page to see the fixed formula. The constant term 2 at the greatest term is explained in page 54. The paper described an advanced iterative algorithm as well, but I didn't use it here.

If one evaluates the series using many (e.g. 5000) terms, it's possible to get thousands digits of π easily, and I found this series is easy to evaluate iteratively as well using this algorithm:

Algorithm

First, rearrange the formula to obtain its constant terms from an array.

Fill the array with 2 to start the first iteration, hence the new formula resembles the original one.
Let carry = 0.
Start from the greatest term. Obtain one term (2) from the array, multiply the term by PRECISION to perform a fixed-point division against 2 * i + 1, and save the reminder as the new term to the array. Then add the next term. Now decrement i, go to the next term, repeat until i == 1. Finally add the final term x_0.
Because 16-bit integer is used, PRECISION is 10, hence 2 decimal digits are obtained, but only the first digit is valid. Save the second digit as carry. Show the first digit plus carry.
x_0 is the integer 2, it should not be added for the successive iterations, clear it.
Goto step 4 to calculate the next decimal digit, until we have all the digits we want.

Implementation 1

Translating this algorithm to C:

#include <stdio.h>
#include <stdint.h>

#define N 2160
#define PRECISION 10

uint16_t terms[N + 1] = {0};

int main(void)
{
    /* initialize the initial terms */
    for (size_t i = 0; i < N + 1; i++) {
        terms[i] = 2;
    }

    uint16_t carry = 0;
    for (size_t j = 0; j < N / 4; j++) {
        uint16_t numerator = 0;
        uint16_t denominator;
        uint16_t digit;

        for (size_t i = N; i > 0; i--) {
            numerator += terms[i] * PRECISION;
            denominator = 2 * i + 1;

            terms[i] = numerator % denominator;
            numerator /= denominator;
            numerator *= i;
        }
        numerator += terms[0] * PRECISION;
        digit = numerator / PRECISION + carry;
        carry = numerator % PRECISION;

        printf("%01u", digit);

        /* constant term 2, only needed for the first iteration. */
        terms[0] = 0;
    }
    putchar('\n');
}

The code can calculate π to 31 decimal digits, until it makes an error.

31415926535897932384626433832794
10 <-- wrong

Sometimes digit + carry is greater than 9, so it needs an extra carry. If we are very unlucky, there may even be a double carry, triple carry, etc. We use a ring-buffer to store the last 4 digits. If an extra carry is detected, we output a backspace to erase the previous digit, perform a carry, and reprint them. This is just a ugly solution to the Proof-of-Concept, which is irrelevant to my question about overflow, but for completeness, here is it. Something better would be implemented in the future.

Implementation 2 with Repeated Carry

#include <stdio.h>
#include <stdint.h>

#define N 2160
#define PRECISION 10

#define BUF_SIZE 4

uint16_t terms[N + 1] = {0};

int main(void)
{
    /* initialize the initial terms */
    for (size_t i = 0; i < N + 1; i++) {
        terms[i] = 2;
    }

    uint16_t carry = 0;
    uint16_t digit[BUF_SIZE];
    int8_t idx = 0;

    for (size_t j = 0; j < N / 4; j++) {
        uint16_t numerator = 0;
        uint16_t denominator;

        for (size_t i = N; i > 0; i--) {
            numerator += terms[i] * PRECISION;
            denominator = 2 * i + 1;

            terms[i] = numerator % denominator;
            numerator /= denominator;
            numerator *= i;
        }
        numerator += terms[0] * PRECISION;
        digit[idx] = numerator / PRECISION + carry;

        /* over 9, needs at least one carry op. */
        if (digit[idx] > 9) {
            for (int i = 1; i <= 4; i++) {
                if (i > 3) {
                    /* allow up to 3 consecutive carry ops */
                    fprintf(stderr, "ERROR: too many carry ops!\n");
                    return 1;
                }
                /* erase a digit */
                putchar('\b');

                /* carry */
                digit[idx] -= 10;
                idx--;
                if (idx < 0) {
                    idx = BUF_SIZE - 1;
                }
                digit[idx]++;            
                if (digit[idx] < 10) {
                    /* done! reprint the digits */
                    for (int j = 0; j <= i; j++) {
                        printf("%01u", digit[idx]);
                        idx++;
                        if (idx > BUF_SIZE - 1) {
                            idx = 0;
                        }
                    }
                    break;
                }
            }
        }
        else {
            printf("%01u", digit[idx]);
        }

        carry = numerator % PRECISION;
        terms[0] = 0;

        /* put an element to the ring buffer */
        idx++;
        if (idx > BUF_SIZE - 1) {
            idx = 0;
        }
    }
    putchar('\n');
}

Great, now the program can correctly calculate 534 digits of π, until it makes an error.

3141592653589793238462643383279502884
1971693993751058209749445923078164062
8620899862803482534211706798214808651
3282306647093844609550582231725359408
1284811174502841027019385211055596446
2294895493038196442881097566593344612
8475648233786783165271201909145648566
9234603486104543266482133936072602491
4127372458700660631558817488152092096
2829254091715364367892590360011330530
5488204665213841469519415116094330572
7036575959195309218611738193261179310
5118548074462379962749567351885752724
8912279381830119491298336733624406566
43086021394946395
22421 <-- wrong

16-bit Integer Overflow

It turns out, during the calculation of the largest terms at the beginning, the error term gets quite large, since the divisors at the beginning are in the range of ~4000. When evaluating the series, numerator actually starts to overflow in the multiplication immediately.

The integer overflow is insignificant when calculating the first 500 digits, but starts to get worse and worse, until it gives an incorrect result.

Changing uint16_t numerator = 0 to uint32_t numerator = 0 can solve this problem and calculate π to 1000+ digits.

However, as I mentioned before, my target platform is a 8-bit CPU, and only has 16-bit operations. Is there a trick to solve the 16-bit integer overflow issue that I'm seeing here, using only one or more uint16_t? If it's not possible to avoid multiple-precision arithmetic, what is the simplest method to implement it here? I know somehow I need to introduce an extra 16-bit "extension word", but I'm not sure how can I implement it.

And thanks in advance for your patience to understand the long context here.

Selfdevotion answered 7/5, 2019 at 12:25 Comment(7)

You can look at what Abseil did to implement uint128 with uint64s, since it's the same basic ideas: github.com/abseil/abseil-cpp/blob/master/absl/numeric/int128.h and github.com/abseil/abseil-cpp/blob/master/absl/numeric/int128.cc – Kipper 7/5, 2019 at 12:47

@Selfdevotion Where did you get this Pi calculation formula? – Townsman 7/5, 2019 at 12:58

@Townsman The formula is derived from the Leibniz Series using a convergence acceleration technique, To derive it, see Computing the Digits in π, by Carl D. Offner (cs.umb.edu/~offner/files/pi.pdf), page 19-26. The final formula is shown in page 26. The constant term 2 at the greatest term is explained in page 54. My initial formula I've written had some typos, please refresh the page to see the fixed formula. The paper described an advanced iterative algorithm as well, but I didn't use it here. – Dacosta 7/5, 2019 at 13:3

@Selfdevotion Your formula can be found in Wikipedia How to calculate Pi from arctan . – Townsman 7/5, 2019 at 13:14

Do you want to detect overflow or compute despite overflows ? (in the latter case, I don't think you can avoid multiple-precision arithmetic). – Monkhood 7/5, 2019 at 13:34

@YvesDaoust I've clarified my question. The goal is to compute despite overflows. The point is: there is lots of materials on how to write a "full-brown" general-purpose arbitrary-precision arithmetic library that seems to be overkill in this example. I'm not sure what is the most suitable technique to overcome this very specific problem. – Dacosta 7/5, 2019 at 13:40

You probably need "double" precision. – Monkhood 7/5, 2019 at 13:43

Take a look at related QA:

Baking-Pi Challenge - Understanding & Improving

Its using Wiki: Bailey–Borwein–Plouffe_formula which is more suited for integer arithmetics.

The real challenge however would be:

How do I convert a very long binary number to decimal?.

As you probably want to print the number in dec base ...

Also if you need carry in higher level language than asm take a look at this:

Cant make value propagate through carry

You can modify it to handle as many carry bits as you need (if still less than the data type bit-width).

[Edit1] BBP example in C++/VCL

I used this formula (taken from Wiki page linked above):

converted to fixed point...

//---------------------------------------------------------------------------
AnsiString str_hex2dec(const AnsiString &hex)
    {
    char c;
    AnsiString dec="",s;
    int i,j,l,ll,cy,val;
    int  i0,i1,i2,i3,sig;
    sig=+1; l=hex.Length();
    if (l) { c=hex[l]; if (c=='h') l--; if (c=='H') l--; }
    i0=0; i1=l; i2=0; i3=l;
    for (i=1;i<=l;i++)      // scan for parts of number
        {
        char c=hex[i];
        if (c=='-') sig=-sig;
        if ((c=='.')||(c==',')) i1=i-1;
        if ((c>='0')&&(c<='9')) { if (!i0) i0=i; if ((!i2)&&(i>i1)) i2=i; }
        if ((c>='A')&&(c<='F')) { if (!i0) i0=i; if ((!i2)&&(i>i1)) i2=i; }
        if ((c>='a')&&(c<='f')) { if (!i0) i0=i; if ((!i2)&&(i>i1)) i2=i; }
        }

    l=0; s=""; if (i0) for (i=i0;i<=i1;i++)
        {
        c=hex[i];
             if ((c>='0')&&(c<='9')) c-='0';
        else if ((c>='A')&&(c<='F')) c-='A'-10;
        else if ((c>='a')&&(c<='f')) c-='A'-10;
        for (cy=c,j=1;j<=l;j++)
            {
            val=(s[j]<<4)+cy;
            s[j]=val%10;
            cy  =val/10;
            }
        while (cy>0)
            {
            l++;
            s+=char(cy%10);
            cy/=10;
            }
        }
    if (s!="")
        {
        for (j=1;j<=l;j++) { c=s[j]; if (c<10) c+='0'; else c+='A'-10; s[j]=c; }
        for (i=l,j=1;j<i;j++,i--) { c=s[i]; s[i]=s[j]; s[j]=c; }
        dec+=s;
        }
    if (dec=="") dec="0";
    if (sig<0) dec="-"+dec;

    if (i2)
        {
        dec+='.';
        s=hex.SubString(i2,i3-i2+1);
        l=s.Length();
        for (i=1;i<=l;i++)
            {
            c=s[i];
                 if ((c>='0')&&(c<='9')) c-='0';
            else if ((c>='A')&&(c<='F')) c-='A'-10;
            else if ((c>='a')&&(c<='f')) c-='A'-10;
            s[i]=c;
            }
        ll=((l*1234)>>10);  // num of decimals to compute
        for (cy=0,i=1;i<=ll;i++)
            {
            for (cy=0,j=l;j>=1;j--)
                {
                val=s[j];
                val*=10;
                val+=cy;
                s[j]=val&15;
                cy=val>>4;
                }
            dec+=char(cy+'0');
            for (;;)
                {
                if (!l) break;;
                if (s[l]) break;
                l--;
                }
            if (!l) break;;
            }
        }

    return dec;
    }
//---------------------------------------------------------------------------
AnsiString pi_BBP() // https://en.wikipedia.org/wiki/Bailey–Borwein–Plouffe_formula
    {
    const int N=100;        // 32*N bit uint arithmetics
    int sh;
    AnsiString s;
    uint<N> pi,a,b,k,k2,k3,k4;

    for (pi=0,sh=(N<<5)-8,k=0;sh>=0;k++,sh-=4)
        {
        k2=k*k;
        k3=k2*k;
        k4=k3*k;
        a =k2* 120;
        a+=k * 151;
        a+=     47;
        b =k4* 512;
        b+=k3*1024;
        b+=k2* 712;
        b+=k * 194;
        b+=     15;
        a<<=sh;
        pi+=a/b;
        }
    pi<<=4;
    s=pi.strhex();
    s=s.Insert(".",2);
    return str_hex2dec(s);
    }
//---------------------------------------------------------------------------

The code is using VCL AnsiString which is a self allocating string and mine uint<N> template which is unsigned integer arithmetics of 32*N bitwidth based on mine ALU32. As you can see you only need big integer division addition and multiplication for this (all the other stuff is doable on normal integers).

Here decadic result versus 1000 digit Pi reference:

ref: 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095505822317253594081284811174502841027019385211055596446229489549303819644288109756659334461284756482337867831652712019091456485669234603486104543266482133936072602491412737245870066063155881748815209209628292540917153643678925903600113305305488204665213841469519415116094330572703657595919530921861173819326117931051185480744623799627495673518857527248912279381830119491298336733624406566430860213949463952247371907021798609437027705392171762931767523846748184676694051320005681271452635608277857713427577896091736371787214684409012249534301465495853710507922796892589235420199561121290219608640344181598136297747713099605187072113499999983729780499510597317328160963185950244594553469083026425223082533446850352619311881710100031378387528865875332083814206171776691473035982534904287554687311595628638823537875937519577818577805321712268066130019278766111959092164201989
BPP: 3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847564823378678316527120190914564856692346034861045432664821339360726024914127372458700660631558817488152092096282925409171536436789259036001133053054882046652138414695194151160943305727036575959195309218611738193261179310511854807446237996274956735188575272489122793818301194912983367336244065664308602139494639522473719070217986094370277053921717629317675238467481846766940513200056812714526356082778577134275778960917363717872146844090122495343014654958537105079227968925892354201995611212902196086403441815981362977477130996051870721134999999837297804995105973173281609631859502445945534690830264252230825334468503526193118817101000313783875288658753320838142061717766914730359825349042875546873115956286388235378759375195778185778048187

The computed bigint value is exported to hex string and then converted to decadic base using str_hex2dec from link above. The number of iterations depends on the target bitwidth.

The code is not optimized yet...

Furbelow answered 8/5, 2019 at 7:7 Comment(1)

Thanks, so digit extraction is not the only feature of BBP, it's also easier to code in integer arithmetic for conventional computation. I should definitely try it some day. – Dacosta 8/5, 2019 at 14:10

There is a trick:

Consider using an array for the numerators and another array for the denominators. Each position would represent the number of times that number is multiplied to get the actual number.

An example:

(1 * 2 * 3 * 7 * 7) / (3 * 6 * 8)

Would be represented as:

num[] = {1, 1, 1, 0, 0, 0, 2};
denom[] = {0, 0, 1, 0, 0, 1, 0, 1};

Then consider factorizing into prime numbers every number before storing it, so that you have lower numbers. Now you will need another array to store all the primes:

primes[] = {2, 3, 5, 7};
num[] = {1, 1, 0, 2};
denom[] = {4, 2, 0, 0};

This will allow you to store unimaginably big numbers, but you will sooner or later want to transform them back into numbers, so you will want to simplify this first. The way to do it is just subtract factors[i] += num[i] - denom[i] for every field in the arrays, for every fraction in the series. You will want to simplify after each iteration, so you minimize overflow risk.

factors[] = {-3, -1, 0, 2};

When you need the number, just do num *= pow(primes[i], factors[i]); if the factor is positive, or num /= pow(primes, -factors[i]); if it is negative, for every field in the arrays. (Do nothing if it is 0.

num and denom are temporary arrays used to store a fraction, the array where the result is being stored is factors. Remember to memset the temporary arrays before every use.

This explanation is useful for any big fraction. To adapt it to your specific problem, you may need to use an integer power function, and also multiply by 10^something to turn the decimal part into an integral part. That is your mission, should you accept it :)

Organology answered 7/5, 2019 at 13:3 Comment(14)

I have a library where I use this trick, not for pi, but for big numbers such as binomial coefficients and others. You may want to have a look at it: github.com/alejandro-colomar/libalx – Organology 7/5, 2019 at 13:7

How is this in any way better than just using two bignums as numerator and denominator and implementing rational arithmetic via Euclid's algorithm? – Nicollenicolson 7/5, 2019 at 13:16

@EOF This doesn't need any bignum. As he asked, he needs int16_t. I think performance could (and only could) be better. – Organology 7/5, 2019 at 13:22

How is this better than bignum? You're going to need two large arrays of numbers in your solution anyway, and they will be larger than a conventional bignum. – Nicollenicolson 7/5, 2019 at 13:25

I use more space, that is true. But I use integer addition and subtraction basically (only pow and multiplication and division in the end). Could be way faster because of that. With bignum you would be multiplying after each iteration, which could be really slow in comparison. – Organology 7/5, 2019 at 13:28

Given that you need an array element for every prime number less than or equal to the represented number, and every array element must be able to represent numbers up to log2(represented number), you need a lot of extra space. Since the question is explicitly about an 8-bit machine, and says "Speed is not a major concern", I fail to see how your answer could possibly be useful. – Nicollenicolson 7/5, 2019 at 13:34

@EOF There are only 3512 prime numbers that fit into an int16_t, so the arrays are not that big (it depends on the system, obviously). That would allow for him to do approximately INT16_MAX / 2 iterations, which is more than the 5000 he put as an example. – Organology 7/5, 2019 at 13:42

@EOF And, 2^INT_MAX is way above any number he may find in the process, but he should still simplify after each iteration to minimize that risk. – Organology 7/5, 2019 at 13:47

I fail to see how int16_t is relevant here. The entire problem is that there are intermediate results that exceed the range of int16_t. Even if the range of int16_t were the limit, storing a number in the format you propose requires at least about 10% of a 16-bit address space, to say nothing of the actual memory a typical 8-bit machine will have available. – Nicollenicolson 7/5, 2019 at 13:51

Let's say you want 5000 iterations as he says: the largest number you will find is 10000. There are only 1229 prime numbers less than that. You will need 4 arrays in the process, so you will use 4*1229*2 bytes. Then it is up to the user to decide if he has that much memory available. I guarantee no overflow, no need of numbers bigger than int16_t, of course no bignum library available, and maybe even some performance. – Organology 7/5, 2019 at 14:0

Implementing a bignum library is a lot simpler, so whether one is already available is irrelevant. Also, since you require factorizing numbers, I highly doubt that this will actually be fast. Post a benchmark and we'll talk again. – Nicollenicolson 7/5, 2019 at 14:4

@EOF Sorry for being one month late, but I had exams. I did the benchmark, and you were right. I calculated binomial coefficients ((n choose k)) of random numbers (the same random numbers in both tests) instead of pi (so that I could use my old code, and didn't have to write the code for pi). My solution is two orders slower (100x slower on average) on the tests I did. The results oscillate between 500x and 5x. – Organology 4/6, 2019 at 2:59

I like that you actually tested it, and I highly value your intellectual honesty of publicly admitting that your original proposal is not efficient. Thank you! – Nicollenicolson 4/6, 2019 at 14:57

After optimizing the factorization process to the maximum, my solution is around 10x slower than GMP on average (between 2x and 20x). – Organology 5/6, 2019 at 19:39

What about implementing 32 bits arithmetic ?

For an addition, add the two high order words (16 bits), then the two low order words, test the overflow bit, and carry to the high order result if necessary.

If you can predict when overflow will occur, you can switch from 16 to 32 bits arithmetic when necessary.

Testing the overflow bit cannot be done in pure C, it will require some inline assembly or an intrinsic function.

Otherwise, you can be inspired by this answer: https://codereview.stackexchange.com/a/37178/39646

Monkhood answered 7/5, 2019 at 13:39 Comment(2)

Thanks. Apparently, the overflow issue cannot be avoided, as I mentioned, the final program will be written in assembly, so the most sensible solution is still using two 16-bit registers and testing for overflow via the carry flag in the traditional way. – Dacosta 7/5, 2019 at 13:45

Testing for overflow is easy: if (a + b < a) overflow(); gcc/clang are even sometimes smart enough to turn that into add-with-carry in loops. So you can always use 32 or even 64 bit arithmatic. Only for multiplications you need to use half the width because the result doubles in size. – Shiah 7/8, 2022 at 20:34

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Algorithm

Implementation 1

Implementation 2 with Repeated Carry

16-bit Integer Overflow

Recommended topics

Hot tags