Algorithm for Calculating Binomial Coefficient
Asked Answered
C

7

10

I need a way of calculating combinations without running out of memory. Here's what i have so far.

public static long combination(long n, long k) // nCk
{
    return (divideFactorials(factorial(n), ((factorial(k) * factorial((n - k))))));
}

public static long factorial(long n)
{
    long result;
    if (n <= 1) return 1;
    result = factorial(n - 1) * n;
    return result;
}

public static long divideFactorials(long numerator, long denominator)
{
    return factorial(Math.Abs((numerator - denominator)));
}

I have tagged it as C#, but the solution should ideally be language independent.

Cann answered 19/10, 2012 at 23:24 Comment(3)
These numbers are called "binomial coefficients" and as a very common object, have appeared before in SO: https://mcmap.net/q/375868/-binomial-coefficient/422353Annettannetta
What kind of combination exactly are you trying to get? Is it simply nCk, or is it something else? I am just asking because your comment at the top says "nCk" but your code does not directly calculate that.Luckey
Yes, add this line to factorial(): if (n > 20) throw new OverflowException();Nerine
A
7
public static long combination(long n, long k)
    {
        double sum=0;
        for(long i=0;i<k;i++)
        {
            sum+=Math.log10(n-i);
            sum-=Math.log10(i+1);
        }
        return (long)Math.pow(10, sum);
    }
Automobile answered 19/10, 2012 at 23:44 Comment(4)
Even though the original post uses longs, your return value should really be a double or long double. As you did the calculations using doubles, there is really no point in converting it back into a long, as the answer is not necessarily 100% exact. Also it limits your values for n and k even more.Luckey
this is not a good solution. for example, combination(52,5) should return 2598960, not 2598959. The Mark Dominus one is much better.Piss
This is bad solution. Consider combination(7,7), combination(10,10) which should be 1, not zero. This is even worse, as it yields correct 1 for combination(1,1), combination(2,2) up to seven, giving false hope. This is obwiously caused by the casting to long, and the loss of precision for some numbers. Making the function return double and removing the cast will fix this, but the precission will still be not optimal.Astroid
This would be the best solution though (when the long casting removed and result returned as double) when the numbers are greater than couple of thousends, as Mark Dominus solution overflows then.Astroid
C
23

One of the best methods for calculating the binomial coefficient I have seen suggested is by Mark Dominus. It is much less likely to overflow with larger values for N and K than some other methods.

public static long GetBinCoeff(long N, long K)
{
   // This function gets the total number of unique combinations based upon N and K.
   // N is the total number of items.
   // K is the size of the group.
   // Total number of unique combinations = N! / ( K! (N - K)! ).
   // This function is less efficient, but is more likely to not overflow when N and K are large.
   // Taken from:  http://blog.plover.com/math/choose.html
   //
   long r = 1;
   long d;
   if (K > N) return 0;
   for (d = 1; d <= K; d++)
   {
      r *= N--;
      r /= d;
   }
   return r;
}
Crocidolite answered 20/10, 2012 at 20:0 Comment(3)
Since r is always non-negative, better to use ulong instead of long to allow larger coefficients to be calculated without overflow.Cresida
Even better use BigInteger from System.Numerics. For example: GetBinCoeff(64,33) returns 1.77E18 when implemented with BigInteger (the correct answer) but 3.43E17 and -2.68E17 when implemented with ulong and long respectively.Cresida
@Cresida In standard C# usage, unsigned integers are reserved for bit fields or other special uses so don't do that. Calculations are always performed signed. But of course, feel free to use BigInteger if needed.Scriven
A
9

Here is a solution which is very similar to Bob Byran, but checks two more preconditions to speed up the code.

    /// <summary>
    /// Calculates the binomial coefficient (nCk) (N items, choose k)
    /// </summary>
    /// <param name="n">the number items</param>
    /// <param name="k">the number to choose</param>
    /// <returns>the binomial coefficient</returns>
    public static long BinomCoefficient(long n, long k)
    {
        if (k > n) { return 0; }
        if (n == k) { return 1; } // only one way to chose when n == k
        if (k > n - k) { k = n - k; } // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
        long c = 1;
        for (long i = 1; i <= k; i++)
        {
            c *= n--;
            c /= i;
        }
        return c;
    }
Ammons answered 1/10, 2013 at 20:27 Comment(1)
The check n == k should probably not be there. It is an optimization for a statistically rare case.Salmonberry
A
7
public static long combination(long n, long k)
    {
        double sum=0;
        for(long i=0;i<k;i++)
        {
            sum+=Math.log10(n-i);
            sum-=Math.log10(i+1);
        }
        return (long)Math.pow(10, sum);
    }
Automobile answered 19/10, 2012 at 23:44 Comment(4)
Even though the original post uses longs, your return value should really be a double or long double. As you did the calculations using doubles, there is really no point in converting it back into a long, as the answer is not necessarily 100% exact. Also it limits your values for n and k even more.Luckey
this is not a good solution. for example, combination(52,5) should return 2598960, not 2598959. The Mark Dominus one is much better.Piss
This is bad solution. Consider combination(7,7), combination(10,10) which should be 1, not zero. This is even worse, as it yields correct 1 for combination(1,1), combination(2,2) up to seven, giving false hope. This is obwiously caused by the casting to long, and the loss of precision for some numbers. Making the function return double and removing the cast will fix this, but the precission will still be not optimal.Astroid
This would be the best solution though (when the long casting removed and result returned as double) when the numbers are greater than couple of thousends, as Mark Dominus solution overflows then.Astroid
L
3

Looking at your code, it is no wonder that you will run out of memory quite fast. Your method divideFactorials calls the method factorial and uses as argument the difference "numerator-denominator". That difference is very likely going to be very large according to your code and you will be stuck in a very long loop in your factorial method.

If it is really just about finding nCk (which I assume because your comment in your code), just use:

public static long GetnCk(long n, long k)
{
    long bufferNum = 1;
    long bufferDenom = 1;

    for(long i = n; i > Math.Abs(n-k); i--)
    {
        bufferNum *= i;
    }

    for(long i = k; i => 1; i--)
    {
        bufferDenom *= i;
    }

    return (long)(bufferNom/bufferDenom);
}

Of course using this method you will run out of range very fast, because a long does not actually support very long numbers, so n and k have to be smaller than 20.

Supposing that you actually work with very large numbers you could use doubles instead of longs, as the powers become more and more significant.

Edit: If you use large numbers you could also use Stirling's Formula:

As n becomes large ln(n!) -> n*ln(n) - n.

Putting this into code:

public static double GetnCk(long n, long k)
{
    double buffern = n*Math.Log(n) - n;
    double bufferk = k*Math.Log(k) - k;
    double bufferkn = Math.Abs(n-k)*Math.Log(Math.Abs(n-k)) - Math.Abs(n-k);

    return Math.Exp(buffern)/(Math.Exp(bufferk)*Math.Exp(bufferkn));
}

I only propose this answer, as you said language independent, the C# code is just used to demonstrate it. Since you need to use large numbers for n and k for this to work, i propose this as a general way for finding the binomial coefficient for large combinations.

For cases were n and k are both smaller than around 200-300, you should use the answer Victor Mukherjee proposed, as it is exact.

Edit2: Edited my first code.

Luckey answered 19/10, 2012 at 23:49 Comment(4)
I tried out Victor's answer to about 20, 000 iterations and it ran perfectly. However, I did run out of memory at that range. If I need a greater range, I'll probably use this. Thank you for your answer.Cann
@Marv Why would you run out of memory? It's not recursive and there are no datastructures involved.Housecoat
@Housecoat You're right. I create several data structures on every iteration. I guess the choice of algorithm won't change a thing, except maybe the time it takes.Cann
In 5th line of second snippet in Math.Log(Math.Abs(n-k)) there is a problem, when n and k are equal the -Infinity (result of Log(0)) propagates to NaN.Astroid
F
1

Just for the sake of completion: the standard C math library has implementations of both Γ and lnΓ (called tgamma and lgamma), where

Γ(n) = (n-1)!

The library computation is certainly faster and more accurate than summing logarithms. For a lot more information, see Wikipedia and Mathworld.

Flaviaflavian answered 20/10, 2012 at 2:58 Comment(0)
C
1

Using some functions from the responses in this question, I created a list of function to do the calculation and I ran them each at different values 20000 times to see how well they worked. You should be able to combine this with some caching of factorials to use far larger values. Heres what i said in the other post located- https://mcmap.net/q/363310/-best-way-of-calculating-n-choose-k

After hitting a similar problem, I decided to compile the best solutions if have seen and run a simple test on each for a couple different values of n and k. I started with 10 or so functions and weeded out the ones that were just plain wrong or stopped working at specific values. Of all of the solutions, the answer above by user448810 is the cleanest and easiest to implement, I quite like it. Below will be code including each test i ran, the number of times i ean each function for each test, each functions code, the functions output and the time it took to get that output. I only did 20000 runs, there were still fluctuations in the time if i re ran the tests, but you should get a general sense of how well each worked.

//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //EXPECTED VALUES
    //x choose x = 1
    //9 choose 4 =126
    //52 choose 5 = 2598960;
    //200 choose 5 = 2535650040;
    //64 choose 33 = 1.777090076065542336E18;

    //# of runs for each test: 20000
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//https://mcmap.net/q/373059/-algorithm-for-calculating-binomial-coefficient
public static double combination(int n, int k) {
    double sum=0;
    for(long i=0;i<k;i++) {
        sum+=Math.Log10(n-i);
        sum-=Math.Log10(i+1);
    }
    return Math.Pow(10, sum);
}
/*
    4 choose 4
    1
    Elapsed Time is 94349 ticks
    9 choose 4
    126
    Elapsed Time is 94534 ticks
    52 choose 5
    2598959.99999999
    Elapsed Time is 112340 ticks
    200 choose 5
    2535650040
    Elapsed Time is 113253 ticks
    64 choose 33
    1.77709007606551E+18
    Elapsed Time is 717255 ticks
*/


//........................................................
//https://mcmap.net/q/373059/-algorithm-for-calculating-binomial-coefficient
public static double BinomCoefficient(int n, int k) {
    if (k > n) return 0;
    if (n == k) return 1; // only one way to chose when n == k
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
    double c = 1;
    for (long i = 1; i <= k; i++) {
        c *= n--;
        c /= i;
    }
    return c;
}
/*
    4 choose 4
    1
    Elapsed Time is 6181 ticks
    9 choose 4
    126
    Elapsed Time is 16989 ticks
    52 choose 5
    2598960
    Elapsed Time is 20636 ticks
    200 choose 5
    2535650040
    Elapsed Time is 20659 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 131757 ticks
*/

//........................................................
//https://mcmap.net/q/363310/-best-way-of-calculating-n-choose-k
public static double choose(int n, int k) {
    if (k == 0) return 1;
    return (n * choose(n-1, k-1)) / k;
}
/*
    4 choose 4
    1
    Elapsed Time is 24789 ticks
    9 choose 4
    126
    Elapsed Time is 14157 ticks
    52 choose 5
    2598960
    Elapsed Time is 17671 ticks
    200 choose 5
    2535650040
    Elapsed Time is 17827 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 134960 ticks
*/

//........................................................
//My own version which is just a mix of the two best above.
public static double binomialCoeff(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so it is quicker to iterate over a smaller k than a larger one.
    double recusion(long n, long k) {
        if (k == 0) return 1; // only one way to chose when n == k
        return (n * recusion(n-1, k-1)) / k;
    }
    return recusion(n,k);
}
/*
    4 choose 4
    1
    Elapsed Time is 6968 ticks
    9 choose 4
    126
    Elapsed Time is 19756 ticks
    52 choose 5
    2598960
    Elapsed Time is 25528 ticks
    200 choose 5
    2535650040
    Elapsed Time is 26350 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 156224 ticks
*/

//........................................................
//https://en.wikipedia.org/wiki/Binomial_coefficient
public static double binomial(int n, int k) {
    // take advantage of symmetry
    if (k > n-k)  k = n-k;

    long c = 1;
    for (long i = 1; i <= k; i++, n--) {
        // return 0 on potential overflow
        if (c/i >= long.MaxValue/n) return 0;
        // split c * n / i    into (c / i * i) + (c % i * n / i)
        c = (c / i * n) + (c % i * n / i); 
    }

    return c;
}

/*
    4 choose 4
    1
    Elapsed Time is 7661 ticks
    9 choose 4
    126
    Elapsed Time is 46471 ticks
    52 choose 5
    2598960
    Elapsed Time is 59829 ticks
    200 choose 5
    2535650040
    Elapsed Time is 62840 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 458198 ticks
*/
    
    
    
    
    
    
    
    
    
    
public static double BinomialCoefficient(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so use the smaller k.
    double recusion(long n, long k) => ((k==0)? 1 : ((n * recusion(n-1, k-1)) / k));
    return recusion(n,k);
}
/*
    4 choose 4
    1
    Elapsed Time is 5860 ticks
    9 choose 4
    126
    Elapsed Time is 22123 ticks
    52 choose 5
    2598960
    Elapsed Time is 25825 ticks
    200 choose 5
    2535650040
    Elapsed Time is 27490 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 161723 ticks
*/


//This is by far the fastest way to find the BC based on multiple tests. 
public static double BinomialCoefficient2(int n, int k) {
    if (k > n) return 0;
    if (k > n-k) k = n-k; // Everything is symmetric around n-k, so use the smaller k.
    
    double val= 1;
    for (int rk=1, rn=n-k+1; rk<=k; ++rk,++rn) {
        val = (rn)*val /(rk);
    }
    return val;
}
    
/*
    4 choose 4
    1
    Elapsed Time is 5824 ticks
    9 choose 4
    126
    Elapsed Time is 7983 ticks
    52 choose 5
    2598960
    Elapsed Time is 9820 ticks
    200 choose 5
    2535650040
    Elapsed Time is 10157 ticks
    64 choose 33
    1.77709007606554E+18
    Elapsed Time is 73985 ticks
*/
    
    
Colossal answered 7/4, 2022 at 22:48 Comment(0)
I
0

an ULTRA lazy way to get pascal triangle for small numbers is just to directly take 10……1^n then pick out the kth number (outside edge is 0-index) :

say 7 choose 3

10000001^7

10000007000002100000350000035000002100000070000001

1......7.....21.....[35].....35.....21......7......1

so the 3rd number is 35

but u don't need an number that big either

as long as the digits don't interfere with each other you can use a smaller one :

101^7 10721[35]35210701
            / 
 1 07 21 [35] 35 21 07 01

for a larger example like 52 choose 5 :

100000000000000005200000000000000132600000000000002210
000000000000027072500000000000259896000000000002035852
000000000013378456000000000075253815000000000367907540
000000001582002422000000006040372884000000020637940687
000000063501355960000000176896634460000000448138140632
000001036319450211500002194558835742000004267197736165
000007636038054190000012599462789413500019199181393392
000027053391963416000035287032995760000042638498203210
000047755117987595200049591853294810400047755117987595
200042638498203210000035287032995760000027053391963416
000019199181393392000012599462789413500007636038054190
000004267197736165000002194558835742000001036319450211
500000448138140632000000176896634460000000063501355960
000000020637940687000000006040372884000000001582002422
000000000367907540000000000075253815000000000013378456
000000000002035852000000000000259896000000000000027072
500000000000002210000000000000000132600000000000000005
2000000000000000001

1................52..............1326.............221.
.............27.725...........259896...........2.35852
..........13378456..........75253815.........3679.754.
........1582..2422........6.4.372884.......2.63794.687
.......635.135596.......17689663446.......44813814.632
.....1.3631945.2115....2194558835742.....4267197736165
.....7636.38.5419.....125994627894135...19199181393392
....27.53391963416....35287.3299576.....426384982.321.
....477551179875952...4959185329481.4...47755117987595
2...426384982.321.....35287.3299576.....27.53391963416
....19199181393392....125994627894135....7636.38.5419.
.....4267197736165.....2194558835742.....1.3631945.211
5.....44813814.632......17689663446........635.135596.
.......2.63794.687........6.4.372884........1582..2422
.........3679.754...........75253815..........13378456
...........2.35852...........[2598960]...........27.72
5.............221................1326................52.................1

1000052001326022100270727598980358653785312541829091220 0846239352200418853285708259917695364477460294737221565 3671988621946819250491634305958377579496249971026128690 2700491711626099928624254558213899307248720618983498175 2869796137659794672737446600278990761525382837845803585 22598960270725022100001326000052000001

Ivatts answered 1/11, 2023 at 16:34 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.