Calculating the Amount of Combinations
U

11

31

Cheers,

I know you can get the amount of combinations with the following formula (without repetition and order is not important):

// Choose r from n

n! / (r! * (n - r)!)

However, I don't know how to implement this in C++, since for instance with

n = 52

n! = 8.0658175170943878571660636856404e+67

the number gets way too big even for unsigned __int64 (or unsigned long long). Is there some workaround to implement the formula without any third-party "bigint" -libraries?

Urgency answered 3/12, 2009 at 8:3 Comment(1)
F
43

Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for a long long:

unsigned long long
choose(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return 0;
    }
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        r *= n--;
        r /= d;
    }
    return r;
}

This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.

UPDATE: There's a small possibility that the algorithm will overflow on the line:

r *= n--;

for very large n. A naive upper bound is sqrt(std::numeric_limits<long long>::max()) which means an n less than roughly 4,000,000,000.
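As one of the comments on this answer points out, the loop can be shortened by using the symmetry C(n, k) == C(n, n - k). A minimal sketch of that variant follows; the name choose_sym is an assumption made for illustration, and the intermediate product can still wrap around when the result is close to the 64-bit limit.

```cpp
#include <algorithm>

// Hypothetical variant of the multiplicative method that first applies
// the symmetry C(n, k) == C(n, n - k). Not overflow-checked.
unsigned long long choose_sym(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return 0;
    }
    k = std::min(k, n - k);   // fewer iterations and smaller intermediates
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        r *= n--;             // r == C(N, d - 1) * (N - d + 1)
        r /= d;               // exact division, r == C(N, d)
    }
    return r;
}
```

With this sketch, choose_sym(52, 47) runs the same five iterations as choose_sym(52, 5).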

Forgo answered 3/12, 2009 at 9:25 Comment(5)
Could this be improved by r *= (n--) / d, to do the divide first?Mazarin
GManNickG, it seems to me that we would lose precision that way.Rives
One improvement is to set k to the minimum of k and (n - k).Firebox
As shown here by Howard, this answer is imprecise, especially starting from for very large n. A naive upper bound...Hausner
so the "update" is ... shall we say, grossly incorrect? (the next voted answer shows overflow occurring for n == 67 ).Bree
I
35

From Andreas' answer:

Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for a long long:

unsigned long long
choose(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return 0;
    }
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        r *= n--;
        r /= d;
    }
    return r;
}

This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.

UPDATE: There's a small possibility that the algorithm will overflow on the line:

r *= n--;

for very large n. A naive upper bound is sqrt(std::numeric_limits<long long>::max()) which means an n less than roughly 4,000,000,000.

Consider n == 67 and k == 33. The above algorithm overflows with a 64 bit unsigned long long. And yet the correct answer is representable in 64 bits: 14,226,520,737,620,288,370. And the above algorithm is silent about its overflow, choose(67, 33) returns:

8,829,174,638,479,413

A believable but incorrect answer.
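The wraparound is silent because unsigned multiplication is defined to wrap. A rough way to see where the quoted loop goes wrong is to test each multiplication before doing it. The helper name below is hypothetical, and the sketch assumes unsigned long long is 64 bits wide.

```cpp
#include <limits>

// Hypothetical helper: returns true if the quoted loop would wrap around
// on some `r *= n--` step for the given arguments.
bool naive_choose_overflows(unsigned long long n, unsigned long long k) {
    if (k > n) {
        return false;   // the quoted loop returns 0 without multiplying
    }
    const unsigned long long max = std::numeric_limits<unsigned long long>::max();
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d) {
        if (r > max / n) {
            return true;   // r * n would not fit
        }
        r *= n--;
        r /= d;
    }
    return false;
}
```

Under that assumption, naive_choose_overflows(67, 33) reports true while naive_choose_overflows(52, 5) reports false.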

However the above algorithm can be slightly modified to never overflow as long as the final answer is representable.

The trick is in recognizing that at each iteration, the division r/d is exact. Temporarily rewriting:

r = r * n / d;
--n;

For this to be exact, it means if you expanded r, n and d into their prime factorizations, then one could easily cancel out d, and be left with a modified value for n, call it t, and then the computation of r is simply:

// compute t from r, n and d
r = r * t;
--n;

A fast and easy way to do this is to find the greatest common divisor of r and d, call it g:

unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
--n;

Now we can do the same thing with d_temp and n (find the greatest common divisor). However, since we know a priori that r * n / d is exact, we also know that gcd(d_temp, n) == d_temp, and therefore we don't need to compute it. So we can divide n by d_temp:

unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
// now one can divide n by d/g without truncation
unsigned long long t = n / d_temp;
r = r * t;
--n;
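To make the single step concrete, here it is as a standalone function. This is only a sketch: exact_step is a hypothetical name, std::gcd needs C++17, and the function assumes its arguments come from the loop (r == C(N, d - 1) and n == N - d + 1), which is what makes both divisions exact.

```cpp
#include <numeric>   // std::gcd (C++17)

// Hypothetical helper: one iteration of r = r * n / d without forming r * n.
// Assumes r == C(N, d - 1) and n == N - d + 1 for some N, so d divides r * n.
unsigned long long exact_step(unsigned long long r,
                              unsigned long long n,
                              unsigned long long d) {
    const unsigned long long g = std::gcd(r, d);
    r /= g;                                   // exact: g divides r
    const unsigned long long d_temp = d / g;  // exact: g divides d
    const unsigned long long t = n / d_temp;  // exact: d_temp divides n
    return r * t;
}
```

For example, with N = 6 and d = 2 we have r = 6 and n = 5: g = 2, r becomes 3, d_temp = 1, t = 5, and the step returns 15 = C(6, 2).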

Cleaning up:

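#include <limits>      // std::numeric_limits
#include <stdexcept>   // std::invalid_argument, std::overflow_error

unsigned long long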
gcd(unsigned long long x, unsigned long long y)
{
    while (y != 0)
    {
        unsigned long long t = x % y;
        x = y;
        y = t;
    }
    return x;
}

unsigned long long
choose(unsigned long long n, unsigned long long k)
{
    if (k > n)
        throw std::invalid_argument("invalid argument in choose");
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d, --n)
    {
        unsigned long long g = gcd(r, d);
        r /= g;
        unsigned long long t = n / (d / g);
        if (r > std::numeric_limits<unsigned long long>::max() / t)
           throw std::overflow_error("overflow in choose");
        r *= t;
    }
    return r;
}

Now you can compute choose(67, 33) without overflow. And if you try choose(68, 33), you'll get an exception instead of a wrong answer.
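If exceptions are not wanted, the same loop can report failure through its return type. The following is a sketch rather than the answer's own code: try_choose is a hypothetical name, it relies on C++17 (std::optional and std::gcd), and it also applies the k = min(k, n - k) reduction suggested in the comments.

```cpp
#include <algorithm>
#include <limits>
#include <numeric>    // std::gcd (C++17)
#include <optional>

// Hypothetical non-throwing variant: std::nullopt means "invalid argument"
// or "result does not fit in unsigned long long".
std::optional<unsigned long long> try_choose(unsigned long long n,
                                             unsigned long long k) {
    if (k > n) {
        return std::nullopt;
    }
    k = std::min(k, n - k);
    unsigned long long r = 1;
    for (unsigned long long d = 1; d <= k; ++d, --n) {
        const unsigned long long g = std::gcd(r, d);
        r /= g;
        const unsigned long long t = n / (d / g);   // exact, and t >= 1
        if (r > std::numeric_limits<unsigned long long>::max() / t) {
            return std::nullopt;                    // r * t would overflow
        }
        r *= t;
    }
    return r;
}
```

A caller would then test the result, for example `if (auto c = try_choose(67, 33)) { /* use *c */ }`.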

Insatiate answered 15/1, 2011 at 17:43 Comment(5)
Howard, I've fixed the messed-up formatting in your answer. Please read the edit hints to the right of the edit pane as to how to do this yourself. Oh, and very welcome to SO!Disuse
@Disuse he wanted to quote the accepted answer, which is why it looked a bit odd. In all fairness, the editor really sucks for some things imo.Cent
@Johannes: Oh, I totally missed that! maybe a hint would be appropriate?Disuse
Your edits are right on the mark, thanks much! I'm a newbie here and am still learning proper etiquette and editing.Insatiate
The same optimisation as proposed for the original answer could be applied here as well: setting k to the minimum of k and n-k...Skiest
A
6

The following routine computes n-choose-k using the recursive definition (Pascal's rule) and memoization. It uses only additions, so no intermediate value exceeds the result, but it allocates an n * n table:

#include <algorithm>  // std::fill_n
#include <cstddef>    // std::size_t

inline unsigned long long n_choose_k(const unsigned long long& n,
                                     const unsigned long long& k)
{
   if (n  < k) return 0;
   if (0 == k) return 1;   // also covers n == 0, k == 0
   if (n == k) return 1;
   if (1 == k) return n;
   typedef unsigned long long value_type;
   value_type* table = new value_type[static_cast<std::size_t>(n * n)];
   std::fill_n(table,n * n,0);
   class n_choose_k_impl
   {
   public:

      n_choose_k_impl(value_type* table,const value_type& dimension)
      : table_(table),
        dimension_(dimension)
      {}

      inline value_type& lookup(const value_type& n, const value_type& k)
      {
         return table_[dimension_ * n + k];
      }

      inline value_type compute(const value_type& n, const value_type& k)
      {
         if ((0 == k) || (k == n))
            return 1;
         value_type v1 = lookup(n - 1,k - 1);
         if (0 == v1)
            v1 = lookup(n - 1,k - 1) = compute(n - 1,k - 1);
         value_type v2 = lookup(n - 1,k);
         if (0 == v2)
            v2 = lookup(n - 1,k) = compute(n - 1,k);
         return v1 + v2;
      }

      value_type* table_;
      value_type dimension_;
   };
   value_type result = n_choose_k_impl(table,n).compute(n,k);
   delete [] table;
   return result;
}
Adamite answered 23/1, 2011 at 20:23 Comment(0)
I
4

Remember that

n! / ( n - r )! = n * ( n - 1) * .. * (n - r + 1 )

so it's much smaller than n!. So the solution is to evaluate n * (n - 1) * ... * (n - r + 1) instead of first calculating n! and then dividing it.

Of course it all depends on the relative magnitude of n and r - if r is relatively big compared to n, then it still won't fit.
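As a small illustration of the identity, here is a sketch of the falling product on its own. The name falling_factorial is an assumption, and the function does not check for overflow. Note that this computes only n! / (n - r)!; the division by r! still has to be done to get the number of combinations.

```cpp
// Hypothetical helper: n * (n - 1) * ... * (n - r + 1), i.e. n! / (n - r)!.
// No overflow check is performed.
unsigned long long falling_factorial(unsigned long long n, unsigned long long r) {
    if (r > n) {
        return 0;
    }
    unsigned long long p = 1;
    for (unsigned long long i = 0; i < r; ++i) {
        p *= n - i;
    }
    return p;
}
```

For n = 52 and r = 5 this gives 52 * 51 * 50 * 49 * 48 = 311,875,200, and dividing by 5! = 120 yields 2,598,960 combinations.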

Insouciant answered 3/12, 2009 at 8:14 Comment(2)
Please note that the question is how to calculate n! / r!(n - r)! instead of n! / (n - r)!.Cadge
In your answer the division by r! seems to be missing, you seem to compute just n!/(n-r)!Hausner
U
2

Well, I have to answer my own question. I was reading about Pascal's triangle and by accident noticed that we can calculate the number of combinations with it:

#include <iostream>
#include <boost/cstdint.hpp>

boost::uint64_t Combinations(unsigned int n, unsigned int r)
{
    if (r > n)
        return 0;
    if (r == 0)
        return 1;   // the loop below would otherwise return n

    /** We can use Pascal's triangle to determine the amount
      * of combinations. Each entry in row n follows from the
      * previous one:
      *
      * v(r) = v(r - 1) * (n - r + 1) / r
      *
      * Since the triangle is symmetrical, we only need to calculate
      * up to column r.
      */

    boost::uint64_t v = n--;

    for (unsigned int i = 2; i < r + 1; ++i, --n)
        v = v * n / i;

    return v;
}

int main()
{
    std::cout << Combinations(52, 5) << std::endl;
}
Urgency answered 3/12, 2009 at 11:56 Comment(3)
Yup, this is exactly the same algorithm as I posted. Kudos for coming up with it yourself ;)Forgo
note: since C++11 , uint64_t is part of #include <cstdint> and so we no longer need to use boost for this exampleDiscontinuous
Pascal's triangle is two dimensional. So, the algorithm will have time complexity O(n^2), while the standard algorithm from Knuth's book (and its derivatives or minor tweaks) should take only O(n) time complexity.Silken
S
2

This improves on Howard Hinnant's answer (in this question) a little: calling gcd() on every loop iteration seems a bit slow. We can defer gcd() to a single call at the end, while making the most use of the standard algorithm from Knuth's book "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms":

#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// gcd() is the function defined in Howard Hinnant's answer
const uint64_t u64max = std::numeric_limits<uint64_t>::max();
uint64_t choose(uint64_t n, uint64_t k)
{
    if (k > n)
        throw std::invalid_argument(std::string("invalid argument in ") + __func__);

    if (k > n - k)
        k = n - k;

    uint64_t r = 1;
    uint64_t d;
    for (d = 1; d <= k; ++d) {
        if (r > u64max / n)
            break;
        r *= n--;
        r /= d;
    }

    if (d > k)
        return r;

    // Let N be the original n,
    // n is the current n (when we reach here)
    // We want to calculate C(N,k),
    // Currently we already calculated the r value so far:
    // r = C(N, n) = C(N, N-n) = C(N, d-1)
    // Note that N-n = d-1
    // In addition we know the following identity formula:
    //  C(N,k) = C(N,d-1) * C(N-d+1, k-d+1) / C(k, k-d+1)
    //         = C(N,d-1) * C(n, k-d+1) / C(k, k-d+1)
    // Using this formula, we effectively reduce the calculation,
    // while recursively use the same function.
    uint64_t b = choose(n, k-d+1);
    if (b == u64max) {
        return u64max;  // overflow
    }

    uint64_t c = choose(k, k-d+1);
    if (c == u64max) {
        return u64max;  // overflow
    }

    // Now, the combinatorial should be r * b / c
    // We can use gcd() to calculate this:
    // We Pick b for gcd: b < r almost (if not always) in all cases
    uint64_t g = gcd(b, c);
    b /= g;
    c /= g;
    r /= c;

    if (r > u64max / b)
        return u64max;   // overflow

    return r * b;
}

Note that the recursion depth is normally 2 for non-overflow cases, i.e. choose() is called 3 times in total. (I haven't seen a case that reaches depth 3; the reduction shrinks the arguments quickly.)

Replace uint64_t with unsigned long long if you prefer it.

Silken answered 13/8, 2021 at 7:30 Comment(1)
Perhaps an even faster alternative is to use big number multiplication/division for the last part calculating r x b / cSilken
T
1

Getting the prime factorization of the binomial coefficient is probably the most efficient way to calculate it, especially if multiplication is expensive. This is certainly true of the related problem of calculating a factorial.

Here is a simple algorithm based on the Sieve of Eratosthenes that calculates the prime factorization. The idea is basically to go through the primes as you find them using the sieve, but then also to calculate how many of their multiples fall in the ranges [1, k] and [n-k+1, n]. The Sieve is essentially an O(n log log n) algorithm, but there is no multiplication done. The actual number of multiplications necessary once the prime factorization is found is at worst O(n log log n / log n) and there are probably faster ways than that.

prime_factors = []

n = 20
k = 10

composite = [True] * 2 + [False] * n

for p in range(n + 1):
    if composite[p]:
        continue

    q = p          # current multiple of the prime p
    m = 1          # q == p * m
    total_prime_power = 0
    prime_power = [0] * (n + 1)

    while True:

        # exponent of p in q is one more than the exponent of p in m
        prime_power[q] = prime_power[m] + 1

        if q <= k:
            total_prime_power -= prime_power[q]

        if q > n - k:
            total_prime_power += prime_power[q]

        m += 1
        q += p

        if q > n:
            break

        composite[q] = True

    prime_factors.append([p, total_prime_power])

print(prime_factors)
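For readers following the C++ answers, here is a rough C++ counterpart. It is not a translation of the loop above: it swaps in Legendre's formula, which gives the exponent of a prime p in C(n, k) as the sum over i >= 1 of floor(n/p^i) - floor(k/p^i) - floor((n-k)/p^i). The function name is hypothetical, and unlike the list printed above it omits primes whose exponent is zero.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: prime factorization of C(n, k) as (prime, exponent)
// pairs, using a sieve for the primes and Legendre's formula for exponents.
std::vector<std::pair<std::uint64_t, unsigned>>
binomial_prime_factors(std::uint64_t n, std::uint64_t k) {
    std::vector<std::pair<std::uint64_t, unsigned>> factors;
    if (k > n) {
        return factors;
    }
    std::vector<bool> composite(n + 1, false);
    for (std::uint64_t p = 2; p <= n; ++p) {
        if (composite[p]) {
            continue;
        }
        for (std::uint64_t q = 2 * p; q <= n; q += p) {
            composite[q] = true;   // sieve out multiples of the prime p
        }
        unsigned e = 0;
        for (std::uint64_t pw = p; ; pw *= p) {
            // Each term is 0 or 1, so the unsigned subtraction is safe.
            e += static_cast<unsigned>(n / pw - k / pw - (n - k) / pw);
            if (pw > n / p) {
                break;             // the next power of p would exceed n
            }
        }
        if (e > 0) {
            factors.emplace_back(p, e);
        }
    }
    return factors;
}
```

For n = 20 and k = 10 this yields 2^2 * 11 * 13 * 17 * 19, whose product is 184,756 = C(20, 10).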
Topsyturvy answered 6/3, 2015 at 16:43 Comment(0)
B
1

Using a dirty trick with a long double, it is possible to get the same accuracy as Howard Hinnant (and probably more):

#include <cmath>   // std::round

unsigned long long n_choose_k(int n, int k)
{
    if (k == 0)
        return 1;   // the loops below would otherwise return n
    long double f = n;
    for (int i = 1; i<k+1; i++)
        f /= i;
    for (int i=1; i<k; i++)
        f *= n - i;

    unsigned long long f_2 = std::round(f);

    return f_2;
}

The idea is to divide first by k! and then to multiply by n(n-1)...(n-k+1). The approximation through the double can be avoided by inverting the order of the for loop.

Brian answered 30/5, 2018 at 10:44 Comment(0)
U
0

One of the shortest ways:

int nChoosek(int n, int k){
    if (k > n) return 0;
    if (k == 0) return 1;
    return nChoosek(n - 1, k) + nChoosek(n - 1, k - 1);
}
Upturned answered 14/10, 2016 at 15:51 Comment(0)
R
0

A method similar to the Sieve of Eratosthenes. Where the sieve of Eratosthenes eliminates every multiple of a number, this one only divides every multiple by it. Since n!/((n-r)!r!) is always an integer, first cancel the denominator and then multiply what is left. This algorithm works well even for non-big integers.

In any run of consecutive natural numbers, every k-th number is divisible by k. This can be applied repeatedly with k=2,3,4,... Taking advantage of this fact, first cancel the denominator and then multiply the remaining numerator terms. This ensures that if the answer does not overflow, nothing overflows in the course of the calculation.

Iriyama’s algorithm

public static BigInteger Combination(int n, int r)
{
    if (n < 0 || r < 0 || r > n) throw new ArgumentException("Invalid parameter");

    if (n - r < r) r = n - r;
    if (r == 0) return 1;
    if (r == 1) return n;

    int[] numerator = new int[r];
    int[] denominator = new int[r];

    for (int k = 0; k < r; k++)
    {
        numerator[k] = n - r + k + 1;
        denominator[k] = k + 1;
    }

    for (int p = 2; p <= r; p++)
    {
        int pivot = denominator[p - 1];
        if (pivot > 1)
        {
            int offset = (n - r) % p;
            for (int k = p - 1; k < r; k += p)
            {
                numerator[k - offset] /= pivot;
                denominator[k] /= pivot;
            }
        }
    }

    BigInteger result = BigInteger.One;
    for (int k = 0; k < r; k++)
    {
        if (numerator[k] > 1) result *= numerator[k];
    }
    return result;
}   
Resupine answered 4/4, 2023 at 14:45 Comment(2)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Paramagnet
Added CODE: Are you clear now?Resupine
J
-1

If you want to be 100% sure that no overflows occur so long as the final result is within the numeric limit, you can sum up Pascal's Triangle row-by-row:

// prev_row and current_row are std::vector<unsigned long long>
for (int i=0; i<=n; i++) {
    current_row.assign(i+1, 1);   // both ends of every row are 1
    for (int j=1; j<i; j++)
        current_row[j] = prev_row[j] + prev_row[j-1];
    prev_row = current_row;
}
// result is now in current_row[r]

However, this algorithm is much slower than the multiplication one. So perhaps you could use multiplication to generate all the cases you know that are 'safe' and then use addition from there. (.. or you could just use a BigInt library).
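A more compact way to do the row-by-row summation keeps a single row and only its first r + 1 columns. This is a sketch under assumptions: pascal_choose is a hypothetical name, and r is first replaced by min(r, n - r). After that reduction every entry the loop touches is some C(i, j) with i <= n and j <= r <= n / 2, which is never larger than C(n, r), so nothing overflows unless the final result does.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: C(n, r) by additions only, using one row of
// Pascal's triangle truncated to columns 0..r.
unsigned long long pascal_choose(unsigned int n, unsigned int r) {
    if (r > n) {
        return 0;
    }
    r = std::min(r, n - r);
    std::vector<unsigned long long> row(r + 1, 0);
    row[0] = 1;   // row 0 of the triangle
    for (unsigned int i = 1; i <= n; ++i) {
        // Go right to left so row[j - 1] still holds the previous row's value.
        for (unsigned int j = std::min(i, r); j >= 1; --j) {
            row[j] += row[j - 1];
        }
    }
    return row[r];
}
```

The cost is about n * r additions, which is still far slower than the multiplicative method for large arguments.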

Junno answered 3/12, 2009 at 12:46 Comment(3)
As Andreas has stated in his answer, overflow could occur during the multiplication by n--. It wouldn't happen here.Junno
But as you've stated you'd have to wait for the end of the universe for the answer from this algorithm ;)Forgo
This doesn't work for r = 0. Need to modify to return 1.Majordomo
