Statistics: combinations in Python
Asked Answered
U

22

142

I need to compute combinatorials (nCr) in Python but cannot find the function to do that in math, numpy or stat libraries. Something like a function of the type:

comb = calculate_combinations(n, r)

I need the number of possible combinations, not the actual combinations, so itertools.combinations does not interest me.

Finally, I want to avoid using factorials, as the numbers I'll be calculating the combinations for can get too big and the factorials are going to be monstrous.

This seems like a REALLY easy to answer question, however I am being drowned in questions about generating all the actual combinations, which is not what I want.

Unicef answered 11/6, 2010 at 18:13 Comment(0)
C
141

Updated answer in 2023: Use the math.comb function, which exists since Python 3.8 and has gotten much faster in 3.11.


Old answer: See scipy.special.comb (scipy.misc.comb in older versions of scipy). When exact is False, it uses the gammaln function to obtain good precision without taking much time. In the exact case it returns an arbitrary-precision integer, which might take a long time to compute.

Cogency answered 11/6, 2010 at 18:29 Comment(1)
scipy.misc.comb is deprecated in favor of scipy.special.comb since version 0.10.0.Esquiline
E
132

Why not write it yourself?
It's a one-liner or such:

from operator import mul    # or mul=lambda x,y:x*y
from fractions import Fraction

def nCk(n,k): 
  return int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1))

Test - printing Pascal's triangle:

>>> for n in range(17):
...     print ' '.join('%5d'%nCk(n,k) for k in range(n+1)).center(100)
...     
                                                   1                                                
                                                1     1                                             
                                             1     2     1                                          
                                          1     3     3     1                                       
                                       1     4     6     4     1                                    
                                    1     5    10    10     5     1                                 
                                 1     6    15    20    15     6     1                              
                              1     7    21    35    35    21     7     1                           
                           1     8    28    56    70    56    28     8     1                        
                        1     9    36    84   126   126    84    36     9     1                     
                     1    10    45   120   210   252   210   120    45    10     1                  
                  1    11    55   165   330   462   462   330   165    55    11     1               
               1    12    66   220   495   792   924   792   495   220    66    12     1            
            1    13    78   286   715  1287  1716  1716  1287   715   286    78    13     1         
         1    14    91   364  1001  2002  3003  3432  3003  2002  1001   364    91    14     1      
      1    15   105   455  1365  3003  5005  6435  6435  5005  3003  1365   455   105    15     1   
    1    16   120   560  1820  4368  8008 11440 12870 11440  8008  4368  1820   560   120    16     1
>>> 

PS.
Edited to replace:
int(round(reduce(mul, (float(n-i)/(i+1) for i in range(k)), 1)))
with:
int(reduce(mul, (Fraction(n-i, i+1) for i in range(k)), 1))
so it won't err for big n/k

Expansile answered 12/6, 2010 at 1:25 Comment(11)
+1 for suggesting to write something simple, for using reduce, and for the cool demo with pascal triangleLuella
bonus points if you substitute range( ..) with xrange( ... ) in python 2.x ;-)Sanity
I think the poster was looking for a more efficient solution tailored to large inputs (so the use of a for loop, esp. with range, would't work)Neisse
-1 because this answer is wrong: print factorial(54)/(factorial(54 - 27))/factorial(27) == nCk(54, 27) gives False.Hightest
@robertking - Ok, you were both petty and technically correct. What i did was meant as illustration of how to write one's own function; i knew it is not accurate for big enough N and K due to floating point precision. But we can fix that - see above, now it should not err for big numbersExpansile
@NasBanov I changed my -1 to a +1 (Although I think the Fraction may be overkill lol). I just think it's good for people to know the limits of functions they are using. This answer ranks high on google search so a lot of new comers may use this code.Hightest
This would probably be fast in Haskell, but not Python unfortunately. It's actually quite slow compared to many of the other answers, e.g. @Alex Martelli, J.F. Sebastian, and my own.Engird
For Python 3, I had to also from functools import reduce.Isolative
I don't think it's a good idea to write your own versions of library functions except for fun :)Calefaction
Some optimizations can be done. k = min(k, n - k), as nCk = nC(n-k), and instead of doing a generator expression dividing every time, doing all the things you need to multiply / all the things you need to divide. So: def nCk(n, k): k = min(k, n - k); dividend = reduce(mul, xrange(n - k + 1, n + 1), 1); divisor = reduce(mul, xrange(1, r + 1), 1); return dividend // divisorConformist
@Conformist it looks like the formula from the first code example in my answer.Cytoplasm
C
55

A quick search on google code gives (it uses formula from @Mark Byers's answer):

def choose(n, k):
    """
    A fast way to calculate binomial coefficients by Andrew Dalke (contrib).
    """
    if 0 <= k <= n:
        ntok = 1
        ktok = 1
        for t in xrange(1, min(k, n - k) + 1):
            ntok *= n
            ktok *= t
            n -= 1
        return ntok // ktok
    else:
        return 0

choose() is 10 times faster (tested on all 0 <= (n,k) < 1e3 pairs) than scipy.misc.comb() if you need an exact answer.

def comb(N,k): # from scipy.comb(), but MODIFIED!
    if (k > N) or (N < 0) or (k < 0):
        return 0L
    N,k = map(long,(N,k))
    top = N
    val = 1L
    while (top > (N-k)):
        val *= top
        top -= 1
    n = 1L
    while (n < k+1L):
        val /= n
        n += 1
    return val
Cytoplasm answered 11/6, 2010 at 19:12 Comment(3)
A nice solution that doesn't require any pkgGastrointestinal
FYI: The formula mentioned is here: en.wikipedia.org/wiki/…Oswell
This choose function should have way more up-votes! Python 3.8 has math.comb, but I had to use Python 3.6 for a challenge and none implementations gave exact results for very large integers. This one does and does it fast!Roesler
M
45

If you want exact results and speed, try gmpy -- gmpy.comb should do exactly what you ask for, and it's pretty fast (of course, as gmpy's original author, I am biased;-).

Mindexpanding answered 11/6, 2010 at 21:15 Comment(5)
Indeed, gmpy2.comb() is 10 times faster than choose() from my answer for the code: for k, n in itertools.combinations(range(1000), 2): f(n,k) where f() is either gmpy2.comb() or choose() on Python 3.Cytoplasm
Since you're the author of the package, I'll let you fix the broken link so it points to the right place....Moonshot
@SeldomNeedy, the link to code.google.com is one right place (though the site is in archival mode now). Of course from there it's easy to find the github location, github.com/aleaxit/gmpy , and the PyPI one, pypi.python.org/pypi/gmpy2 , as it links to both!-)Mindexpanding
@AlexMartelli Sorry for the confusion. The page displays a 404 if javascript has been (selectively) disabled. I guess that's to discourage rogue AIs from incorporating archived Google Code Project sources quite so easily?Moonshot
props to you, it's the fastest out of the 17 different algorithms I tested in my answer. too bad it doesn't support fractions/decimals.Sacral
M
30

If you want an exact result, use sympy.binomial. It seems to be the fastest method, hands down.

x = 1000000
y = 234050

%timeit scipy.misc.comb(x, y, exact=True)
1 loops, best of 3: 1min 27s per loop

%timeit gmpy.comb(x, y)
1 loops, best of 3: 1.97 s per loop

%timeit int(sympy.binomial(x, y))
100000 loops, best of 3: 5.06 µs per loop
Motoring answered 23/11, 2013 at 11:0 Comment(1)
sympy has a cache which timeit is not clearing. In my testing, gmpy is ~264x faster.Sacral
E
28

A literal translation of the mathematical definition is quite adequate in a lot of cases (remembering that Python will automatically use big number arithmetic):

from math import factorial

def calculate_combinations(n, r):
    return factorial(n) // factorial(r) // factorial(n-r)

For some inputs I tested (e.g. n=1000 r=500) this was more than 10 times faster than the one liner reduce suggested in another (currently highest voted) answer. On the other hand, it is out-performed by the snippit provided by @J.F. Sebastian.

Engird answered 1/10, 2013 at 6:54 Comment(0)
B
28

Starting Python 3.8, the standard library now includes the math.comb function to compute the binomial coefficient:

math.comb(n, k)

which is the number of ways to choose k items from n items without repetition
n! / (k! (n - k)!):

import math
math.comb(10, 5) # 252
Broadcloth answered 1/6, 2019 at 10:27 Comment(0)
F
10

Here's another alternative. This one was originally written in C++, so it can be backported to C++ for a finite-precision integer (e.g. __int64). The advantage is (1) it involves only integer operations, and (2) it avoids bloating the integer value by doing successive pairs of multiplication and division. I've tested the result with Nas Banov's Pascal triangle, it gets the correct answer:

def choose(n,r):
  """Computes n! / (r! (n-r)!) exactly. Returns a python long int."""
  assert n >= 0
  assert 0 <= r <= n

  c = 1L
  denom = 1
  for (num,denom) in zip(xrange(n,n-r,-1), xrange(1,r+1,1)):
    c = (c * num) // denom
  return c

Rationale: To minimize the # of multiplications and divisions, we rewrite the expression as

    n!      n(n-1)...(n-r+1)
--------- = ----------------
 r!(n-r)!          r!

To avoid multiplication overflow as much as possible, we will evaluate in the following STRICT order, from left to right:

n / 1 * (n-1) / 2 * (n-2) / 3 * ... * (n-r+1) / r

We can show that integer arithmatic operated in this order is exact (i.e. no roundoff error).

Fideism answered 24/8, 2012 at 19:29 Comment(0)
D
6

You can write 2 simple functions that actually turns out to be about 5-8 times faster than using scipy.special.comb. In fact, you don't need to import any extra packages, and the function is quite easily readable. The trick is to use memoization to store previously computed values, and using the definition of nCr

# create a memoization dictionary
memo = {}
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    if n in [1,0]:
        return 1
    if n in memo:
        return memo[n]
    value = n*factorial(n-1)
    memo[n] = value
    return value

def ncr(n, k):
    """
    Choose k elements from a set of n elements - n must be larger than or equal to k
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n)/(factorial(k)*factorial(n-k))

If we compare times

from scipy.special import comb
%timeit comb(100,48)
>>> 100000 loops, best of 3: 6.78 µs per loop

%timeit ncr(100,48)
>>> 1000000 loops, best of 3: 1.39 µs per loop
Donatus answered 25/10, 2018 at 10:40 Comment(1)
These days there's a memoize decorator in functools called lru_cache which might simplify your code?Drucie
L
5

Using dynamic programming, the time complexity is Θ(n*m) and space complexity Θ(m):

def binomial(n, k):
""" (int, int) -> int

         | c(n-1, k-1) + c(n-1, k), if 0 < k < n
c(n,k) = | 1                      , if n = k
         | 1                      , if k = 0

Precondition: n > k

>>> binomial(9, 2)
36
"""

c = [0] * (n + 1)
c[0] = 1
for i in range(1, n + 1):
    c[i] = 1
    j = i - 1
    while j > 0:
        c[j] += c[j - 1]
        j -= 1

return c[k]
Lozier answered 12/9, 2014 at 17:30 Comment(0)
E
5

If your program has an upper bound to n (say n <= N) and needs to repeatedly compute nCr (preferably for >>N times), using lru_cache can give you a huge performance boost:

from functools import lru_cache

@lru_cache(maxsize=None)
def nCr(n, r):
    return 1 if r == 0 or r == n else nCr(n - 1, r - 1) + nCr(n - 1, r)

Constructing the cache (which is done implicitly) takes up to O(N^2) time. Any subsequent calls to nCr will return in O(1).

Exit answered 30/3, 2017 at 3:5 Comment(0)
S
4

It's pretty easy with sympy.

import sympy

comb = sympy.binomial(n, r)
Sensitive answered 17/7, 2016 at 18:21 Comment(1)
the nice thing about this one is that it's the only python binomial function I can find that supports n/r being floats AND n being negative. Another answer said it's fast but I'd bet it's doing some form of caching.Sacral
P
3

The direct formula produces big integers when n is bigger than 20.

So, yet another response:

from math import factorial

reduce(long.__mul__, range(n-r+1, n+1), 1L) // factorial(r)

short, accurate and efficient because this avoids python big integers by sticking with longs.

It is more accurate and faster when comparing to scipy.special.comb:

 >>> from scipy.special import comb
 >>> nCr = lambda n,r: reduce(long.__mul__, range(n-r+1, n+1), 1L) // factorial(r)
 >>> comb(128,20)
 1.1965669823265365e+23
 >>> nCr(128,20)
 119656698232656998274400L  # accurate, no loss
 >>> from timeit import timeit
 >>> timeit(lambda: comb(n,r))
 8.231969118118286
 >>> timeit(lambda: nCr(128, 20))
 3.885951042175293
Photomechanical answered 4/12, 2015 at 11:46 Comment(2)
This is wrong! If n == r, result should be 1. This code returns 0.Sputum
More precisely, it should be range(n-r+1, n+1) instead of range(n-r,n+1).Sputum
H
3

Using only standard library distributed with Python:

import itertools

def nCk(n, k):
    return len(list(itertools.combinations(range(n), k)))
Heliacal answered 10/11, 2016 at 22:19 Comment(1)
i don't think its time complexity (and memory usage) is acceptable.Thwack
E
3

This function is very optimized.

def nCk(n,k):
    m=0
    if k==0:
        m=1
    if k==1:
        m=n
    if k>=2:
        num,dem,op1,op2=1,1,k,n
        while(op1>=1):
            num*=op2
            dem*=op1
            op1-=1
            op2-=1
        m=num//dem
    return m
Edger answered 13/5, 2019 at 21:43 Comment(0)
P
2

That's probably as fast as you can do it in pure python for reasonably large inputs:

def choose(n, k):
    if k == n: return 1
    if k > n: return 0
    d, q = max(k, n-k), min(k, n-k)
    num =  1
    for n in xrange(d+1, n+1): num *= n
    denom = 1
    for d in xrange(1, q+1): denom *= d
    return num / denom
Pocahontas answered 24/5, 2015 at 20:42 Comment(0)
Z
2

Here is an efficient algorithm for you

for i = 1.....r

   p = p * ( n - i ) / i

print(p)

For example nCr(30,7) = fact(30) / ( fact(7) * fact(23)) = ( 30 * 29 * 28 * 27 * 26 * 25 * 24 ) / (1 * 2 * 3 * 4 * 5 * 6 * 7)

So just run the loop from 1 to r can get the result.


In python:

n,r=5,2
p=n
for i in range(1,r):
   p = p*(n - i)/i
else:
   p = p/(i+1)
print(p)
Zwickau answered 5/10, 2019 at 10:49 Comment(0)
S
1

I timed 17 different functions from this thread and libraries linked here.

Since I feel it's a bit much to dump here, I put the code for the functions in a pastebin here.

The first test I did was to build pascal's triangle to the 100th row. I used timeit to do this 100 times. The numbers below are the average time it took in seconds to build the triangle once.

gmpy2.gmpy2.comb 0.0012259269999998423
math.comb 0.007063110999999935
__main__.stdfactorial2 0.011469491
__main__.scipybinom 0.0120114319999999
__main__.stdfactorial 0.012105122
__main__.scipycombexact 0.012569045999999844
__main__.andrewdalke 0.01825201100000015
__main__.rabih 0.018472497000000202
__main__.kta 0.019374668000000383
__main__.wirawan 0.029312811000000067
scipy.special._basic.comb 0.03221609299999954
__main__.jfsmodifiedscipy 0.04332894699999997
__main__.rojas 0.04395155400000021
sympy.functions.combinatorial.factorials.binomial 0.3233529779999998
__main__.nasbanov 0.593365528
__main__.pantelis300 1.7780402499999999

You may notice that there are only 16 functions here. That's because the recursive() function couldn't complete this even once in a reasonable amount of time, so I had to exclude it from the timeit tests. seriously, it's been going for hours.

I also timed various other types of inputs that not all of the above functions supported. Keep in mind that I only ran the test for each 10 times because nCr is computationally expensive and I'm impatient

Fractional values for n

__main__.scipybinom 0.011481370000000001
__main__.kta 0.01869513999999999
sympy.functions.combinatorial.factorials.binomial 6.33897291

Fractional values for r

__main__.scipybinom 0.010960040000000504
scipy.special._basic.comb 0.03681254999999908
sympy.functions.combinatorial.factorials.binomial 3.2962564499999987

Fractional values for n and r

__main__.scipybinom 0.008623409999998444
sympy.functions.combinatorial.factorials.binomial 3.690936439999999

Negative values for n

gmpy2.gmpy2.comb 0.010770989999997482
__main__.kta 0.02187850000000253
__main__.rojas 0.05104292999999984
__main__.nasbanov 0.6153183200000001
sympy.functions.combinatorial.factorials.binomial 3.0460310799999943

Negative fractional values for n, fractional values for r

sympy.functions.combinatorial.factorials.binomial 3.7689941699999965

the best solution currently for maximum speed and versatility would be a hybrid function to choose between different algorithms depending on the inputs

def hybrid(n: typing.Union[int, float], k: typing.Union[int, float]) -> typing.Union[int, float]:
    # my own custom hybrid solution
    def is_integer(n):
        return isinstance(n, int) or n.is_integer()
    if k < 0:
        raise ValueError("k cannot be negative.")
    elif n == 0:
        return 0
    elif k == 0 or k == n:
        return 1
    elif is_integer(n) and is_integer(k):
        return int(gmpy2.comb(int(n), int(k)))
    elif n > 0:
        return scipy.special.binom(n, k)
    else:
        return float(sympy.binomial(n, k))

Since sympy.binomial() is so slow, the true ideal solution would be to combine the code of scipy.special.binom() which performs well for fractions and gmpy2.comb() which performs well for ints. scipy's func and gympy2's func are both written in C which I am not very familiar with.

Sacral answered 14/12, 2021 at 4:49 Comment(0)
S
1

If you don't want to import any libraries then you could solve it by doing a recursive function like this:

def comb(n, k):
    if (k < 0 or n < k or n == 0):
        return 0

    if (k == 0 or k == n):
        return 1
    else:
        return comb(n - 1, k - 1) + comb(n - 1, k)
Stabile answered 25/10, 2023 at 16:8 Comment(1)
This will work in decent time only with small numbers. In the original question I highlight that I'll be calculating this for big numbers.Unicef
D
0

This is @killerT2333 code using the builtin memoization decorator.

from functools import lru_cache

@lru_cache()
def factorial(n):
    """
    Calculate the factorial of an input using memoization
    :param n: int
    :rtype value: int
    """
    return 1 if n in (1, 0) else n * factorial(n-1)

@lru_cache()
def ncr(n, k):
    """
    Choose k elements from a set of n elements,
    n must be greater than or equal to k.
    :param n: int
    :param k: int
    :rtype: int
    """
    return factorial(n) // (factorial(k) * factorial(n - k))

print(ncr(6, 3))
Drucie answered 31/12, 2018 at 8:46 Comment(0)
P
0

Very simple. Just import comb function from math module and get the result!!

Complete code is below:

from math import comb
n, r = 7, 3
print(comb(n,r))
Polio answered 29/8, 2023 at 16:52 Comment(0)
I
-1

2024 Answer. Efficiency is pretty important for those can become rather large numbers. For instance (400 choose 22) = 869220235645633221187116908079236800.

The currently best way is to use math modules comb function:

>>> import math
>>> n = 400
>>> k = 22
>>> math.comb(n,k)
869220235645633221187116908079236800

The reason to use this over the "naive" approach:

>>> math.factorial(n)/math.factorial(k)*math.factorial(n-k)
# ...
OverflowError: integer division result too large for a float

is that math.comb uses the same factorial-reduction "tricks" that you'd use if calculating manually.

The only limit is the sky or rather the limit given for integer string conversion. This limit can be changed, and then you can calculate truly enormous values, such as math.comb(1000000,500000) in a reasonable time, which is an integer with 301027 digits.

Inexpert answered 3/4 at 13:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.