generating list of every combination without duplicates
Asked Answered
L

5

8

I would like to generate a list of combinations. I will try to simplify my problem to make it understandable.

We have 3 variables :

  • x : number of letters
  • k : number of groups
  • n : number of letters per group

I would like to generate using python a list of every possible combinations, without any duplicate knowing that : i don't care about the order of the groups and the order of the letters within a group.

As an example, with x = 4, k = 2, n = 2 :

# we start with 4 letters, we want to make 2 groups of 2 letters
letters = ['A','B','C','D']

# here would be a code that generate the list

# Here is the result that is very simple, only 3 combinations exist.
combos = [ ['AB', 'CD'], ['AC', 'BD'], ['AD', 'BC'] ]

Since I don't care about the order of or within the groups, and letters within a group, ['AB', 'CD'] and ['DC', 'BA'] is a duplicate.

This is a simplification of my real problem, which has those values : x = 12, k = 4, n = 3. I tried to use some functions from itertools, but with that many letters my computer freezes because it's too many combinations.

Another way of seeing the problem : you have 12 players, you want to make 4 teams of 3 players. What are all the possibilities ?

Could anyone help me to find an optimized solution to generate this list?

Lombok answered 31/3, 2022 at 17:44 Comment(15)
Do you always have x = k * n, or is it x >= k * n?Koopman
This code will give you all the combinations: comb = [(a,b) for a in letters for b in letters if a != b]. I don't quite know how to remove the duplicatesQuidnunc
If itertools freezes your computer, I don't think you could implement something better using only PythonHolograph
Have you computed how many combinations you are talking about? Off the top of my head I think it is more than 1 billion. I may have misunderstood something or made an error, but if I'm anywhere close then the problem is hopeless. Even if you managed to compute the answer you would have no good way of looking at it.Snapp
@DavidDavó. That's clearly not trueKoopman
@PaulCornelius. You can easily make a generator for this...Koopman
Yes x = k*n. To make the problem more clear, i have 12 players, i want to make 4 teams of 3 players and would like to have all the possibilities. I couldnt compute all the combinations but it's less than 1 billion. If there were no duplicate, it would be fact(12) = 479.001.600 combos. But it will be way less because there are a lot of duplicates. For example with x = 4, fact(4) = 24 but whne you remove duplicates there are only 3 combosLombok
Itertools is implemented in cpython, and returns a generator. I doubt a python's one liner could perform betterHolograph
Isn’t k determined by x and n? Can you provide an example where it’s not?Pain
yes k is determined by x and n.Lombok
@DavidDavó itertools generate all the possibilities (400 millions for x = 12). According to some comments, after removing duplicates i would have 369.000 combos. The idea is not to perform better than itertools, but to find a solution that is more specific to my problem.Lombok
@JohnColeman I think that also counts all orderings of the partitions. I think you'll have to divide that by k! again for a total of 15,400 in this case.Tractable
@Tractable You are correctAchene
The function get_partitions_same_size from my answer here could be helpful.Berseem
I timed three solutions on my computer: @fsimonjetz ~60 seconds, my first ~1.7 seconds, my second ~0.36 seconds.Stat
W
2

There will certainly be more sophisticated/efficient ways of doing this, but here's an approach that works in a reasonable amount of time for your example and should be easy enough to adapt for other cases.

It generates unique teams and unique combinations thereof, as per your specifications.

from itertools import combinations

# this assumes that team_size * team_num == len(players) is a given
team_size = 3
team_num = 4
players = list('ABCDEFGHIJKL')
unique_teams = [set(c) for c in combinations(players, team_size)]

def duplicate_player(combo):
    """Returns True if a player occurs in more than one team"""
    return len(set.union(*combo)) < len(players)
    
result = (combo for combo in combinations(unique_teams, team_num) if not duplicate_player(combo))

result is a generator that can be iterated or turned into a list with list(result). On kaggle.com, it takes a minute or so to generate the whole list of all possible combinations (a total of 15400, in line with the computations by @beaker and @John Coleman in the comments). The teams are tuples of sets that look like this:

[({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'I'}, {'J', 'K', 'L'}),
 ({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'J'}, {'I', 'K', 'L'}),
 ({'A', 'B', 'C'}, {'D', 'E', 'F'}, {'G', 'H', 'K'}, {'I', 'J', 'L'}),
 ...
]

If you want, you can cast them into strings by calling ''.join() on each of them.

Whenas answered 31/3, 2022 at 20:51 Comment(0)
S
1

Another solution (players are numbered 0, 1, ...):

import itertools

def equipartitions(base_count: int, group_size: int):
    if base_count % group_size != 0:
        raise ValueError("group_count must divide base_count")

    return set(_equipartitions(frozenset(range(base_count)), group_size))


def _equipartitions(base_set: frozenset, group_size: int):
    if not base_set:
        yield frozenset()

    for combo in itertools.combinations(base_set, group_size):
        for rest in _equipartitions(base_set.difference(frozenset(combo)), group_size):
            yield frozenset({frozenset(combo), *rest})


all_combinations = [
    [tuple(team) for team in combo]
        for combo in equipartitions(12, 3)
]

print(all_combinations)
print(len(all_combinations))

And another:

import itertools
from typing import Iterable

def equipartitions(players: Iterable, team_size: int):
    if len(players) % team_size != 0:
        raise ValueError("group_count must divide base_count")

    return _equipartitions(set(players), team_size)


def _equipartitions(players: set, team_size: int):
    if not players:
        yield []
        return

    first_player, *other_players = players
    for other_team_members in itertools.combinations(other_players, team_size-1):
        first_team = {first_player, *other_team_members}
        for other_teams in _equipartitions(set(other_players) - set(first_team), team_size):
            yield [first_team, *other_teams]


all_combinations = [
    {''.join(sorted(team)) for team in combo} for combo in equipartitions(players='ABCDEFGHIJKL', team_size=3)
]


print(all_combinations)
print(len(all_combinations))
Stat answered 1/4, 2022 at 9:31 Comment(2)
I guess that's one of the "more sophisticated/efficient ways of doing this" :) very insightful!Whenas
@fsimonjetz. I think that your code was clever. But it was slower. On the other hand, if this is something that will just be run manually a few times then waiting a minute is no problem.Stat
H
0

Firstly, you can use a list comprehension to give you all of the possible combinations (regardless of the duplicates):

comb = [(a,b) for a in letters for b in letters if a != b]

And, afterwards, you can use the sorted function to sort the tuples. After that, to remove the duplicates, you can convert all of the items to a set and then back to a list.

var = [tuple(sorted(sub)) for sub in comb]
var = list(set(var))
Happiness answered 31/3, 2022 at 18:1 Comment(3)
This is a bit of a waste to generate combinations to discard them afterwards, and this solution doesn't work out-of-the-box for any n (it currently hard-codes n=2).Dhow
generating all the possibilities and removing the duplicates was my first idea, but with x = 12 this is 400 millions combos. Way too much for my computer to compiute. According to some comments, my final combo list should be around 15.400 combos (not verified). Should be possible to generate this list without having to generate all 400 millions combosLombok
@Lombok Yeah fair enough. Hadn't thought of that. I'm afraid I can't help much thenQuidnunc
S
0

You could use the list comprehension approach, which has a time complexity of O(n*n-1), or you could use a more verbose way, but with a slightly better time complexity of O(n^2-n)/2:

comb = []

for first_letter_idx, _ in enumerate(letters):
    for sec_letter_idx in range(first_letter_idx + 1, len(letters)):
        comb.append(letters[first_letter_idx] + letters[sec_letter_idx])

print(comb)

comb2 = []

for first_letter_idx, _ in enumerate(comb):
    for sec_letter_idx in range(first_letter_idx + 1, len(comb)):
        if (comb[first_letter_idx][0] not in comb[sec_letter_idx]
        and comb[first_letter_idx][1] not in comb[sec_letter_idx]):
            comb2.append([comb[first_letter_idx], comb[sec_letter_idx]])

print(comb2)

This algorithm needs more work to handle dynamic inputs. Maybe with recursion.

Sayed answered 31/3, 2022 at 19:24 Comment(4)
hi, this algorithm is giving me 15 combos, while there are only 3 (see the first post, i give the answer for x = 4). For example ['AB', 'AD'] is not a good combo since letter A is twice. I could remove duplicates later, but i'm afraid that with 12 letters it will be too many useless duplicates (probably millions)Lombok
Sorry, I just fixed it. It's not elegant in this stage. Later I'll try to improve it.Antagonism
O(n^2-n)/2 is very much O(n^2)Koopman
Yeap, pretty much.Antagonism
Z
-1

Use combination from itertools

from itertools import combinations 

x = list(combinations(['A','B','C','D'],2))

t = []
for i in (x):
    t.append(i[0]+i[1]) # concatenating the strings and adding in a list

g = []
for i in range(0,len(t),2):
    for j in range(i+1,len(t)):
        g.append([t[i],t[j]])
        break
print(g)
Zetland answered 31/3, 2022 at 19:3 Comment(2)
this code is generating the 6 combinations of 2 letters. That's not what i'm interested in. The final result should be the 3 combinations i gave in my first post. One combination is 2 group of 2 letters. ['AB', 'CD'] is one combination.Lombok
Check if this works for you..Since any pair like [AB, AC] in reverse as well as within element letters in reverse like BA, CA is considered as duplicate..the last for loop jumps 2 indexes in iterating the list created by the combinations..Zetland

© 2022 - 2024 — McMap. All rights reserved.