All combinations of set of dictionaries into K N-sized groups
Asked Answered
T

3

5

I though this would be straightforward, unfortunately, it is not.

I am trying to build a function to take an iterable of dictionaries (i.e., a list of unique dictionaries) and return a list of lists of unique groupings of the dictionaries.

If I have x players I would like to form k teams of n size.

This question and set of answers from CMSDK is the closest thing to a solution I can find. In adapting it from processing strings of letters to dictionaries I am finding my Python skills inadequate.

The original function that I am adapting comes from the second answer:

import itertools as it
def unique_group(iterable, k, n):
    """Return an iterator, comprising groups of size `k` with combinations of size `n`."""
    # Build separate combinations of `n` characters
    groups = ("".join(i) for i in it.combinations(iterable, n))    # 'AB', 'AC', 'AD', ...
    # Build unique groups of `k` by keeping the longest sets of characters
    return (i for i in it.product(groups, repeat=k) 
                if len(set("".join(i))) == sum((map(len, i))))     # ('AB', 'CD'), ('AB', 'CE'), ... 

My current adaptation (that utterly fails with an error of TypeError: object of type 'generator' has no len() because of the call to map(len, i)):

def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return ( i for i in it.product(groups, repeat=k) if len(set(i)) == sum((map(len, i))) )

For a bit of context: I am trying to programmatically divide a group of players into teams for Christmas Trivia based on their skills. The list of dictionaries is formed from a yaml file that looks like

- name: Patricia
  skill: 4
- name: Christopher
  skill: 6
- name: Nicholas
  skill: 7
- name: Bianca
  skill: 4

Which, after yaml.load produces a list of dictionaries:

players = [{'name':'Patricia', 'skill':4},{'name':'Christopher','skill':6},
           {'name':'Nicholas','skill':7},{'name':'Bianca','skill':4}]

So I expect output that would look like a list of these (where k = 2 and n = 2) :

(
    # Team assignment grouping 1
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4} )
    ),
    # Team assignment grouping 2
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Christopher', 'skill': 6} )
    ),

    ...,

    # More unique lists

)

Each team assignment grouping needs to have unique players across teams (i.e., there cannot be the same player on multiple teams in a team assignment grouping), and each team assignment grouping needs to be unique.

Once I have the list of team assignment combinations I will sum up the skills in every group, take the difference between the highest skill and lowest skill, and choose the grouping (with variance) with the lowest difference between highest and lowest skills.

I will admit I do not understand this code fully. I understand the first assignment to create a list of all the combinations of the letters in a string, and the return statement to find the product under the condition that the product does not contain the same letter in different groups.

My initial attempt was to simply take the it.product(it.combinations(iterable, n), repeat=k) but this does not achieve uniqueness across groups (i.e., I get the same player on different teams in one grouping).

Thanks in advance, and Merry Christmas!


Update:

After a considerable amount of fiddling I have gotten the adaptation to this:

This does not work

def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return (i for i in it.product(groups, repeat=k)\
        if len(list({v['name']:v for v in it.chain.from_iterable(i)}.values())) ==\
        len(list([x for x in it.chain.from_iterable(i)])))

I get a bug

Traceback (most recent call last):
  File "./optimize.py", line 65, in <module>
    for grouping in unique_group(players, team_size, number_of_teams):
  File "./optimize.py", line 32, in <genexpr>
    v in it.chain.from_iterable(i)})) == len(list([x for x in
  File "./optimize.py", line 32, in <dictcomp>
    v in it.chain.from_iterable(i)})) == len(list([x for x in
TypeError: tuple indices must be integers or slices, not str

Which is confusing the crap out of me and makes clear I don't know what my code is doing. In ipython I took this sample output:

assignment = (
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4}),
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4})
)

Which is clearly undesirable and formulated the following test:

len(list({v['name']:v for v in it.chain.from_iterable(assignment)})) == len([v for v in it.chain.from_iterable(assignment)])

Which correctly responds False. But it doesn't work in my method. That is probably because I am cargo cult coding at this point.

I understand what it.chain.from_iterable(i) does (it flattens the tuple of tuples of dictionaries to just a tuple of dictionaries). But it seems that the syntax {v['name']:v for v in ...} does not do what I think it does; either that or I'm unpacking the wrong values! I am trying to test the unique dictionaries against the total dictionaries based on Flatten list of lists and Python - List of unique dictionaries but the answer giving me

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> list({v['id']:v for v in L}.values())

Isn't as easy to adapt in this circumstance as I thought, and I'm realizing I don't really know what is getting returned in the it.product(groups, repeat=k). I'll have to investigate more.

Toul answered 24/12, 2018 at 17:32 Comment(17)
Please include an example of the data/input - just enough to test the solution - along with the desired result.Himself
Okay, I'll use the list I included here. I have to produce the output based on that.Toul
You want to - look at all possible combinations of k by m players and pick the set that has the the least difference between groups?Himself
Yes. I can manually produce the least difference analysis (I believe I can, at least) once I have the list of unique groupings. (if by k by m you mean k is the number of teams and m is the number of players for each team.Toul
How about the initial list of dictionaries/players - the input?Himself
Say again? I'm not sure what you mean. I'm going to post an example with a shorter list of players.Toul
the first two rows of your expected output are the exact same?Lazuli
Sorry, mistake, fixing.Toul
@Himself I'm curious why you keep updating the output to move Christopher to the top and Bianca to the bottom in the second assignment. I want these group assignments to be unique.Toul
I did not edit your desired output. I only added what i think your initial list of dictionaries looks like - including the input makes it easier for potential answerers(?) to test their solution. - please fix the input if I got it wrong.Himself
Ah, sorry, corrected as best I can.Toul
What you want is to pick sets of N×M players out of the set of players and then divide them into the M teams of N players.Construction
Is @dand's clarification correct? Or is x always k*n?Irruptive
@Irruptive x is arbitrarily always k*n but mostly because when I initially set about coding the problem I didn't want to deal with the edge case of leftover players, so I in my script I just pop the lowest skilled players from the list of players for manual assignment later. If you want to code a more general solution, you are welcome to. I never responded to Dan because I didn't actually understand his comment. :-/Toul
Ok. You know that there are really a lot of combinations, right? There are (24!)/((4!)**6*6!) possible divisions of 24 people into 6 teams of 4, which is a total of about 4.5 trillion: 4509264634875, according to Python. It might take a while to enumerate them all.Irruptive
Lol, discovering that. Didn't think about it. Any ideas for doing this less naively?Toul
The underlying problem is related to the knapsack problem, so it's not solvable in less than exponential time. If you're going to settle for an approximation, try randomly assigning people to teams (eg shuffle the list and then count off the teams) and then try to improve the metric by randomly swapping a high skill person from a high skill team with a low skill person from a low skill team until your teams are roughly equal. Do that a few times and choose the one which strikes you as aesthetically best (knowing the people and all that).Irruptive
H
2

A list of dicts is not a good data structure for mapping what you actually want to rearrange, the player names, to their respective attributes, the skill ratings. You should transform the list of dicts to a name-to-skill mapping dict first:

player_skills = {player['name']: player['skill'] for player in players}
# player_skills becomes {'Patricia': 4, 'Christopher': 6, 'Nicholas': 7, 'Blanca': 4}

so that you can recursively deduct a combination of n players from the pool of players iterable, until the number of groups reaches k:

from itertools import combinations
def unique_group(iterable, k, n, groups=0):
    if groups == k:
        yield []
    pool = set(iterable)
    for combination in combinations(pool, n):
        for rest in unique_group(pool.difference(combination), k, n, groups + 1):
            yield [combination, *rest]

With your sample input, list(unique_group(player_skills, 2, 2)) returns:

[[('Blanca', 'Christopher'), ('Nicholas', 'Patricia')],
 [('Blanca', 'Nicholas'), ('Christopher', 'Patricia')],
 [('Blanca', 'Patricia'), ('Christopher', 'Nicholas')],
 [('Christopher', 'Nicholas'), ('Blanca', 'Patricia')],
 [('Christopher', 'Patricia'), ('Blanca', 'Nicholas')],
 [('Nicholas', 'Patricia'), ('Blanca', 'Christopher')]]

You can get the combination with the lowest variance in total skill ratings by using the min function with a key function that returns the skill difference between the team with the highest total skill ratings and the one with the lowest, which takes only O(n) in time complexity:

def variance(groups):
    total_skills = [sum(player_skills[player] for player in group) for group in groups]
    return max(total_skills) - min(total_skills)

so that min(unique_group(player_skills, 2, 2), key=variance) returns:

[('Blanca', 'Nicholas'), ('Christopher', 'Patricia')]
Hypermeter answered 24/12, 2018 at 22:49 Comment(4)
I haven't tested it yet, but it looks like it may be more efficient than the other solution (which is beautiful in its own right). I discovered the horrific inefficiency of the combinations/product the other night. One solution I came up with was to simply keep track of the last best valid grouping and print out every new best valid grouping. It still never worked for large numbers of players (it never reached a valid grouping after 5 hours with 20 players). I want to test your method (can't atm, I'm at work; will tonight) but it looks very promising.Toul
So I the min(unique_group(player_skills, k, n), key=variance) method is still too monolithic to handle large sets, but by simply establishing a last_best =10000 global variable and looping through the output of unique_group(player_skills, k, n), testing variance() of the grouping and outputting the new best each time, leveraging the generator expression, I was able to get down to a variance of 4 pretty quickly. Longer waiting might get me better. So I'm upvoting your solution and accepting it.Toul
For the record, that was on a set of 20 players with k = 4 and n = 5.Toul
I posted a github repo with the final code. Team UpToul
W
3

This is where I'd leverage the new dataclasses with sets. You can make a dataclass hashable by setting frozen=True in the decorator. First you'd add your players to a set to get unique players. Then you'd get all the combinations of players for n size teams. Then you could create a set of unique teams. Then create valid groupings whereas no player is represented more than once across teams. Finally you could calculate the max disparity in the total team skill level across the grouping (leveraging combinations yet again) and use that to sort your valid groupings. So something like this.

from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet

import yaml


@dataclass(order=True, frozen=True)
class Player:
    name: str
    skill: int


@dataclass(order=True, frozen=True)
class Team:
    members: FrozenSet[Player]

    def total_skill(self):
        return sum(p.skill for p in self.members)


def is_valid(grouping):
    players = set()
    for team in grouping:
        for player in team.members:
            if player in players:
                return False
            players.add(player)
    return True


def max_team_disparity(grouping):
    return max(
        abs(t1.total_skill() - t2.total_skill())
        for t1, t2 in combinations(grouping, 2)
    )


def best_team_matchups(player_file, k, n):
    with open(player_file) as f:
        players = set(Player(p['name'], p['skill']) for p in yaml.load(f))
    player_combs = combinations(players, n)
    unique_teams = set(Team(frozenset(team)) for team in player_combs)
    valid_groupings = set(g for g in combinations(unique_teams, k) if is_valid(g))
    for g in sorted(valid_groupings, key=max_team_disparity):
        print(g)


best_team_matchups('test.yaml', k=2, n=4)

Example output:

(
    Team(members=frozenset({
        Player(name='Chr', skill=6),
        Player(name='Christopher', skill=6),
        Player(name='Nicholas', skill=7),
        Player(name='Patricia', skill=4)
    })),
    Team(members=frozenset({
        Player(name='Bia', skill=4),
        Player(name='Bianca', skill=4),
        Player(name='Danny', skill=8),
        Player(name='Nicho', skill=7)
    }))
)
Weymouth answered 24/12, 2018 at 19:22 Comment(2)
That's pretty amazingly clean code. I'm going to play around with it a lot, but I can already see it works. One issue I'm seeing is that it takes a while to compute for larger teams (e.g., I have a set of 20 team members). I haven't seen a result yet (I tested it on a sample of 4 players). I was anticipating the possibility of unworkable computation times from the beginning :-(. Any advice in that regard would be helpful. I think the final team match-up tonight is going to be of 24 players. 4 teams of 6.Toul
I'm sure there's probably some clever math shortcuts to avoid brute force, but I don't know 'em... ;) Good luck.Weymouth
H
2

A list of dicts is not a good data structure for mapping what you actually want to rearrange, the player names, to their respective attributes, the skill ratings. You should transform the list of dicts to a name-to-skill mapping dict first:

player_skills = {player['name']: player['skill'] for player in players}
# player_skills becomes {'Patricia': 4, 'Christopher': 6, 'Nicholas': 7, 'Blanca': 4}

so that you can recursively deduct a combination of n players from the pool of players iterable, until the number of groups reaches k:

from itertools import combinations
def unique_group(iterable, k, n, groups=0):
    if groups == k:
        yield []
    pool = set(iterable)
    for combination in combinations(pool, n):
        for rest in unique_group(pool.difference(combination), k, n, groups + 1):
            yield [combination, *rest]

With your sample input, list(unique_group(player_skills, 2, 2)) returns:

[[('Blanca', 'Christopher'), ('Nicholas', 'Patricia')],
 [('Blanca', 'Nicholas'), ('Christopher', 'Patricia')],
 [('Blanca', 'Patricia'), ('Christopher', 'Nicholas')],
 [('Christopher', 'Nicholas'), ('Blanca', 'Patricia')],
 [('Christopher', 'Patricia'), ('Blanca', 'Nicholas')],
 [('Nicholas', 'Patricia'), ('Blanca', 'Christopher')]]

You can get the combination with the lowest variance in total skill ratings by using the min function with a key function that returns the skill difference between the team with the highest total skill ratings and the one with the lowest, which takes only O(n) in time complexity:

def variance(groups):
    total_skills = [sum(player_skills[player] for player in group) for group in groups]
    return max(total_skills) - min(total_skills)

so that min(unique_group(player_skills, 2, 2), key=variance) returns:

[('Blanca', 'Nicholas'), ('Christopher', 'Patricia')]
Hypermeter answered 24/12, 2018 at 22:49 Comment(4)
I haven't tested it yet, but it looks like it may be more efficient than the other solution (which is beautiful in its own right). I discovered the horrific inefficiency of the combinations/product the other night. One solution I came up with was to simply keep track of the last best valid grouping and print out every new best valid grouping. It still never worked for large numbers of players (it never reached a valid grouping after 5 hours with 20 players). I want to test your method (can't atm, I'm at work; will tonight) but it looks very promising.Toul
So I the min(unique_group(player_skills, k, n), key=variance) method is still too monolithic to handle large sets, but by simply establishing a last_best =10000 global variable and looping through the output of unique_group(player_skills, k, n), testing variance() of the grouping and outputting the new best each time, leveraging the generator expression, I was able to get down to a variance of 4 pretty quickly. Longer waiting might get me better. So I'm upvoting your solution and accepting it.Toul
For the record, that was on a set of 20 players with k = 4 and n = 5.Toul
I posted a github repo with the final code. Team UpToul
K
1

Instead of trying to create every possible grouping of k sets of n elements (possibly including repeats!), and then filtering down to the ones that don't have any overlap, let's directly build groupings that meet the criterion. This also avoids generating redundant groupings in different orders (the original code could also do this by using combinations rather than product in the last step).

The approach is:

  • Iterate over possibilities (combinations of n elements in the input) for the first set - by which I mean, the one that contains the first of the elements that will be chosen.
  • For each, recursively find possibilities for the remaining sets. They cannot use elements from the first set, and they also cannot use elements from before the first set (or else the first set wouldn't be first).

In order to combine the results elegantly, we use a recursive generator: rather than trying to build lists that contain results from the recursive calls, we just yield everything we need to. We represent each collection of group_count many elements with a tuple of tuples (the inner tuples are the groups). At the base case, there is exactly one way to make no groups of elements - by just... doing that... yeah... - so we need to yield one value which is a tuple of no tuples of an irrelevant number of elements each - i.e., an empty tuple. In the other cases, we prepend the tuple for the current group to each result from the recursive call, yielding all those results.

from itertools import combinations

def non_overlapping_groups(group_count, group_size, population):
    if group_count == 0:
        yield ()
        return
    for indices in combinations(range(len(population)), group_size):
        current = (tuple(population[i] for i in indices),)
        remaining = [
            x for i, x in enumerate(population)
            if i not in indices and i > indices[0]
        ] if indices else population
        for recursive in non_overlapping_groups(group_count - 1, group_size, remaining):
            yield current + recursive

Let's try it:

>>> list(non_overlapping_groups(2, 3, 'abcdef'))
[(('a', 'b', 'c'), ('d', 'e', 'f')), (('a', 'b', 'd'), ('c', 'e', 'f')), (('a', 'b', 'e'), ('c', 'd', 'f')), (('a', 'b', 'f'), ('c', 'd', 'e')), (('a', 'c', 'd'), ('b', 'e', 'f')), (('a', 'c', 'e'), ('b', 'd', 'f')), (('a', 'c', 'f'), ('b', 'd', 'e')), (('a', 'd', 'e'), ('b', 'c', 'f')), (('a', 'd', 'f'), ('b', 'c', 'e')), (('a', 'e', 'f'), ('b', 'c', 'd'))]
>>> list(non_overlapping_groups(3, 2, 'abcdef'))
[(('a', 'b'), ('c', 'd'), ('e', 'f')), (('a', 'b'), ('c', 'e'), ('d', 'f')), (('a', 'b'), ('c', 'f'), ('d', 'e')), (('a', 'c'), ('b', 'd'), ('e', 'f')), (('a', 'c'), ('b', 'e'), ('d', 'f')), (('a', 'c'), ('b', 'f'), ('d', 'e')), (('a', 'd'), ('b', 'c'), ('e', 'f')), (('a', 'd'), ('b', 'e'), ('c', 'f')), (('a', 'd'), ('b', 'f'), ('c', 'e')), (('a', 'e'), ('b', 'c'), ('d', 'f')), (('a', 'e'), ('b', 'd'), ('c', 'f')), (('a', 'e'), ('b', 'f'), ('c', 'd')), (('a', 'f'), ('b', 'c'), ('d', 'e')), (('a', 'f'), ('b', 'd'), ('c', 'e')), (('a', 'f'), ('b', 'e'), ('c', 'd'))]
>>> # Some quick sanity checks
>>> len(list(non_overlapping_groups(2, 3, 'abcdef')))
10
>>> # With fewer input elements, obviously we can't do it.
>>> len(list(non_overlapping_groups(2, 3, 'abcde')))
0
>>> # Adding a 7th element, any element could be the odd one out,
>>> # and in each case we get another 10 possibilities, making 10 * 7 = 70.
>>> len(list(non_overlapping_groups(2, 3, 'abcdefg')))
70

I performance tested this against a modified version of the original (which also shows how to make it work properly with non-strings, and optimizes the sum calculation):

def unique_group(group_count, group_size, population):
    groups = list(it.combinations(population, group_size))
    return (
        i for i in combinations(groups, group_count) 
        if len({e for g in i for e in g}) == group_count * group_size
    )

Quickly verifying the equivalence:

>>> len(list(unique_group(3, 2, 'abcdef')))
15
>>> len(list(non_overlapping_groups(3, 2, 'abcdef')))
15
>>> set(unique_group(3, 2, 'abcdef')) == set(non_overlapping_groups(3, 2, 'abcdef'))
True

We see that even for fairly small examples (here, the output has 280 groupings), the brute-force approach has to filter through a lot:

>>> import timeit
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': unique_group}, number=100)
5.895461600041017
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': non_overlapping_groups}, number=100)
0.2303082060534507
Kibbutznik answered 6/9, 2022 at 4:12 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.