Best approach to a variation of a bucketing problem
Asked Answered
G

2

8

Find the most appropriate team compositions for days in which it is possible. A set of n participants, k days, a team has m slots. A participant specifies how many days he wants to be a part of and which days he is available.

Result constraints:

  1. Participants must not be participating in more days than they want
  2. Participants must not be scheduled in days they are not available in.
  3. Algorithm should do its best to include as many unique participants as possible.
  4. A day will not be scheduled if less than m participants are available for that day.

I find myself solving this problem manually every week at work for my football team scheduling and I'm sure there is a smart programmatic approach to solve it. Currently, we consider only 2 days per week and colleagues write down their name for which day they wanna participate, and it ends up having big lists for each day and impossible to please everyone.

I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:

Kane 3 1 2 3 4 5

The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. First number represents days to play, next numbers represent available days(1 to 7, MOnday to Sunday).

Days with less than m (in my case, m = 12) participants are not gonna be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires(when to play, how much to play).

I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.

Result constraints:

  1. Participants must not play more than they want
  2. Participants must not be scheduled in days they don't want to play
  3. Algorithm should do its best to include as many participants as possible.
  4. A day will not be scheduled if less than m participants are available for that day.
Gonorrhea answered 12/9, 2019 at 19:18 Comment(9)
can 15 be scheduled on same day (more than m)?Oster
You've described a multi-dimensional version of the set coverage problem. I see various direct attacks on this -- I would expect to see, not a request for an algorithm, but a partial solution with a problematic result.Distribute
A brute-force walk through the available legal choices should still be fast enough to give you a rapid solution in human terms. What is you trade-off between quantity of participants and days played? What is the maximum quantity of players you can schedule on a single day?Distribute
In my real case, I am looking for 12 participants in a day, but the day can still be scheduled with only 10. More than 12 means somebody is sitting on the bench.Gonorrhea
Again, in my case, we use all 7 days and have about 40 possible participants. Most people avoid weekends and we end up fighting over the first 5 days. Maximum quantity of players is still m.Gonorrhea
@Distribute This question really wouldn't be improved by a bunch of scrap code.Tice
@DavidEisenstat: I disagree; the problem is tractable enough that I expect OP to post an honest attempt. If nothing else, a valid sub-problem (say, a roster of 7 and 3-4 required for a game).Distribute
Just to be clear, teams are recreated each day, correct?Murton
Constraint programming is the solution here.Hamer
T
2

Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)

The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2k possibilities for which days have a match. For the rest of this answer, assume we know.

Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.

Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.

For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints

# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)

# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D

# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)

The objective is to maximize

sum_{p in P} y(p)

Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.

You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.

Tice answered 13/9, 2019 at 14:19 Comment(0)
N
2

I have an idea if you need a basic solution that you can optimize and refine by small steps. I am talking about Flow Networks. Most of those that already know what they are are probably turning their nose because flow network are usually used to solve maximization problem, not optimization problem. And they are right in a sense, but I think it can be initially seen as maximizing the amount of player for each day that play. No need to say it is a kind of greedy approach if we stop here. No more introduction, the purpose is to find the maximum flow inside this graph:

flow network

Each player has a number of days in which he wants to play, represented as the capacity of each edge from the Source to node player x. Each player node has as many edges from player x to day_of_week as the capacity previously found. Each of this 2nd level edges has a capacity of 1. The third level is filled by the edges that link day_of_week to the sink node. Quick example: player 2 is available 2 days: monday and tuesday, both have a limit of player, which is 12.

Until now 1st, 2nd and 4th constraints are satisfied (well, it was the easy part too): after you found the maximum flow of the entire graph you only select those path that does not have any residual capacity both on 2nd level (from players to day_of_weeks) and 3rd level (from day_of_weeks to the sink). It is easy to prove that with this level of "optimization" and under certain conditions, it is possible that it will not find any acceptable path even though it would have found one if it had made different choices while visiting the graph.

This part is the optimization problem that i meant before. I came up with at least two heuristic improvements:

  • While you visit the graph, store day_of_weeks in a priority queue where days with more players assigned have a higher priority too. In this way the amount of residual capacity of the entire graph is certainly less evenly distributed.
  • randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge from a node in the player's level. At the end you average the results and choose the most common outcome. This is an situation where the majority rule perfectly applies.

Better to specify that everything above is just a starting point: the purpose of heuristic is to find the best approximated solution possible. With this type of problem and given your probably small input, this is not the right way but it is the easiest one when you do not know where to start.

Norward answered 16/9, 2019 at 21:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.