Solving task scheduling or bin-packing optimizations in R

I have an optimisation problem. It's about a product made of 20 parts (the order in which the parts are produced doesn't matter). I've got 3 similar machines that can each produce all 20 parts.

I've got the time for each of the 20 parts in minutes (i.e. it takes 3 min to produce the first part, 75 min to produce the second part, etc.):

ItemTime<-c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)

So producing one product takes 730 min in total:

sum(ItemTime)

The aim is to minimise the time to produce one product by allocating the right parts to each of the three machines.

sum(ItemTime)/3

So I actually need each machine to be as close as possible to 243.33 min (730/3).

The number of possibilities is huge: 3^20.

I guess there are many different optimal solutions, and I would like R to give me all of them. I don't just need to know the total time needed by machines 1, 2 and 3: I also need to know which parts to give to machine 1, to machine 2 and to machine 3.

Alternatively, if that would take too long, I would like to explore a sample (without repetition) of the possibilities that is as reasonable as possible...

Can I solve this problem with R?

Java asked 6/5, 2012 at 2:11 Comment(0)

I believe your problem is a close variant of the Multiple Knapsack Problem (MKP) which is, a priori, not a piece of cake...

However, the dimensions of your problem are small enough that it can be solved as a mixed-integer program (MIP). I solved it below using Rglpk; it took the solver about four minutes. If you are lucky enough to have access to CPLEX, I would highly recommend you use Rcplex instead; it will smoke it.

library(Rglpk)  # provides Rglpk_solve_LP()

ItemTime <- c(3,75,55,12,45,55,11,8,21,16,65,28,84,3,58,46,5,84,8,48)
N <- length(ItemTime)  # number of parts
M <- 3                 # number of machines

Problem formulation:

# variables are in this order:
# z: slack variable for the max of (s1, s2, s3)
# s1: sum of times for machine 1
# s2: sum of times for machine 2
# s3: sum of times for machine 3
# a1-a20: booleans for assignment to machine 1
# b1-b20: booleans for assignment to machine 2
# c1-c20: booleans for assignment to machine 3

obj <- c(1, rep(0, 3 + 3*N))

mat <- rbind(
  c(1, -1, 0, 0, rep(0, M*N)),                      # z >= s1
  c(1, 0, -1, 0, rep(0, M*N)),                      # z >= s2
  c(1, 0, 0, -1, rep(0, M*N)),                      # z >= s3
  c(0, -1, 0, 0, ItemTime,  rep(0, N), rep(0, N)),  # s1 = ...
  c(0, 0, -1, 0, rep(0, N), ItemTime,  rep(0, N)),  # s2 = ...
  c(0, 0, 0, -1, rep(0, N), rep(0, N), ItemTime),   # s3 = ...
  cbind(matrix(0, N, 4), diag(N), diag(N), diag(N)) # a_i + b_i + c_i = 1
)

dir <- c( ">=", ">=", ">=", "==", "==", "==" , rep("==", N))

rhs <- c(rep(0, 2*M), rep(1, N))

types <- c(rep("C", 1+M), rep("B", M*N))

Now let's solve it:

Rglpk_solve_LP(obj, mat, dir, rhs, types, max=FALSE, verbose=TRUE)

# GLPK Simplex Optimizer, v4.47
# INTEGER OPTIMAL SOLUTION FOUND
# $optimum
# [1] 244
# 
# $solution
#  [1] 244 243 243 244   0   1   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   1   0   0   0   0   1   1   0   0
# [31]   1   1   1   0   1   0   0   0   1   0   1   0   1   0   1   0   0   0   1   1   0   0   0   1   0   1   0   1   0   1
# [61]   0   0   0   1
# 
# $status
# [1] 0
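
To map the solution back to parts per machine, you can reshape the binary part of $solution. Here is a sketch; it assumes the result above was stored, e.g. res <- Rglpk_solve_LP(obj, mat, dir, rhs, types, max=FALSE):

sol <- res$solution
assignment <- matrix(sol[-(1:(1 + M))], nrow = N, ncol = M)  # column m holds the 0/1 flags for machine m
parts <- lapply(seq_len(M), function(m) which(assignment[, m] > 0.5))
parts                                        # part indices per machine
sapply(parts, function(i) sum(ItemTime[i]))  # per-machine totals (here 243, 243, 244)
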
Sanfred answered 6/5, 2012 at 3:45 Comment(12)
I think this problem has more (and different) structure than any of the knapsack problems, so I'd be interested if you could go into more detail about the similarities. :) – Atrip
Yes, @dbaupp. In this particular case, it is easy to tell that a {243, 243, 244} or {242, 244, 244} solution, if it exists, is optimal. So one could go and solve the "multiple knapsack problem" (as defined here: en.wikipedia.org/wiki/List_of_knapsack_problems) for each of those two sets of weights. If either of the two problems has a solution where the three machines are fully loaded, then we have found an optimal solution to the original problem. Again, it is a "variant"... – Sanfred
@dbaupp, I am intrigued by your claim that the greedy approach is optimal. At first, I thought "no way!", but as I can't seem to find a counter-example, I am more and more convinced you might be right. If that is the case, then I killed an ant with a sledgehammer! – Sanfred
Ah, ok, I understand the similarity now. I also retract my claim that it has more structure: the problem has less structure, since it is a pure minimisation problem (minimise the total time) without any other constraints. I think a proof that the greedy algorithm is optimal might be by induction, showing it is optimal at every step (it's obviously optimal for 0 tasks, and the method of adding means that if the solution with the first k tasks is optimal then the solution with the first k + 1 tasks is), although there is probably some subtlety in the inductive step. – Atrip
(My claim of optimality was made based on a dimly remembered Wikipedia (or something) article, and I can't seem to hit on the right search terms to find it again.) – Atrip
It seems that the greedy approach is actually not optimal; see @han's comment on my answer. – Atrip
(Also, I just found that this problem is well-known, and called multiprocessor scheduling.) – Atrip
@Sanfred & all: Thank you so much for your help. I've tried the code and it works very well (don't worry, I can wait 4 minutes, and even more). I'll go and read the Rglpk documentation; I need to understand all the code you gave me, and right now I don't because my maths level is quite poor :(. I have a question: I don't understand the meaning of the solution. At [1] there is 244 243 243 244, then only 0s and 1s. How do I interpret that? At [31] there are only 0s and 1s; how do I interpret that? And I don't understand [61] either... Thank you – Java
(By the way, I'm new on Stack Overflow; did I post the question in the right place? I didn't know where to ask more about the question, so I put it in the comments, but I don't know if anyone saw it?) – Java
For the meaning of the solution, you can refer to the comments I left about the variables in the problem formulation. 244 is your final optimal time; 243, 243, 244 are the times for each of the three machines. The next 20 values show whether each part should be produced (1) or not (0) by the 1st machine, the next 20 are for the second machine, and the last 20 are for the last machine. – Sanfred
@Sanfred Hmm, actually I've read the documentation and I'm trying some more tests out of curiosity. I just have a question: if I add a 21st item [N=21], for instance, the code still works. But if I add a 22nd [N=22], the code runs and nothing happens even 35 min later, so I stopped it. Do you think 22 is just too much for the computer and it's normal to wait a long time, or does the code have to be changed? Thank you. – Java
Hmm, does someone have an idea (about my last comment)? – Java

Edit: Obviously this problem is slightly different to the one I remember, because, as @han shows, this algorithm is not optimal, merely an approximation (although there is a guarantee that the "makespan" from this algorithm is never more than 4/3 times the optimal makespan in general, and 11/9 times optimal for 3 machines).

The link @han provided points to the multiprocessor scheduling article, which is precisely equivalent to this problem. The article tells us that the problem is actually NP-hard, which means there is no known polynomial-time algorithm to compute the optimal answer (much less an O(n log n) one like the greedy algorithm here).


You can just use a greedy algorithm: sort the jobs into decreasing order of time, then go through them, assigning each job to the machine that currently has the least work allocated to it.

As an example, consider just having c(5,3,6,1,2) as your part manufacturing times. First you sort this into decreasing order: c(6,5,3,2,1), and then go through it (in order) assigning tasks.

     Step:  1    2    3    4    5
Machine 1:  6    6    6    6    6
Machine 2:  -    5    5    5    5,1
Machine 3:  -    -    3    3,2  3,2

So machine 1 makes the part that takes 6 minutes, machine 2 makes the ones that take 5 and 1 minutes, and machine 3 makes the ones that take 3 and 2.

You can see that in step 4, the machine with the shortest total time is machine 3 so that is why we assigned the 2 to it.

From memory, this algorithm is actually optimal; although I don't have a link for that claim. I also don't know if you can adapt it to get all possible optimal results.


If you define a function that takes one job and a list of the machines with their current jobs, you can use Reduce to assign all the jobs. The single job assignment might look something like:

assign.job <- function(machines, job) {
    # find the machine with the smallest total time so far...
    which.machines <- which.min(sapply(machines, sum))
    # ...and append the new job's time to it
    machines[[which.machines]] <- c(machines[[which.machines]], job)
    machines
}

(I don't like the machines[[which.machines]] line... I'm sure there is a better way to modify a list at a specific index.)

And then the assignment could be like:

allocate <- function(num.machines, job.times) {
    # start with num.machines empty machines
    machines <- lapply(1:num.machines, function(...) c())
    # assign the jobs one by one, longest first
    Reduce(assign.job,
           sort(job.times, decreasing=TRUE),
           machines)
}

(I don't like the line that starts machines <-: I'm sure there is a neater way of creating a list of length n, but I can't find it.)
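
One base-R alternative for pre-allocating such a list is vector(); a tiny sketch:

machines <- vector("list", num.machines)  # a list of num.machines NULL elements, same result as the lapply above
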

On your example, this gives:

> allocate(3,ItemTime)
[[1]]
[1] 84 58 46 45  8  3     # total time: 244

[[2]]
[1] 84 55 55 21 16  8  5  # total time: 244

[[3]]
[1] 75 65 48 28 12 11  3  # total time: 242

The final step is working out which job corresponds to which time. This can either be done by working backwards after assigning the times (in approximately linear time, since there is a relatively simple mapping between times and jobs), or by modifying allocate and assign.job to keep track of the indices of the jobs (in which case the order function will be more useful than sort, and data frames with the times in one column and the indices in another are more convenient than plain vectors). A sketch of the second option follows.
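
Here is one minimal sketch of that index-tracking variant (the names assign.job.idx and allocate.idx are just illustrative; it returns, for each machine, the indices of the jobs assigned to it rather than their times):

assign.job.idx <- function(machines, job.idx, job.times) {
  # current total time on each machine (machines holds job *indices*)
  loads <- vapply(machines, function(idx) sum(job.times[idx]), numeric(1))
  m <- which.min(loads)
  machines[[m]] <- c(machines[[m]], job.idx)   # store the job index, not its time
  machines
}

allocate.idx <- function(num.machines, job.times) {
  machines <- vector("list", num.machines)
  # longest job first, but keep track of the original indices
  for (j in order(job.times, decreasing = TRUE)) {
    machines <- assign.job.idx(machines, j, job.times)
  }
  machines
}

allocate.idx(3, ItemTime)  # each element: indices of the parts given to that machine
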

(It should be noted that this solution is several times faster than the other one, since the other answer uses higher-powered algorithms, which are possibly not overkill for this problem... "possibly" because I'm still not 100% sure this greedy algorithm is optimal.)

Atrip answered 6/5, 2012 at 2:54 Comment(4)
Your approach is close to the best-fit-decreasing algorithm for the bin packing problem, but it is not optimal. As a counter-example, consider the weights 10, 6, 5, 4, 3, 2. Your algorithm assigns the jobs as follows: (10), (6,3,2) and (5,4), with a makespan of 11. The optimal assignment is (10), (6,4) and (5,3,2), with a makespan of 10. – Infinity
Ah, thanks for that! I had obviously remembered incorrectly. (I'll edit my answer to make that clear.) – Atrip
@han, the bin packing article linked to this one, which is precisely the same problem, and the algorithm listed there is precisely the one I described above. – Atrip
Ok, I missed that one. So in the case of 3 machines there would be an approximation guarantee of 11/9. – Infinity

As noted in other answers this is related to the bin packing problem. A simple bin packing algorithm is already implemented in the BBmisc package; we can apply this existing function here for a simple and fast solution:

library(BBmisc)

binlimit <- 3                               # specify how many bins (machines)
binsize <- ceiling(sum(ItemTime)/binlimit)  # calculate the minimum possible bin size (244)
bins <- binPack(ItemTime, binsize)          # pack the bins
bins

 [1] 3 1 2 3 3 2 2 3 3 3 2 3 1 3 2 3 3 1 3 3

In this case, it instantly produces an optimal solution using 3 bins. We can confirm the solution's bin sizes:

library(dplyr)
df <- data.frame(ItemTime, bins)
df %>% group_by(bins) %>% summarise (time = sum(ItemTime))

  bins time
1    1  243
2    2  244
3    3  243

If binPack produces an initial solution using too many bins, it can be placed in a loop that increments binsize and only returns a solution once binPack uses no more than 3 bins, as sketched below.
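
A minimal sketch of such a loop (written here as a while loop; it assumes BBmisc is loaded and ItemTime and binlimit are defined as above):

binsize <- ceiling(sum(ItemTime)/binlimit)  # start from the lower bound
bins <- binPack(ItemTime, binsize)
while (max(bins) > binlimit) {              # too many bins: relax the capacity and retry
  binsize <- binsize + 1
  bins <- binPack(ItemTime, binsize)
}
bins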

Interestingly, binPack returns a solution with the same bin sizes as flodel's answer above, but with different assignments in bins 2 and 3:

   ItemTime Rglpk binPack
1         3     3       3
2        75     1       1
3        55     2       2
4        12     2       3
5        45     3       3
6        55     3       2
7        11     2       2
8         8     2       3
9        21     2       3
10       16     3       3
11       65     2       2
12       28     3       3
13       84     1       1
14        3     3       3
15       58     2       2
16       46     3       3
17        5     2       3
18       84     1       1
19        8     2       3
20       48     3       3

While binPack provides a fast and simple way to solve this problem, its description notes that this algorithm is simple and may not return the optimal solution:

Maps numeric items in x into groups with sum less or equal than capacity. A very simple greedy algorithm is used, which is not really optimized for speed. This is a convenience function for smaller vectors, not a competitive solver for the real bin packing problem. If an element of x exceeds capacity, an error is thrown.

Doublejointed answered 20/3, 2015 at 13:31 Comment(0)
