How to set up linear programming optimization in R using LpSolve?
Asked Answered
H

1

5

For example, I have this sample data:

d=data.frame(x=c(1,1,1,2,2,3,4,4),y=c(5,6,7,8,7,5,6,5),w=c(1,2,3,4,5,6,7,8))

Which looks like this:

  x y w
1 1 5 1
2 1 6 2
3 1 7 3
4 2 8 4
5 2 7 5
6 3 5 6
7 4 6 7
8 4 5 8

x and y represent indices from datax and datay. w represents a score from comparing datax[x] with datay[y]. I want to maximize the total score (or w) from d, where each value of x is matched to at most one value of y, and vice versa.

The result should look like this:

  x y w
1 2 7 5
2 3 5 6
3 4 6 7

Where the sum of all w values is maximized, and each x and each y show up only once in the result.

How do I set this problem up in the lpSolve::lp function?

Hyperspace answered 16/11, 2015 at 7:7 Comment(2)
I guess your expected output is not correct as 5 appears two times in y. Or did I misunderstood what you want ?Baron
@Baron yes sorry about that! I made a mistake there, I updated it (should be correct this time). Not all x or y values need to be included, the weight just needs to be maximized.Hyperspace
A
7

You can use lpSolveAPI to solve your problem. Your stated solution is not quite feasible given your constraints. So let's go with you wanting X's and Y's to not repeat in the solution.

You will need 8 new binary variables. Each variables specifies whether or not that row in d is getting picked (1) or dropped (0).

Update based on OP's request

Yes, the lpSolveAPI code (below) makes it look more complicated than it really is. This LP formulation (output of lpSolveAPI) should make things clearer:

/* Objective function */
max: +pick_1 +2 pick_2 +3 pick_3 +4 pick_4 +5 pick_5 +6 pick_6 +7 pick_7 +8 pick_8;

/* Constraints */
OneX_1: +pick_1 +pick_2 +pick_3 <= 1;
OneX_2: +pick_4 +pick_5 <= 1;
OneX_4: +pick_7 +pick_8 <= 1;
OneY_5: +pick_1 +pick_6 +pick_8 <= 1;
OneY_6: +pick_2 +pick_7 <= 1;
OneY_7: +pick_3 +pick_5 <= 1;

/* Variable bounds */
pick_1 <= 1;
pick_2 <= 1;
pick_3 <= 1;
pick_4 <= 1;
pick_5 <= 1;
pick_6 <= 1;
pick_7 <= 1;
pick_8 <= 1;

Explanation: The second constraint (OneX_2) simply states that only one of pick_4 or pick_5 can be 1, since the 4th and 5th rows in the data frame d have X = 2

Solution

Note that the formulation above produces an optimal solution that selects 4 rows in the data frame d

> d[c(3,4,6,7),]
  x y w
3 1 7 3
4 2 8 4
6 3 5 6
7 4 6 7

The sum of w's is 20, which is better than the solution in the question.

Code

library(lpSolveAPI)
d <- data.frame(x=c(1,1,1,2,2,3,4,4),y=c(5,6,7,8,7,5,6,5),w=c(1,2,3,4,5,6,7,8))

ncol <- 8 #you have eight rows that can be picked or dropped from the solution set
lp_rowpicker <- make.lp(ncol=ncol)
set.type(lp_rowpicker, columns=1:ncol, type = c("binary"))

obj_vals <- d[, "w"]
set.objfn(lp_rowpicker, obj_vals) 
lp.control(lp_rowpicker,sense='max')

#Add constraints to limit X values from repeating
add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
               indices=c(1,2,3), rhs=1, type="<=")
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
               indices=c(4,5), rhs=1, type="<=")
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
               indices=c(7,8), rhs=1, type="<=") #x's in dataframe rows 7 & 8 are both '4'

#Add constraints to limit Y values from repeating
add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
               indices=c(1,6,8), rhs=1, type="<=") #Y's in df rows 1,6 & 8 are all '5'
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
               indices=c(2,7), rhs=1, type="<=") #Y's in dataframe rows 2&7 are both '6'
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
               indices=c(3,5), rhs=1, type="<=") #y's in dataframe rows 3&5 are both '7'

solve(lp_rowpicker)
get.objective(lp_rowpicker) #20
get.variables(lp_rowpicker)
#[1] 0 0 1 1 0 1 1 0
#This tells you that from d you pick rows: 3,4,6 & 7 in your optimal solution.

#If you want to look at the full formulation:
rownames1 <- paste("OneX", c(1,2,4), sep="_")
rownames2 <- paste("OneY", c(5,6,7), sep="_")
colnames<- paste("pick_",c(1:8), sep="")
dimnames(lp_rowpicker) <- list(c(rownames1, rownames2), colnames)
print(lp_rowpicker)

#write it to a text file
write.lp(lp_rowpicker,filename="max_w.lp")

Hope this gives you an idea of how to use lpSolveAPI to formulate your problem.

Aforethought answered 16/11, 2015 at 18:44 Comment(2)
thank you for the answer! I made a mistake with the desired output. I updated it now and it should make sense given the constraints. Not all x or y values need to be included, the weight just needs to be maximized. I will study the answer in more detail, but it seems quite complicated, is it possible to solve the problem simply with the lp function with the updated desired result?Hyperspace
I have updated the response by adding the actual math formulation and explained the constraints. This should make it clearer. Note that the solver was able to find a better solution than the one you gave.Aforethought

© 2022 - 2024 — McMap. All rights reserved.