You can use lpSolveAPI to solve your problem. Your stated solution is not quite feasible given your constraints. So let's go with you wanting X's and Y's to not repeat in the solution.
You will need 8 new binary variables. Each variables specifies whether or not that row in d
is getting picked (1) or dropped (0).
Update based on OP's request
Yes, the lpSolveAPI code (below) makes it look more complicated than it really is. This LP formulation (output of lpSolveAPI) should make things clearer:
/* Objective function */
max: +pick_1 +2 pick_2 +3 pick_3 +4 pick_4 +5 pick_5 +6 pick_6 +7 pick_7 +8 pick_8;
/* Constraints */
OneX_1: +pick_1 +pick_2 +pick_3 <= 1;
OneX_2: +pick_4 +pick_5 <= 1;
OneX_4: +pick_7 +pick_8 <= 1;
OneY_5: +pick_1 +pick_6 +pick_8 <= 1;
OneY_6: +pick_2 +pick_7 <= 1;
OneY_7: +pick_3 +pick_5 <= 1;
/* Variable bounds */
pick_1 <= 1;
pick_2 <= 1;
pick_3 <= 1;
pick_4 <= 1;
pick_5 <= 1;
pick_6 <= 1;
pick_7 <= 1;
pick_8 <= 1;
Explanation: The second constraint (OneX_2) simply states that only one of pick_4
or pick_5
can be 1, since the 4th and 5th rows in the data frame d
have X = 2
Solution
Note that the formulation above produces an optimal solution that selects 4 rows in the data frame d
> d[c(3,4,6,7),]
x y w
3 1 7 3
4 2 8 4
6 3 5 6
7 4 6 7
The sum of w's is 20, which is better than the solution in the question.
Code
library(lpSolveAPI)
d <- data.frame(x=c(1,1,1,2,2,3,4,4),y=c(5,6,7,8,7,5,6,5),w=c(1,2,3,4,5,6,7,8))
ncol <- 8 #you have eight rows that can be picked or dropped from the solution set
lp_rowpicker <- make.lp(ncol=ncol)
set.type(lp_rowpicker, columns=1:ncol, type = c("binary"))
obj_vals <- d[, "w"]
set.objfn(lp_rowpicker, obj_vals)
lp.control(lp_rowpicker,sense='max')
#Add constraints to limit X values from repeating
add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
indices=c(1,2,3), rhs=1, type="<=")
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(4,5), rhs=1, type="<=")
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(7,8), rhs=1, type="<=") #x's in dataframe rows 7 & 8 are both '4'
#Add constraints to limit Y values from repeating
add.constraint(lp_rowpicker, xt=c(1,1,1), #xt specifies which rows of the LP
indices=c(1,6,8), rhs=1, type="<=") #Y's in df rows 1,6 & 8 are all '5'
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(2,7), rhs=1, type="<=") #Y's in dataframe rows 2&7 are both '6'
add.constraint(lp_rowpicker, xt=c(1,1), #xt specifies which rows of the LP
indices=c(3,5), rhs=1, type="<=") #y's in dataframe rows 3&5 are both '7'
solve(lp_rowpicker)
get.objective(lp_rowpicker) #20
get.variables(lp_rowpicker)
#[1] 0 0 1 1 0 1 1 0
#This tells you that from d you pick rows: 3,4,6 & 7 in your optimal solution.
#If you want to look at the full formulation:
rownames1 <- paste("OneX", c(1,2,4), sep="_")
rownames2 <- paste("OneY", c(5,6,7), sep="_")
colnames<- paste("pick_",c(1:8), sep="")
dimnames(lp_rowpicker) <- list(c(rownames1, rownames2), colnames)
print(lp_rowpicker)
#write it to a text file
write.lp(lp_rowpicker,filename="max_w.lp")
Hope this gives you an idea of how to use lpSolveAPI to formulate your problem.