Optimization help involving matrix operations and constraints

I'm so far out of my league on this one, so I'm hoping someone can point me in the right direction. I think this is an optimization problem, but I have been confused by scipy.optimize and how it fits with pulp, and matrix math boggles my mind. This problem has really been slowing me down, which is why I'm asking here.

Problem Statement: I have a dataset of customers. For each customer, I can pick one of 3 options or choose none, so 4 options in total. Also for each customer, I have a numeric score that says how "good" each choice is. You can think of this value as the probability that the choice creates a future sale.

# fake data for the internet
data = {'customerid':[101,102,103,104,105,106,107,108,109,110],
        'prob_CHOICEA':[0.00317,0.00629,0.00242,0.00253,0.00421,0.00414,0.00739,0.00549,0.00658,0.00852],
        'prob_CHOICEB':[0.061,0.087,0.055,0.027,0.022,0.094,0.099,0.072,0.018,0.052],
        'prob_CHOICEC':[0.024,0.013,0.091,0.047,0.071,0.077,0.067,0.046,0.077,0.044]
       } 

# Creates pandas DataFrame 
df = pd.DataFrame(data) 
df = df.reset_index(drop=True).set_index(['customerid'])
+------------+--------------+--------------+--------------+
| customerid | prob_CHOICEA | prob_CHOICEB | prob_CHOICEC |
+------------+--------------+--------------+--------------+
|        101 |      0.00317 |        0.061 |        0.024 |
|        102 |      0.00629 |        0.087 |        0.013 |
|        103 |      0.00242 |        0.055 |        0.091 |
|        104 |      0.00253 |        0.027 |        0.047 |
|        105 |      0.00421 |        0.022 |        0.071 |
|        106 |      0.00414 |        0.094 |        0.077 |
|        107 |      0.00739 |        0.099 |        0.067 |
|        108 |      0.00549 |        0.072 |        0.046 |
|        109 |      0.00658 |        0.018 |        0.077 |
|        110 |      0.00852 |        0.052 |        0.044 |
+------------+--------------+--------------+--------------+

I started by combining these elements into a single array for each customer:

# combine all values into 1 array
list_to_combine = ['prob_CHOICEA', 'prob_CHOICEB','prob_CHOICEC']

df['probs_A_B_C']= df[list_to_combine].values.tolist()

df.drop(list_to_combine, axis=1, inplace=True)
+------------+-------------------------+
| customerid |       probs_A_B_C       |
+------------+-------------------------+
|        101 | [0.00317, 0.061, 0.024] |
|        102 | [0.00629, 0.087, 0.013] |
|        103 | [0.00242, 0.055, 0.091] |
|        104 | [0.00253, 0.027, 0.047] |
|        105 | [0.00421, 0.022, 0.071] |
|        106 | [0.00414, 0.094, 0.077] |
|        107 | [0.00739, 0.099, 0.067] |
|        108 | [0.00549, 0.072, 0.046] |
|        109 | [0.00658, 0.018, 0.077] |
|        110 | [0.00852, 0.052, 0.044] |
+------------+-------------------------+

For each customer, I only have 4 choices of what to do:

choices = [
    [0,0,0],
    [1,0,0],
    [0,1,0],
    [0,0,1]
]

For each customer, I want to pick the best choice. At first glance this was easy: just pick the highest number. However, it starts to blow my mind once I add constraints.

For example, what if I want to choose the best choice for each customer, but with the constraint that the total number of selected choices equals 5?

+------------+-------------------------+-------------+
| customerid |       probs_A_B_C       | best_choice |
+------------+-------------------------+-------------+
|        101 | [0.00317, 0.061, 0.024] | [0,0,0]     |
|        102 | [0.00629, 0.087, 0.013] | [0,1,0]     |
|        103 | [0.00242, 0.055, 0.091] | [0,0,1]     |
|        104 | [0.00253, 0.027, 0.047] | [0,0,0]     |
|        105 | [0.00421, 0.022, 0.071] | [0,0,0]     |
|        106 | [0.00414, 0.094, 0.077] | [0,1,0]     |
|        107 | [0.00739, 0.099, 0.067] | [0,1,0]     |
|        108 | [0.00549, 0.072, 0.046] | [0,0,0]     |
|        109 | [0.00658, 0.018, 0.077] | [0,0,1]     |
|        110 | [0.00852, 0.052, 0.044] | [0,0,0]     |
+------------+-------------------------+-------------+

I couldn't even figure out how to do this; I just eyeballed it manually for illustrative purposes.

Ideally, I'd like to add multiple constraints simultaneously:

  • Total sum of best_choice = N
  • Total sum of CHOICEA (first element of best_choice) >= M
  • Total sum of CHOICEB (second element of best_choice) <= 10

Any ideas of where to start?

Straphanger answered 9/4, 2020 at 16:35 Comment(1)
(Mixed-)integer programming is a very natural fit here, and the available open-source solvers (e.g. CoinOR Cbc) are so advanced that they are very hard to beat with custom algorithms. A broader classification is discrete optimization / combinatorial optimization, where other general-purpose tools are available, such as constraint programming and SAT solving (which are less suited for optimizing an objective, although that is often supported, but very good at searching combinatorial search spaces). – Bucentaur

You can use scipy.optimize.linprog to solve this linear optimization problem. It requires you to set up the constraints as matrix products, as outlined in the docs. There are two types of constraints: inequalities of the form A @ x <= b and equalities of the form A @ x == b. The problem can be modeled as follows:

  • The resulting vector x has length N*C, where N is the number of customers and C is the number of options; it represents the choices per customer in a flat layout: [c1_A, c1_B, c1_C, c2_A, c2_B, c2_C, ..., cN_A, cN_B, cN_C].
  • Since each customer can make at most one choice, we have an inequality for each customer that sums that customer's choices, i.e. a matrix whose rows represent the customers and whose columns represent all choices. An entry is 1 if the choice belongs to that customer and 0 otherwise (see the illustration below).
  • Option A must be selected at least M times; since we only have inequalities of the form A @ x <= b, we flip the sign and use entries of -1 in A at the positions corresponding to option A, with -M in b.
  • Option B must be selected no more than 10 times; this can be modeled like the previous constraint, using entries of 1 and a bound of 10 (it is already of the form <=).
  • The sum of all choices must be N. This can be modeled by an equality constraint where the matrix sums over all choices in x and the result must be equal to N.

This is an illustration of the above constraints:

# Max. one choice per customer.
# A =                                           # b =
[[1, 1, 1,   0, 0, 0,   ...,   0, 0, 0],        [1,
 [0, 0, 0,   1, 1, 1,   ...,   0, 0, 0],         1,
 ...                                             ...
 [0, 0, 0,   0, 0, 0,   ...,   1, 1, 1]]         1]

# Min. M choices for option A.
# A =                                              # b =
[[-1, 0, 0,   -1, 0, 0,   ...,   -1, 0, 0]]        [[-M]]

# Max. 10 choices for option B.
# A =                                           # b =
[[0, 1, 0,   0, 1, 0,   ...,   0, 1, 0]]        [[10]]

# Total number of choices equals N.
# A =                                           # b = 
[[1, 1, 1,   1, 1, 1,   ...,   1, 1, 1]]        [[N]]

Here's some sample code to setup the constraints and run the optimization:

import numpy as np
import pandas as pd
from scipy.optimize import linprog

data = {'customerid':[101,102,103,104,105,106,107,108,109,110],
        'prob_CHOICEA':[0.00317,0.00629,0.00242,0.00253,0.00421,0.00414,0.00739,0.00549,0.00658,0.00852],
        'prob_CHOICEB':[0.061,0.087,0.055,0.027,0.022,0.094,0.099,0.072,0.018,0.052],
        'prob_CHOICEC':[0.024,0.013,0.091,0.047,0.071,0.077,0.067,0.046,0.077,0.044]
       } 

# Creates pandas DataFrame 
df = pd.DataFrame(data) 
df = df.reset_index(drop=True).set_index(['customerid'])
print(df, end='\n\n')

nc = df.shape[1]  # number of options

data = df.to_numpy().ravel()

# Max. choices per customer is 1.
A_ub_1 = np.zeros((len(df), len(data)))
for i in range(len(A_ub_1)):
    A_ub_1[i, nc*i:nc*(i+1)] = 1
b_ub_1 = np.ones(len(df))

# Min. choices for option A is 3.
A_ub_2 = np.zeros((1, len(data)))
A_ub_2[0, ::nc] = -1  # invert, since this defines an upper boundary
b_ub_2 = np.array([-3])

# Max. choices for option B is 2.
A_ub_3 = np.zeros((1, len(data)))
A_ub_3[0, 1::nc] = 1
b_ub_3 = np.array([2])

# Total sum of choices is 7.
A_eq = np.ones((1, len(data)))
b_eq = np.array([7])

result = linprog(
    -1 * data,  # linprog aims to minimize the value
    A_eq=A_eq, b_eq=b_eq,
    A_ub=np.concatenate((A_ub_1, A_ub_2, A_ub_3), axis=0),
    b_ub=np.concatenate((b_ub_1, b_ub_2, b_ub_3), axis=0),
    bounds=(0, 1)
)
print(result, end='\n\n')

choices = (result.x.reshape(-1, 3) > 1e-6).astype(int)
print('Choices:', choices, sep='\n')

It produces the following results:

            prob_CHOICEA  prob_CHOICEB  prob_CHOICEC
customerid                                          
101              0.00317         0.061         0.024
102              0.00629         0.087         0.013
103              0.00242         0.055         0.091
104              0.00253         0.027         0.047
105              0.00421         0.022         0.071
106              0.00414         0.094         0.077
107              0.00739         0.099         0.067
108              0.00549         0.072         0.046
109              0.00658         0.018         0.077
110              0.00852         0.052         0.044

     con: array([-1.30002675e-11])
     fun: -0.3812999999903971
 message: 'Optimization terminated successfully.'
     nit: 7
   slack: array([1.00000000e+00, 7.99305067e-11, 1.47325485e-11, 1.00000000e+00,
       1.00000000e+00, 2.49527066e-11, 2.42738052e-11, 5.84235438e-10,
       4.23596713e-11, 5.77714543e-11, 8.80984175e-12, 1.46305190e-11])
  status: 0
 success: True
       x: array([2.89971936e-10, 1.32732722e-11, 6.97732845e-12, 1.00000000e+00,
       3.28055311e-10, 5.72702383e-12, 1.80418885e-11, 4.61391860e-12,
       1.00000000e+00, 2.01674011e-10, 4.58311340e-12, 1.29599793e-11,
       2.95298295e-10, 4.34109315e-12, 1.21776975e-11, 3.39951283e-11,
       1.00000000e+00, 2.55262044e-10, 4.94703751e-11, 1.00000000e+00,
       1.57932544e-11, 9.99999999e-01, 2.21487598e-11, 1.33679145e-11,
       2.30514296e-10, 3.91129933e-12, 1.00000000e+00, 1.00000000e+00,
       8.19015577e-12, 1.07293976e-11])

Choices:
[[0 0 0]
 [1 0 0]
 [0 0 1]
 [0 0 0]
 [0 0 0]
 [0 1 0]
 [0 1 0]
 [1 0 0]
 [0 0 1]
 [1 0 0]]
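
Note that linprog solves the continuous relaxation; the near-integral x above is a consequence of this particular constraint structure, not something linprog guarantees. If you want 0/1 solutions enforced explicitly, the same model can be handed to scipy.optimize.milp (available in SciPy >= 1.9) with an integrality specification. A minimal sketch, assuming that SciPy version:

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Same probabilities as above, flattened to [c1_A, c1_B, c1_C, c2_A, ...].
probs = np.array([
    [0.00317, 0.061, 0.024], [0.00629, 0.087, 0.013],
    [0.00242, 0.055, 0.091], [0.00253, 0.027, 0.047],
    [0.00421, 0.022, 0.071], [0.00414, 0.094, 0.077],
    [0.00739, 0.099, 0.067], [0.00549, 0.072, 0.046],
    [0.00658, 0.018, 0.077], [0.00852, 0.052, 0.044],
])
n, nc = probs.shape

row_a = np.zeros((1, n * nc)); row_a[0, 0::nc] = 1  # picks out option A entries
row_b = np.zeros((1, n * nc)); row_b[0, 1::nc] = 1  # picks out option B entries
constraints = [
    # At most one choice per customer: block rows of ones per customer.
    LinearConstraint(np.kron(np.eye(n), np.ones(nc)), -np.inf, 1),
    LinearConstraint(row_a, 3, np.inf),            # option A at least 3 times
    LinearConstraint(row_b, -np.inf, 2),           # option B at most 2 times
    LinearConstraint(np.ones((1, n * nc)), 7, 7),  # exactly 7 choices in total
]

res = milp(
    -probs.ravel(),               # milp minimizes, so negate to maximize
    constraints=constraints,
    integrality=np.ones(n * nc),  # declare every variable integer...
    bounds=Bounds(0, 1),          # ...and hence binary
)
choices = res.x.reshape(n, nc).round().astype(int)
print(choices)
```

With milp the result stays integral even if you change the data or constraints to something where the relaxation would become fractional.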
Tolan answered 14/4, 2020 at 20:43 Comment(0)

This problem can be solved using Linear Programming (LP). The most difficult part isn't knowing that you should use LP; it is transforming your problem into an LP optimization problem, and I will show you how to do just that. Before proceeding, I will shrink the example data you provided for simplicity (because of the huge number of generated variables), so assume we have the following input data:

+------------+----------+----------+
| customerid |  prob A  |  prob B  |
+------------+----------+----------+
|        101 |  0.00317 |   0.061  |
|        102 |  0.00629 |   0.087  |
+------------+----------+----------+

Assume the size of the input problem is N, where N represents the number of choices:

+---------------------+
|  prob A  |  prob B  |
+---------------------+
|  0.00317 |   0.061  |
|  0.00629 |   0.087  |
+---------------------+

Since we have 4 different choices, N=4 (it doesn't matter that some of them are mutually exclusive; such characteristics will be captured by the constraints). In LP we have the following things to deal with:

  • An objective function C (dimensions 1x[at least N]; it's a row array),
  • A constraint matrix A (its dimensions depend on how many restrictions you want to add; you may have more restrictions than variables), and
  • The right-hand side (which we will call b; its dimensions are [number of rows in A]x1; it's a column array).

Accordingly, an LP maximization problem will have the following format:

Max Cx

subject to:
    Ax <= b
    x >= 0

Note that from this point forward we will create LP variables to represent the input data, so assume the following mapping between the variables xi and the input data:

+-------------------------------+
| LP variable | Customer | prob |
+-------------------------------+
|     x0      |    101   |   A  |
|     x1      |    101   |   B  |
|     x2      |    102   |   A  |
|     x3      |    102   |   B  |
+-------------------------------+

Let's start by filling our constraint matrix A and the RHS b in parallel. We create a matrix formed by concatenating the columns of two NxN identity matrices:

                           A               
      +-----------------------------------------------------+ 
      |  x0  |  x1  |  x2  |  x3  |  x4  |  x5 |  x6  |  x7 |       b
      +-----------------------------------------------------+    +-----+
row 0 |   1  |   0  |   0  |   0  |   1  |  0  |   0  |  0  |    |  1  |
row 1 |   0  |   1  |   0  |   0  |   0  |  1  |   0  |  0  |    |  1  |
row 2 |   0  |   0  |   1  |   0  |   0  |  0  |   1  |  0  |    |  1  |
row 3 |   0  |   0  |   0  |   1  |   0  |  0  |   0  |  1  |    |  1  |
      +-----------------------------------------------------+    +-----+
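
As a sketch (with illustrative names only, not from the original post), rows 0 to 3 above can be produced by concatenating two NxN identity matrices with NumPy:

```python
import numpy as np

N = 4  # number of decision variables x0..x3

# Rows 0-3: [I | I] -- the left block holds the decision variables,
# the right block holds one slack variable (x4..x7) per row.
A = np.concatenate([np.eye(N), np.eye(N)], axis=1)
b = np.ones(N)  # RHS: each xi plus its slack sums to 1
print(A.astype(int))
```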

We also need to ensure that at most one variable is selected per customer (per row of our input data), so we create one extra variable per customer, in this case x8 and x9, and set them to 1 in the two new rows we add to A. Each new row must also have 1s for the variables mapping to that customer (simply look at which variables belong to the customer in question). So we add the following 2 rows to matrix A and to the column array b:

                                  A
      +------------------------------------------------------------------+
      |  x0  |  x1  |  x2  |  x3  |  x4  |  x5 |  x6  |  x7 |  x8  | x9  |       b
      +------------------------------------------------------------------+    +-----+
row 4 |   1  |   1  |   0  |   0  |   0  |  0  |   0  |  0  |   1  |  0  |    |  1  |
row 5 |   0  |   0  |   1  |   1  |   0  |  0  |   0  |  0  |   0  |  1  |    |  1  |
      +------------------------------------------------------------------+    +-----+

Now A becomes:

                                  A
      +------------------------------------------------------------------+
      |  x0  |  x1  |  x2  |  x3  |  x4  |  x5 |  x6  |  x7 |  x8  | x9  |       b
      +------------------------------------------------------------------+    +-----+
row 0 |   1  |   0  |   0  |   0  |   1  |  0  |   0  |  0  |   0  |  0  |    |  1  |
row 1 |   0  |   1  |   0  |   0  |   0  |  1  |   0  |  0  |   0  |  0  |    |  1  |
row 2 |   0  |   0  |   1  |   0  |   0  |  0  |   1  |  0  |   0  |  0  |    |  1  |
row 3 |   0  |   0  |   0  |   1  |   0  |  0  |   0  |  1  |   0  |  0  |    |  1  |
row 4 |   1  |   1  |   0  |   0  |   0  |  0  |   0  |  0  |   1  |  0  |    |  1  |
row 5 |   0  |   0  |   1  |   1  |   0  |  0  |   0  |  0  |   0  |  1  |    |  1  |
      +------------------------------------------------------------------+    +-----+

Suppose we also want to add a constraint ensuring that at most 2 choices are made in total; then we add row 6 and column x10 to A, setting the variables x0 through x3, as well as x10, to 1:

                                  A
      +------------------------------------------------------------------------+
      |  x0  |  x1  |  x2  |  x3  |  x4  |  x5 |  x6  |  x7 |  x8  | x9  | x10 |         b
      +------------------------------------------------------------------------+      +-----+
row 0 |   1  |   0  |   0  |   0  |   1  |  0  |   0  |  0  |   0  |  0  |  0  |      |  1  |
row 1 |   0  |   1  |   0  |   0  |   0  |  1  |   0  |  0  |   0  |  0  |  0  |      |  1  |
row 2 |   0  |   0  |   1  |   0  |   0  |  0  |   1  |  0  |   0  |  0  |  0  |      |  1  |
row 3 |   0  |   0  |   0  |   1  |   0  |  0  |   0  |  1  |   0  |  0  |  0  |      |  1  |
row 4 |   1  |   1  |   0  |   0  |   0  |  0  |   0  |  0  |   1  |  0  |  0  |      |  1  |
row 5 |   0  |   0  |   1  |   1  |   0  |  0  |   0  |  0  |   0  |  1  |  0  |      |  1  |
row 6 |   1  |   1  |   1  |   1  |   0  |  0  |   0  |  0  |   0  |  0  |  1  |      |  2  |
      +------------------------------------------------------------------------+      +-----+

Note that in this simple example, restricting the number of choices to at most 2 doesn't impact the final result.

Finally we build the objective function:

                               C
+-----------------------------------------------------------------------------------+
|    x0    |    x1   |    x2   |    x3   |  x4 |  x5 | x6  |  x7 |  x8 |  x9 |  x10 |
+-----------------------------------------------------------------------------------+
|  0.00317 |   0.061 | 0.00629 |   0.087 |  0  |  0  |  0  |  0  |  0  |  0  |   0  |
+-----------------------------------------------------------------------------------+

The variables that were created without a mapping to the customers' input data are called slack variables; their purpose is to put the math behind the LP problem into the correct standard form.

Now that you know how to model your problem as an LP optimization problem, you must choose a method to solve it. I recommend the simplex algorithm, which you can find in scipy.

After running your preferred solver, you must interpret the output. The result should be a single row array containing the value of each xi. For the example above, the output would be:

+------------------------------------------------------------------+
|  x0 |  x1 |  x2 |  x3 |  x4 |  x5 | x6  |  x7 |  x8 |  x9 |  x10 |
+------------------------------------------------------------------+
|  0  |  1  |  0  |  1  |  0  |  0  |  0  |  0  |  0  |  0  |   0  |
+------------------------------------------------------------------+

The above result means you should pick the elements represented by the variables x1 and x3, since they are set to 1, i.e., customer 101 selects prob B and customer 102 selects prob B as well.

Post Scriptum:

  • When using the scipy.optimize.linprog library to get the job done, just make sure you pass the constraints via the A_eq parameter instead of A_ub if you use the above modeling (the hand-made slack variables turn the rows into equalities);
  • I didn't dive deep into the math behind this specific problem to prove it, but it seems that integer LP will never be a must here, due to the nature of the constraints that can be built from this problem;
  • The coefficients of the objective function C may assume any real value, including negative values and 0; and
  • I have suggested scipy's LP tool because I have worked with it before and it works like a charm; nonetheless, there are other awesome free implementations available, such as glpk, which might provide more advanced tools for any further needs in your problem.
Klingel answered 14/4, 2020 at 21:42 Comment(1)
This is really incredible. Your explanation about how to setup this problem as an LP problem step-by-step, is invaluable.Straphanger
