How to add "greater than 0 and sums to 1" constraint to a regression in Python?

I am using statsmodels (though I'm open to other Python options) to run some linear regressions. My problem is that I need the regression to have no intercept and to constrain the coefficients to the range (0, 1), with the additional requirement that they sum to 1.

I tried something like this (for the sum of 1, at least):

from statsmodels.formula.api import glm
import pandas as pd

df = pd.DataFrame({'revised_guess':[0.6], "self":[0.55], "alter_1":[0.45], "alter_2":[0.2],"alter_3":[0.8]})
mod = glm("revised_guess ~ self + alter_1 + alter_2 + alter_3 - 1", data=df)
res = mod.fit_constrained(["self + alter_1 + alter_2 + alter_3  = 1"],
                          start_params=[0.25,0.25,0.25,0.25])
res.summary()

but still struggling to enforce the 'non-negative' coefficients constraint.

Taxexempt answered 26/2, 2019 at 23:39 Comment(5)
It looks like your problem falls into the constrained-optimization category. I am not sure that statsmodels supports that.Bivalent
I believe you may be looking for sklearn.linear_model.LinearRegressionHarlotry
Please help me understand - How can you make a negative coefficient positive? If an x has a negative relationship with your y, what do you mean by constraining its coefficient into the (0,1) range? How can you revert a negative relationship to a positive one?Ytterbia
@Ytterbia Since you didn't get a response to your question: forcing the coefficients to be positive makes sense in contexts where you are looking for the optimal combination of inputs and negative weights are infeasible. E.g. if I want to find the optimal weight to give to the effort of each team member as a function of their skills, I cannot place a negative weight on someone. Your doubt makes sense if you only consider estimating an empirical relationship, e.g. the correlation between rainfall and umbrella use, but regression analysis can be used for a wealth of other reasons.Sausage
Possibly a duplicate : https://mcmap.net/q/751487/-how-to-include-constraint-to-scipy-nnls-function-solution-so-that-it-sums-to-1/6151828Affrica

You could use NNLS (Non-Negative Least Squares), which is available in scipy. It is based on the FORTRAN non-negative least squares solver. You can't add constraints to it directly, so append another equation, x1 + x2 + x3 = 1, as an extra row of the input system.

import numpy as np
from scipy.optimize import nnls

## Define the input matrix and target vector;
## the last row encodes the constraint x1 + x2 + x3 = 1
A = np.array([[1., 2., 5.],
              [5., 6., 4.],
              [1., 1., 1.]])

b = np.array([4., 7., 1.])

## Calculate the non-negative least-squares solution
x, residual_norm = nnls(A, b)

## Check the fit: difference between A @ x and b
print(A @ x - b)

Performing NNLS over this augmented system returns the x values and the residual norm. Note that the sum-to-one row is fitted in a least-squares sense like any other equation, so it holds only approximately.
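If you need the sum-to-one condition to hold (almost) exactly, a common workaround — a sketch, not part of the original answer — is to scale the constraint row by a large weight, so that NNLS is heavily penalized for violating it:

```python
import numpy as np
from scipy.optimize import nnls

# Original least-squares system, without the constraint row
A = np.array([[1., 2., 5.],
              [5., 6., 4.]])
b = np.array([4., 7.])

# Append the sum-to-one constraint as an extra equation,
# scaled by a large weight so NNLS treats it as (almost) hard
w = 1e5
A_aug = np.vstack([A, w * np.ones(A.shape[1])])
b_aug = np.append(b, w * 1.0)

x, rnorm = nnls(A_aug, b_aug)
print(x, x.sum())  # non-negative coefficients whose sum is very close to 1
```

The weight `w` trades off how exactly the constraint holds against how well the other equations are fitted; the value 1e5 here is an illustrative choice, not a recommendation.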

Hawks answered 14/3, 2019 at 7:39 Comment(0)

Simply do an L1-regularized regression:

import statsmodels.api as sm

# Y and X are your response vector and design matrix
model = sm.OLS(Y, X)
model2 = model.fit_regularized(method='elastic_net', alpha=0.0, L1_wt=1.0,
                               start_params=None, profile_scale=False, refit=False)
model2.params

... and tune hyperparameters.
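Neither answer enforces both of the asker's conditions at once. One alternative that does — a sketch using scipy rather than statsmodels, with illustrative data and variable names — is to minimize the sum of squared residuals with scipy.optimize.minimize, using the SLSQP method with box bounds and an equality constraint:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: 4 predictors whose true weights are non-negative and sum to 1
rng = np.random.default_rng(0)
X = rng.random((200, 4))
beta_true = np.array([0.4, 0.3, 0.2, 0.1])
y = X @ beta_true + rng.normal(scale=0.01, size=200)

def sse(beta):
    """Sum of squared residuals for a no-intercept linear model."""
    resid = y - X @ beta
    return resid @ resid

res = minimize(
    sse,
    x0=np.full(4, 0.25),                        # start from equal weights
    method='SLSQP',
    bounds=[(0.0, 1.0)] * 4,                    # each coefficient in [0, 1]
    constraints=[{'type': 'eq',
                  'fun': lambda beta: beta.sum() - 1.0}],  # weights sum to 1
)
print(res.x)  # estimated weights: non-negative and summing to 1
```

Unlike the weighted-NNLS trick, the equality constraint here is enforced by the solver itself rather than approximated through the loss.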

Kape answered 11/3, 2019 at 16:3 Comment(1)
How can you make a negative coefficient positive though? If an x has a negative relationship with y, what does it mean to constrain its coefficient into the (0,1) range? How can one revert a negative relationship to a positive one?Ytterbia