Linear regression with constraints on the coefficients


I am trying to perform linear regression, for a model like this:

Y = aX1 + bX2 + c

So, Y ~ X1 + X2

Suppose I have the following response vector:

set.seed(1)
Y <- runif(100, -1.0, 1.0)

And the following matrix of predictors:

X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1,each=50))
X <- cbind(X1, X2)

I want to use the following constraints on the coefficients:

a + c >= 0  
c >= 0

So no constraint on b.

I know that the glmc package can be used to apply constraints, but I was not able to work out how to express my constraints with it. I also know that contr.sum can be used to make all coefficients sum to 0, for example, but that is not what I want to do. solve.QP() seems like another possibility, where setting meq=0 (together with an identity constraint matrix) can force all coefficients to be >= 0 (again, not my goal here).

Note: The solution must be able to handle NA values in the response vector Y, for example with:

Y <- runif(100, -1.0, 1.0)
Y[c(2,5,17,56,37,56,34,78)] <- NA
Bitter asked 8/8, 2017 at 20:35

solve.QP can be passed arbitrary linear constraints, so it can certainly be used to model your constraints a+c >= 0 and c >= 0.
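For reference, solve.QP minimizes (1/2) b'Db - d'b subject to A'b >= b0, so least squares corresponds to D = X'X and d = X'Y; each column of the Amat argument encodes one linear constraint, with the matching entry of bvec as its lower bound.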

First, we can add a column of 1's to X to capture the intercept term (calling the result Xmat, so the predictor X2 is not overwritten), and then we can replicate standard linear regression with solve.QP:

library(quadprog)
Xmat <- cbind(X, 1)  # design matrix: columns X1, X2, and 1's for the intercept
solve.QP(t(Xmat) %*% Xmat, t(Y) %*% Xmat, matrix(0, 3, 0), c())$solution
# [1]  0.08614041  0.21433372 -0.13267403
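As a sanity check (not part of the original answer), the same coefficients come out of an ordinary lm() fit, just with the intercept reported first instead of last:

# same OLS solution; lm() orders the coefficients (intercept, a, b)
coef(lm(Y ~ X1 + X2))
# (Intercept)          X1          X2
# -0.13267403  0.08614041  0.21433372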

With the sample data from the question, neither constraint is met using standard linear regression.

By modifying both the Amat and bvec parameters, we can add our two constraints:

solve.QP(t(Xmat) %*% Xmat, t(Y) %*% Xmat, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
# [1] 0.0000000 0.1422207 0.0000000

Subject to these constraints, the sum of squared residuals is minimized by setting both the a and c coefficients to 0.
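As a quick check (my addition, not from the original answer), you can confirm that the constrained fit trades some goodness of fit for feasibility: its residual sum of squares is never smaller than the unconstrained one.

b_ols <- solve.QP(t(Xmat) %*% Xmat, t(Y) %*% Xmat, matrix(0, 3, 0), c())$solution
b_con <- solve.QP(t(Xmat) %*% Xmat, t(Y) %*% Xmat, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
sum((Y - Xmat %*% b_ols)^2)  # unconstrained RSS
sum((Y - Xmat %*% b_con)^2)  # constrained RSS, at least as large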

You can handle missing values in Y or Xmat just as the lm function does, by removing the offending observations. You might do something like the following as a pre-processing step:

has.missing <- rowSums(is.na(cbind(Y, Xmat))) > 0  # flag rows with any NA
Y <- Y[!has.missing]
Xmat <- Xmat[!has.missing, ]
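Putting the pieces together, here is a minimal sketch of a reusable helper (the name constrained_lm is my own invention; it assumes t(Xmat) %*% Xmat stays positive definite after the NA rows are dropped):

# sketch: drop rows containing NAs, then solve the constrained
# least-squares problem with quadprog
constrained_lm <- function(Y, Xmat, Amat, bvec) {
  keep <- rowSums(is.na(cbind(Y, Xmat))) == 0
  Y <- Y[keep]
  Xmat <- Xmat[keep, , drop = FALSE]
  solve.QP(t(Xmat) %*% Xmat, t(Y) %*% Xmat, Amat, bvec)$solution
}

# usage with the constraints a + c >= 0 and c >= 0:
constrained_lm(Y, cbind(X, 1), cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))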
Doud answered 8/8, 2017 at 21:57. Comments:
Thank you for this answer! Just to make sure I understand properly: since I want a+c >= 0 and c >= 0, in cases where these constraints are already met but a and c are not equal to 0, the coefficients should not be constrained; they should be left as is (the results of standard linear regression). Would your solution apply to different data where the coefficients could potentially meet the constraints? I want to be able to apply this without knowing beforehand whether the constraints are met under standard linear regression (so that I can use it on large datasets). (Bitter)
In addition, how can we deal with potential NAs in the response when using this method (I am used to lm()), and how can we get standard errors or p-values for the coefficients? (Bitter)
Yes, if the constraints are not binding in the original linear regression, then you will get back those results; the constraints only change things if they are violated by standard linear regression. I don't know the answer to your questions regarding p-values; you might be able to get help with those at stats.stackexchange.com. (Doud)
