LASSO with $\lambda = 0$ and OLS produce different results in R glmnet

Asked 14/7, 2016 at 6:30 Answered 23/2, 2018 at 9:40

I expect LASSO with no penalization ($\lambda=0$) to yield the same (or very similar) coefficient estimates as an OLS fit. However, I get different coefficient estimates in R putting the same data (x,y) into

glmnet(x, y , alpha=1, lambda=0) for LASSO fit with no penalization and
lm(y ~ x) for OLS fit.

Why is that?

Citadel answered 14/7, 2016 at 6:30 Comment(5)

Instead of focusing on particular function in R, you would be better off explaining why you think the two fits should be very similar. E.g. say that LASSO with no penalization should give nothing else but an OLS fit, if that is what you mean. You can also elaborate on why you think so using formulas. – Lucialucian 14/7, 2016 at 6:46

I thought it's quite obvious LASSO with no penalization and OLS should give the same results. I was wondering why two algorithms give me different estimates. – Citadel 14/7, 2016 at 10:32

What is obvious for you might not be obvious for everyone else, so just in case it is best to be as explicit and as precise as you can. – Lucialucian 14/7, 2016 at 10:59

Sure! I hope the problem is clear now. – Citadel 14/7, 2016 at 11:3

I am sure it is a software problem, If you solve the problem by SVD manually, you will get identical results. I tried the same thing. – Survance 14/7, 2016 at 14:43

You're using the function wrong. The x should be the model matrix. Not the raw predictor value. When you do that, you get the exact same results:

x <- rnorm(500)
y <- rnorm(500)
mod1 <- lm(y ~ x) 

xmm <- model.matrix(mod1)
mod2 <- glmnet(xmm, y, alpha=1, lambda=0)

coef(mod1)
coef(mod2)

Ramsdell answered 14/7, 2016 at 15:6 Comment(2)

But glmnet has intercept=TRUE by default, so it already adds an intercept term, right? So not clear to me why this should be necessary since you xmm is just cbind(1,x).... – Whitneywhitson 13/8, 2019 at 12:2

You just got lucky in the special case that your X are not correlated... in the general case, coefficients will not be the same... – Bromberg 15/3, 2022 at 21:11

I had the same problem, asked around to no avail, then I emailed the package maintainer (Trevor Hastie) who gave the answer. The problem occurs when series are highly correlated. The solution is to decrease the threshold in the glmnet() function call (rather than via glmnet.control()). The code below uses the built-in dataset EuStockMarkets and applies a VAR with lambda=0. For XSMI, the OLS coefficient is below 1, the default glmnet coefficient is above 1 with a difference of about 0.03, and the glmnet coefficient with thresh=1e-14 is very close to the OLS coefficient (a difference of 1.8e-7).

# Use built-in panel data with integrated series
data("EuStockMarkets")
selected_market <- 2

# Take logs for good measure
EuStockMarkets <- log(EuStockMarkets)

# Get dimensions
num_entities <- dim(EuStockMarkets)[2]
num_observations <- dim(EuStockMarkets)[1]

# Build the response with the most recent observations at the top
Y <- as.matrix(EuStockMarkets[num_observations:2, selected_market])
X <- as.matrix(EuStockMarkets[(num_observations - 1):1, ])

# Run OLS, which adds an intercept by default
ols <- lm(Y ~ X)
ols_coef <- coef(ols)

# run glmnet with lambda = 0
fit <- glmnet(y = Y, x = X, lambda = 0)
lasso_coef <- coef(fit)

# run again, but with a stricter threshold
fit_threshold <- glmnet(y = Y, x = X, lambda = 0, thresh = 1e-14)
lasso_threshold_coef <- coef(fit_threshold)

# build a dataframe to compare the two approaches
comparison <- data.frame(ols = ols_coef,
                         lasso = lasso_coef[1:length(lasso_coef)],
                         lasso_threshold = lasso_threshold_coef[1:length(lasso_threshold_coef)]
)
comparison$difference <- comparison$ols - comparison$lasso
comparison$difference_threshold <- comparison$ols - comparison$lasso_threshold

# Show the two values for the autoregressive parameter and their difference
comparison[1 + selected_market, ]

R returns:

           ols    lasso lasso_threshold  difference difference_threshold
XSMI 0.9951249 1.022945       0.9951248 -0.02782045         1.796699e-07

Anthracene answered 23/2, 2018 at 9:40 Comment(2)

I had the same problem, and setting a lower threshold did not solve it. – Toupee 4/8, 2020 at 19:46

@Toupee What threshold value did you use? I seldom do LASSO these days so I suggest emailing the package maintainer again as he's very responsive. – Anthracene 5/8, 2020 at 15:40

I have run with the "prostate" example dataset of Hastie's book the next code:

out.lin1 = lm( lpsa ~ . , data=yy ) 
out.lin1$coeff             
out.lin2 = glmnet( as.matrix(yy[ , -9]), yy$lpsa, family="gaussian", lambda=0, standardize=T  ) 
coefficients(out.lin2)

and the result of the coefficients are similar. When we use the standardize option the returned coefficients by glmnet() are in the original units of the input variables. Please, check you are using the "gaussian" family

Ailing answered 14/7, 2016 at 8:17 Comment(1)

Adding family = "gaussian" did not change the outcome – Citadel 14/7, 2016 at 10:43

From glmnet help: Note also that for "gaussian", glmnet standardizes y to have unit variance before computing its lambda sequence (and then unstandardizes the resulting coefficients); if you wish to repro- duce/compare results with other software, best to supply a standardized y.

Rhoades answered 14/7, 2016 at 7:15 Comment(2)

The difference between the lm and glmnet coefficients get smaller, as the absolute value of the coefficients gets smaller. I still get the same differences when I unstandardize the coefficients. – Citadel 14/7, 2016 at 11:2

There is another Warning in the help file, specifically the description of the lambda parameter, saying that the algorithm may have problems if one only provides a scalar and not a vector. I am not sure if this only causes a speed problem or actually may bias the estimates. – Rhoades 14/7, 2016 at 13:51

Recommended topics

Hot tags