lm: 1 coefficient not defined because of singularities while no major collinearity issues

I have seen similar posts like this one which say that getting the error message saying: Coefficients: (1 not defined because of singularities) is because of nearly perfect correlation among predictors used in the lm() call.

But in my case, there is no nearly perfect correlation among predictors, but still one coefficient (X_wthn_outcome) returns NA in the output of lm().

I wonder what is wrong with the coefficient that returns NA?

For reproducibility purposes, the exact same data and code is provided below.

library(dplyr)

set.seed(132)
(data <- expand.grid(study = 1:1e3, outcome = rep(1:50,2)))
data$X <- rnorm(nrow(data))
e <- rnorm(nrow(data), 0, 2)
data$yi <- .8 +.6*data$X + e

dat <- data %>% 
  group_by(study) %>% 
  mutate(X_btw_study = mean(X), X_wthn_study = X-X_btw_study) %>%
  group_by(outcome, .add = TRUE) %>%
  mutate(X_btw_outcome = mean(X), X_wthn_outcome = X-X_btw_outcome) %>% ungroup()
  
round(cor(select(dat,-study,-outcome,-X,-yi)),3)

#               X_btw_study X_wthn_study X_btw_outcome X_wthn_outcome
#X_btw_study          1.000        0.000         0.141           0.00
#X_wthn_study         0.000        1.000         0.698           0.71
#X_btw_outcome        0.141        0.698         1.000           0.00
#X_wthn_outcome       0.000        0.710         0.000           1.00

summary(lm(yi ~ 0 + X_btw_study + X_btw_outcome + X_wthn_study
   + X_wthn_outcome, data = dat))

#Coefficients: (1 not defined because of singularities)
#               Estimate Std. Error t value Pr(>|t|)    
#X_btw_study    0.524093   0.069610   7.529 5.15e-14 ***
#X_btw_outcome  0.014557   0.013694   1.063    0.288    
#X_wthn_study   0.589517   0.009649  61.096  < 2e-16 ***
#X_wthn_outcome       NA         NA      NA       NA  ## What's wrong with this variable

lm(X_wthn_outcome ~ X_btw_study + X_btw_outcome + X_wthn_study , data = dat) #------------------ Call: lm(formula = X_wthn_outcome ~ X_btw_study + X_btw_outcome + X_wthn_study, data = dat) Coefficients: (Intercept) X_btw_study X_btw_outcome X_wthn_study 1.165e-17 1.000e+00 -1.000e+00 1.000e+00 #-------------- summary( lm(X_wthn_outcome ~ X_btw_study + X_btw_outcome + X_wthn_study , data = dat) ) #--------------- Call: lm(formula = X_wthn_outcome ~ X_btw_study + X_btw_outcome + X_wthn_study, data = dat) Residuals: Min 1Q Median 3Q Max -3.901e-14 -6.000e-17 0.000e+00 5.000e-17 3.195e-13 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.165e-17 3.242e-18 3.594e+00 0.000326 *** X_btw_study 1.000e+00 3.312e-17 3.020e+16 < 2e-16 *** X_btw_outcome -1.000e+00 6.515e-18 -1.535e+17 < 2e-16 *** X_wthn_study 1.000e+00 4.590e-18 2.178e+17 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.025e-15 on 99996 degrees of freedom Multiple R-squared: 1, Adjusted R-squared: 1 F-statistic: 1.582e+34 on 3 and 99996 DF, p-value: < 2.2e-16

Recommended topics

Hot tags