I have seen similar posts like this one which say that getting the error message saying: Coefficients: (1 not defined because of singularities)
is because of nearly perfect correlation among predictors used in the lm()
call.
But in my case, there is no nearly perfect correlation among predictors, but still one coefficient (X_wthn_outcome
) returns NA
in the output of lm()
.
I wonder what is wrong with the coefficient that returns NA
?
For reproducibility purposes, the exact same data and code is provided below.
library(dplyr)
set.seed(132)
(data <- expand.grid(study = 1:1e3, outcome = rep(1:50,2)))
data$X <- rnorm(nrow(data))
e <- rnorm(nrow(data), 0, 2)
data$yi <- .8 +.6*data$X + e
dat <- data %>%
group_by(study) %>%
mutate(X_btw_study = mean(X), X_wthn_study = X-X_btw_study) %>%
group_by(outcome, .add = TRUE) %>%
mutate(X_btw_outcome = mean(X), X_wthn_outcome = X-X_btw_outcome) %>% ungroup()
round(cor(select(dat,-study,-outcome,-X,-yi)),3)
# X_btw_study X_wthn_study X_btw_outcome X_wthn_outcome
#X_btw_study 1.000 0.000 0.141 0.00
#X_wthn_study 0.000 1.000 0.698 0.71
#X_btw_outcome 0.141 0.698 1.000 0.00
#X_wthn_outcome 0.000 0.710 0.000 1.00
summary(lm(yi ~ 0 + X_btw_study + X_btw_outcome + X_wthn_study
+ X_wthn_outcome, data = dat))
#Coefficients: (1 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#X_btw_study 0.524093 0.069610 7.529 5.15e-14 ***
#X_btw_outcome 0.014557 0.013694 1.063 0.288
#X_wthn_study 0.589517 0.009649 61.096 < 2e-16 ***
#X_wthn_outcome NA NA NA NA ## What's wrong with this variable