I have a logistic regression model in R, where all of the predictor variables are categorical rather than continuous.
If all your covariates are factors (not including the intercept), this is fairly easy: under R's default treatment contrasts the model matrix contains only 0s and 1s, and the number of 1s in a column is the number of occurrences of that factor level (or interaction level) in your data. So just do colSums(model.matrix(your_glm_model_object)).
Since a model matrix has column names, colSums will give you a vector with a "names" attribute that is consistent with the "names" field of coef(your_glm_model_object).
The same solution applies to a linear model (by lm) and a generalized linear model (by glm) for any distribution family.
Here is a quick example:
set.seed(0)
f1 <- sample(gl(2, 50)) ## a factor with 2 levels, each with 50 observations
f2 <- sample(gl(4, 25)) ## a factor with 4 levels, each with 25 observations
y <- rnorm(100)
fit <- glm(y ~ f1 * f2) ## or use `lm`, since `glm` defaults to the `gaussian()` family here
colSums(model.matrix(fit))
#(Intercept)         f12         f22         f23         f24     f12:f22
#        100          50          25          25          25          12
#    f12:f23     f12:f24
#         12          14
Here, we have 100 observations / complete cases (indicated under (Intercept)).
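Because the names line up, you can put the counts next to the coefficient estimates. Here is a minimal sketch that rebuilds the toy data above so it runs on its own; the names counts and summary_tab are just illustrative choices, not anything the model object provides:

```r
## Rebuild the toy example so this block is self-contained
set.seed(0)
f1 <- sample(gl(2, 50))
f2 <- sample(gl(4, 25))
y <- rnorm(100)
fit <- glm(y ~ f1 * f2)

## Counts per model-matrix column, named like the coefficients
counts <- colSums(model.matrix(fit))

## Index the counts by coefficient name so the two columns stay aligned
summary_tab <- data.frame(estimate = coef(fit),
                          n = counts[names(coef(fit))])
summary_tab
```

Indexing by names(coef(fit)) rather than by position is a small safeguard: it keeps the pairing explicit even if you later reorder or subset either vector.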
Is there a way to display the count for the baseline level of each factor?
Baseline levels are absorbed by the contrasts, so they don't appear in the model matrix used for fitting. However, we can generate the full model matrix (without contrasts) from your formula rather than from your fitted model (this also offers you a way to drop numeric variables if you have them in your model):
## `contrasts = FALSE` gives one indicator column per level (no baseline dropped)
SET_CONTRAST <- list(f1 = contr.treatment(nlevels(f1), contrasts = FALSE),
                     f2 = contr.treatment(nlevels(f2), contrasts = FALSE))
X <- model.matrix(~ f1 * f2, contrasts.arg = SET_CONTRAST)
colSums(X)
#(Intercept)         f11         f12         f21         f22         f23
#        100          50          50          25          25          25
#        f24     f11:f21     f12:f21     f11:f22     f12:f22     f11:f23
#         25          13          12          13          12          13
#    f12:f23     f11:f24     f12:f24
#         12          11          14
Note that setting contrasts by hand can quickly become tedious when you have many factor variables.
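One way to cut down on that typing, as a rough sketch: if your variables live in a data frame, you can build the contrasts.arg list programmatically for every factor column. The data frame dat and the name full_contrasts below are hypothetical, made up for illustration; the trick relies on contrasts(x, contrasts = FALSE) returning a full indicator (identity) coding for a factor.

```r
## Hypothetical data frame holding the two factors from the example
set.seed(0)
dat <- data.frame(f1 = sample(gl(2, 50)),
                  f2 = sample(gl(4, 25)))

## Find the factor columns, then ask for an un-contrasted coding of each
is_fac <- vapply(dat, is.factor, logical(1))
full_contrasts <- lapply(dat[is_fac], contrasts, contrasts = FALSE)

## Full model matrix: every level (and every level combination) gets a column
X <- model.matrix(~ f1 * f2, data = dat, contrasts.arg = full_contrasts)
colSums(X)
```

This scales to any number of factors without writing one contr.treatment call per variable, though you still have to spell out the formula you want.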
model.matrix is definitely not the only approach for this. The conventional way may be
table(f1)
table(f2)
table(f1, f2)
but this could get tedious too when your model becomes complicated.
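For a two-factor model like the example, one possible shortcut is addmargins, which appends the marginal totals to the cross-tabulation so the per-level counts and the cell counts appear in a single table. This is only a sketch on the same toy data; the name tab is arbitrary.

```r
## Rebuild the two factors so this block runs on its own
set.seed(0)
f1 <- sample(gl(2, 50))
f2 <- sample(gl(4, 25))

## Cell counts plus a "Sum" row and column holding the marginal counts
tab <- addmargins(table(f1, f2))
tab
```

The "Sum" row reproduces table(f2), the "Sum" column reproduces table(f1), and the interior cells are table(f1, f2).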