I've trained an elastic net model in R using glmnet and would like to use it to make predictions on a new data set.
However, I'm having trouble building the matrix to pass to predict(), because some of the factor variables in the new data set (dummy variables indicating the presence of comorbidities) have only one level, since those comorbidities were never observed. That means I can't use
model.matrix(RESPONSE ~ ., new_data)
because it gives me the (expected)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
I'm at a loss for how to get around this issue. Is there a way in R that I can construct an appropriate matrix for use in predict() in this situation, or do I need to prepare the matrix outside of R? In either case, how might I go about doing it?
Here is a toy example that reproduces the issue I'm having:
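# Simulate training data: one continuous predictor, two binary factors, a binary outcome
x1 <- rnorm(100)
x2 <- as.factor(rbinom(100, 1, 0.6))
x3 <- as.factor(rbinom(100, 1, 0.4))
y <- rbinom(100, 1, 0.2)
toy_data <- data.frame(x1, x2, x3, y)
colnames(toy_data) <- c("Continuous", "FactorA", "FactorB", "Outcome")
# Design matrix (intercept column dropped) and response for the training data
mat1 <- model.matrix(Outcome ~ ., toy_data)[, -1]
y1 <- toy_data$Outcome
# New data in which FactorB has only one observed level
new_data <- toy_data
new_data$FactorB <- as.factor(0)
# summary(new_data)  # Just to verify that FactorB now only contains one level
# This call fails with the "2 or more levels" contrasts error
mat2 <- model.matrix(Outcome ~ ., new_data)[, -1]
# Assign the training levels to FactorB so that it has two levels again
levels(new_data$FactorB) <- levels(toy_data$FactorB)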
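For reference, here is a minimal sketch of one possible way around the error. It assumes the training data frame (`toy_data` here) is still available when the new data is prepared, and it uses only base R. The idea is to re-declare every factor in the new data with the levels seen in training, so that `model.matrix()` produces the same columns as it did for the training data. The seed and the loop variable `v` are illustrative choices, not part of the original example.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

# Training data laid out like the toy example, with explicit factor levels
toy_data <- data.frame(
  Continuous = rnorm(100),
  FactorA    = factor(rbinom(100, 1, 0.6), levels = c(0, 1)),
  FactorB    = factor(rbinom(100, 1, 0.4), levels = c(0, 1)),
  Outcome    = rbinom(100, 1, 0.2)
)
mat1 <- model.matrix(Outcome ~ ., toy_data)[, -1]

# New data in which FactorB is observed at a single level only
new_data <- toy_data
new_data$FactorB <- as.factor(0)

# Re-declare each factor with the levels from the training data.
# factor(x, levels = ...) matches by value, so unobserved levels are
# added without renaming the observed ones.
for (v in names(toy_data)) {
  if (is.factor(toy_data[[v]])) {
    new_data[[v]] <- factor(new_data[[v]], levels = levels(toy_data[[v]]))
  }
}

# model.matrix() now succeeds and yields the same columns as mat1
mat2 <- model.matrix(Outcome ~ ., new_data)[, -1]
stopifnot(identical(colnames(mat1), colnames(mat2)))
```

If this holds for the real data, `mat2` should be usable as the `newx` argument of `predict()` on the fitted glmnet object, since its columns line up with those of the training matrix.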
– Outfoot