This R code throws a warning
# Fit regression model to each cluster
y <- vector("list", k)
vars <- vector("list", k)
f <- vector("list", k)
for (i in 1:k) {
  vars[[i]] <- names(corc[[i]][corc[[i]] != "1"])
  f[[i]] <- as.formula(paste("Death ~", paste(vars[[i]], collapse = "+")))
  y[[i]] <- lm(f[[i]], data = C1[[i]])  # training set
  C1[[i]] <- cbind(C1[[i]], fitted(y[[i]]))
  C2[[i]] <- cbind(C2[[i]], predict(y[[i]], C2[[i]]))  # test set
}
I have a training data set (C1) and a test data set (C2), each with 129 variables. I ran a k-means cluster analysis on C1, split it by cluster membership, and stored the clusters in a list (C1[[1]], C1[[2]], ..., C1[[k]]). I also assigned a cluster membership to each case in C2 and created C2[[1]], ..., C2[[k]]. Then I fit a linear regression to each cluster in C1. My dependent variable is "Death". The predictors differ from cluster to cluster, and vars[[i]] (i = 1, ..., k) holds the predictor names for cluster i. I want to predict Death for each case in the test data set (C2[[1]], ..., C2[[k]]).
When I run the code above, I get this warning for some of the clusters:
In predict.lm(y[[i]], C2[[i]]) :
prediction from a rank-deficient fit may be misleading
I read a lot about this warning but I couldn't figure out what the issue is.
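To make the condition concrete, here is a minimal sketch of what a rank-deficient lm fit looks like. It uses made-up toy data, not my C1/C2: the data frame `toy` and the columns `x1`, `x2`, `x3` are hypothetical, and `x3` is deliberately built as an exact linear combination of the other two predictors. Whether this kind of collinearity (or a cluster with fewer rows than predictors) is what happens in my clusters is only an assumption.

```r
# Toy data (hypothetical): x3 is an exact linear combination of x1 and x2
set.seed(1)
toy <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
toy$x3 <- toy$x1 + toy$x2
toy$Death <- 1 + 2 * toy$x1 - toy$x2 + rnorm(20)

fit <- lm(Death ~ x1 + x2 + x3, data = toy)

# A rank-deficient fit has a rank smaller than the number of coefficients
n_coef <- length(coef(fit))
rank_deficient <- fit$rank < n_coef

# lm() reports the coefficients it could not estimate as NA
dropped <- names(coef(fit))[is.na(coef(fit))]

print(rank_deficient)
print(dropped)
```

The same check could presumably be applied to the fitted models in my list, for example `sapply(y, function(m) m$rank < length(coef(m)))`, to see which clusters are affected and which entries of `coef(y[[i]])` come back as NA.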