How to conditionally drop NA observations of factors when doing linear regression in R?

Exercise <- c(50, 30, 25, 44, 32, 50, 22, 14)
Econ <- as.factor(c(1, 0, 1, 1, 0, 0, 1, 1)) # 0 = unemployed, 1 = employed
Job <- as.factor(c("A", NA, "B", "B", NA, NA, "A", "C"))
Position <- as.factor(c("Owner", NA, "Employee", "Owner", NA, NA, "Employee", "Director"))
data <- data.frame(Exercise, Econ, Job, Position)
str(data)

lm(Exercise ~ Econ + Job + Position)
lm(Exercise ~ Econ)
lm(Exercise ~ Job + Position)

If all you want is for the first model to run without errors (keeping the same missing-value handling you are using now), you could do this.

lm(Exercise ~ as.integer(Econ) + Job + Position)

Note that all this really does is reproduce the result of the third model.

lm(Exercise ~ Job + Position) # third model
lm(Exercise ~ as.integer(Econ) + Job + Position) # first model

coef(lm(Exercise ~ Job + Position))
coef(lm(Exercise ~ as.integer(Econ) + Job + Position))
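
To check the equivalence programmatically rather than by eye, here is a minimal sketch using the data from the question. The names fit_third, fit_first, shared, same and econ_is_na are made up for illustration.

```r
# Data from the question
Exercise <- c(50, 30, 25, 44, 32, 50, 22, 14)
Econ     <- as.factor(c(1, 0, 1, 1, 0, 0, 1, 1))
Job      <- as.factor(c("A", NA, "B", "B", NA, NA, "A", "C"))
Position <- as.factor(c("Owner", NA, "Employee", "Owner", NA, NA,
                        "Employee", "Director"))

fit_third <- lm(Exercise ~ Job + Position)
fit_first <- lm(Exercise ~ as.integer(Econ) + Job + Position)

# Compare the coefficients that both fits have in common
shared <- names(coef(fit_third))
same   <- isTRUE(all.equal(coef(fit_first)[shared], coef(fit_third)))

# The extra term in the first fit is not estimable
econ_is_na <- is.na(coef(fit_first)[["as.integer(Econ)"]])

print(same)
print(econ_is_na)
```

If both values print TRUE, the extra term contributes nothing beyond what the third model already estimates.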

Unless you change how you handle missing values, the first model that you want, lm(Exercise ~ Econ + Job + Position), is equivalent to the third model, lm(Exercise ~ Job + Position). Here is why.

By default, lm uses na.action = na.omit (taken from getOption("na.action")). This means that any row with a missing value in the response or in any predictor is dropped. There are multiple ways you can see this. One is to call model.matrix, which lm also uses under the hood to build the design matrix.

model.matrix(Exercise ~ Econ + Job + Position)
  (Intercept) Econ1 JobB JobC PositionEmployee PositionOwner
1           1     1    0    0                0             1
3           1     1    1    0                1             0
4           1     1    1    0                0             1
7           1     1    0    0                1             0
8           1     1    0    1                0             0
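
Another way to see which rows are dropped is to inspect the model frame directly. This is only a sketch; mf, dropped and kept are illustrative names, and na.action = na.omit is passed explicitly so the result does not depend on your session options.

```r
# Data from the question
Exercise <- c(50, 30, 25, 44, 32, 50, 22, 14)
Econ     <- as.factor(c(1, 0, 1, 1, 0, 0, 1, 1))
Job      <- as.factor(c("A", NA, "B", "B", NA, NA, "A", "C"))
Position <- as.factor(c("Owner", NA, "Employee", "Owner", NA, NA,
                        "Employee", "Director"))

# Build the model frame the same way lm would, omitting incomplete rows
mf <- model.frame(Exercise ~ Econ + Job + Position, na.action = na.omit)

# Row indices removed by na.omit, and the rows that remain
dropped <- as.integer(attr(mf, "na.action"))
kept    <- as.integer(rownames(mf))

print(dropped)
print(kept)
```

The kept rows should match the row names shown in the model.matrix output above.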

As you already correctly pointed out, Econ = 0 is perfectly aligned with Position = NA. Thus, lm drops those observations and Econ is left with a single observed value. Because lm also drops unused factor levels, Econ becomes a factor with a single level, and lm cannot build contrasts for such a factor, hence the error. I bypassed this error by using as.integer(); however, you still end up with a predictor that takes only a single value.
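
The alignment between Econ and the missing values, and the single remaining level, can be confirmed with a short sketch like the following. The names aligned, mf, remaining_levels and err are illustrative, and drop.unused.levels = TRUE is set here to mimic what lm does internally.

```r
# Data from the question
Exercise <- c(50, 30, 25, 44, 32, 50, 22, 14)
Econ     <- as.factor(c(1, 0, 1, 1, 0, 0, 1, 1))
Job      <- as.factor(c("A", NA, "B", "B", NA, NA, "A", "C"))
Position <- as.factor(c("Owner", NA, "Employee", "Owner", NA, NA,
                        "Employee", "Director"))

# Cross-tabulate employment status against missingness of Position
aligned <- table(Econ, position_missing = is.na(Position))
print(aligned)

# After omitting incomplete rows and dropping unused levels,
# check which levels of Econ are still present
mf <- model.frame(Exercise ~ Econ + Job + Position,
                  na.action = na.omit, drop.unused.levels = TRUE)
remaining_levels <- levels(mf$Econ)
print(remaining_levels)

# Capture the error from the original first model instead of stopping
err <- tryCatch(lm(Exercise ~ Econ + Job + Position),
                error = function(e) conditionMessage(e))
print(err)
```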

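Next, lm silently drops such predictors: a constant column is collinear with the intercept, so its coefficient cannot be estimated. That is why you get an NA for the coefficient on as.integer(Econ). lm does this without complaint because the default is singular.ok = TRUE.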

If you set singular.ok = FALSE, you get an error instead. It says that the design matrix is rank-deficient, which is always the case when a predictor has only a single value.

lm(Exercise ~ as.integer(Econ) + Job + Position, singular.ok = FALSE)
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  singular fit encountered
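
If you would rather detect the problem before fitting, one option is to compare the rank of the design matrix with its number of columns. This is a rough sketch, not something lm requires; X, econ_constant and rank_deficient are illustrative names. With so few complete rows, the rank can fall short for more than one reason, so treat a deficiency as a prompt to inspect the data rather than as pointing at a single column.

```r
# Data from the question
Exercise <- c(50, 30, 25, 44, 32, 50, 22, 14)
Econ     <- as.factor(c(1, 0, 1, 1, 0, 0, 1, 1))
Job      <- as.factor(c("A", NA, "B", "B", NA, NA, "A", "C"))
Position <- as.factor(c("Owner", NA, "Employee", "Owner", NA, NA,
                        "Employee", "Director"))

# Design matrix for the workaround model (incomplete rows are omitted)
X <- model.matrix(Exercise ~ as.integer(Econ) + Job + Position)

# A column with a single distinct value is collinear with the intercept
econ_constant <- length(unique(X[, "as.integer(Econ)"])) == 1

# Rank below the column count means not every coefficient is estimable
rank_deficient <- qr(X)$rank < ncol(X)

print(econ_constant)
print(rank_deficient)
```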
