I'm trying to fit a simple linear regression model in R.
There are three factor variables in the model.
The model is
lm(Exercise ~ Econ + Job + Position)
where "Exercise" is the numeric dependent variable, the amount of time spent exercising.
"Econ", "Job", and "Position" are all factor variables.
"Econ" is whether a person is employed or not (levels = employed / not employed).
"Job" is the type of job a person has. This variable has five levels.
"Position" is the position a person holds in the workplace. This variable also has five levels.
I tried to fit the regression and got this error:
"contrasts can be applied only to factors with 2 or more levels"
I think this error is caused by the NA values in the factors: whenever "Econ" is 'unemployed', "Job" and "Position" are NA (since, obviously, unemployed people do not have a job type or a job position).
If I fit the two models separately, as below, no error occurs.
lm(Exercise ~ Econ)
lm(Exercise ~ Job + Position)
However, I want a single model that automatically uses the variables as needed, and a single result table. So if "Econ" is 'employed', the "Job" and "Position" variables are used in the regression. If "Econ" is 'unemployed', the "Job" and "Position" variables are automatically dropped from the model.
The reason I want one model instead of two is that, by putting all the variables in the model, I can see the effect of "Econ" (employed or unemployed) among people who are 'employed'.
If I just regress
lm(Exercise ~ Job + Position)
I do not know the effect of employment.
I thought of one solution: recode every NA value of "Job" and "Position" as an extra level (0 = 'unemployed'). But I am not sure this will solve the problem, and I suspect it might lead to multicollinearity.
Is there any way to automatically/conditionally drop NA observations depending on the value of another factor variable?
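To make the recoding idea concrete, here is a minimal sketch using the example data from this question. The helper na_to_none and the level name "None" are made up for illustration; this is only one way to do the recoding, not a recommended solution. It suggests the model then fits, but that some coefficients are reported as NA, which is the multicollinearity concern in practice.

```r
# Same example data as in the question.
data <- data.frame(
  Exercise = c(50, 30, 25, 44, 32, 50, 22, 14),
  Econ     = factor(c(1, 0, 1, 1, 0, 0, 1, 1)),  # 0 = unemployed, 1 = employed
  Job      = factor(c("A", NA, "B", "B", NA, NA, "A", "C")),
  Position = factor(c("Owner", NA, "Employee", "Owner",
                      NA, NA, "Employee", "Director"))
)

# Hypothetical helper (not from any package): turn NA into an explicit
# "None" level so that no rows are dropped by na.omit.
na_to_none <- function(f) {
  x <- as.character(f)
  x[is.na(x)] <- "None"
  factor(x)
}

data2 <- transform(data,
                   Job      = na_to_none(Job),
                   Position = na_to_none(Position))

fit <- lm(Exercise ~ Econ + Job + Position, data = data2)

coef(fit)          # some coefficients are NA because their columns are redundant
fit$rank           # rank of the design matrix
length(coef(fit))  # number of coefficients requested, larger than the rank
```

In this coding the "None" level of Job is 1 exactly when Econ is 0, so the two columns carry the same information and lm() can estimate only one of them. With the default treatment contrasts, whichever of the two is kept compares unemployed people with employed people at the baseline levels of Job and Position, not with employed people overall.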
Below is my reproducible example.
Exercise <- c(50, 30, 25, 44, 32, 50, 22, 14)
Econ <- as.factor(c(1, 0, 1, 1, 0, 0, 1, 1))
# 0 = unemployed, 1 = employed
Job <- as.factor(c("A", NA, "B", "B", NA, NA, "A", "C"))
Position <- as.factor(c("Owner", NA, "Employee", "Owner",
                        NA, NA, "Employee", "Director"))
data <- data.frame(Exercise, Econ, Job, Position)
str(data)
# Full model: this is the call that fails with the contrasts error
lm(Exercise ~ Econ + Job + Position, data = data)
# Separate models: both of these fit without error
lm(Exercise ~ Econ, data = data)
lm(Exercise ~ Job + Position, data = data)
What I want is the first model, lm(Exercise ~ Econ + Job + Position), but I get an error because Job and Position are NA for every row with Econ = 0 (unemployed).
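A quick check of that explanation, as a sketch on the same example data: lm() by default drops incomplete rows (na.action = na.omit) and unused factor levels before building the design matrix. The code below mimics that step by hand with na.omit() and droplevels(), then captures the error message from the full model instead of stopping.

```r
# Same example data as in the question.
data <- data.frame(
  Exercise = c(50, 30, 25, 44, 32, 50, 22, 14),
  Econ     = factor(c(1, 0, 1, 1, 0, 0, 1, 1)),  # 0 = unemployed, 1 = employed
  Job      = factor(c("A", NA, "B", "B", NA, NA, "A", "C")),
  Position = factor(c("Owner", NA, "Employee", "Owner",
                      NA, NA, "Employee", "Director"))
)

# Keep only complete rows and drop factor levels that no longer occur.
complete_rows <- droplevels(na.omit(data))

nrow(complete_rows)         # number of rows that survive
levels(complete_rows$Econ)  # levels of Econ left once the NA rows are gone

# Fit the full model, returning the error message if it fails.
msg <- tryCatch(
  {
    lm(Exercise ~ Econ + Job + Position, data = data)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Only the employed rows are complete, so Econ is left with a single level and no contrast can be built for it.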
lm_model <- lm(Exercise ~ Econ + Job + Position) ? – Kaczmarek
lm ran OK. But considering that na.action = na.omit is "The ‘factory-fresh’ default", the estimates for Econ are NA, the rows with NA values (Econ == 0) are removed before fitting the model. – Marni
summary(fit) prints the following: Coefficients: (1 not defined because of singularities), not the usual coefficients title line. – Marni