Dropping variable in lm formula still triggers contrast error - McMap



Asked 12/2, 2020 at 23:0 Answered 12/2, 2020 at 23:33

Solved r formula lm factors

I'm trying to run lm() on only a subset of my data, and I'm running into an error.

library(data.table)
dt = data.table(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = as.factor(c(rep('men',50), rep('women',50)))) # sample data

lm( y ~ ., dt) # Use all x: Works
lm( y ~ ., dt[x3 == 'men']) # Use all x, limit to men: fails (as expected) with "contrasts can be applied only to factors with 2 or more levels"

The above doesn't work because the dataset now contains only men, so we can't include x3, the gender variable, in the model. BUT...

lm( y ~ . -x3, dt[x3 == 'men']) # Exclude x3, limit to men: STILL doesn't work
lm( y ~ x1 + x2, dt[x3 == 'men']) # Exclude x3, with different notation: works great

Is this an issue with the "minus sign" notation in the formula? Please advise. Note: of course I could do it a different way; for example, I could exclude the variables before passing them to lm(). But I'm teaching a class on this, and I don't want to confuse the students, having already told them they can exclude a variable using a minus sign in the formula.

Charqui asked 12/2, 2020 at 23:0 (5 comments)

It's interesting that both model.matrix(y ~ . - x3, data = dt[x3 == "men"]) and model.matrix(y ~ x1 + x2, data = dt[x3 == "men"]) work (lm calls model.matrix internally). The only difference between the two model matrices is a "contrasts" attribute, which still contains x3 and is picked up later within lm, likely causing the error you're seeing. So my feeling is that the issue has to do with how model.matrix creates and stores the design matrix when removing terms. – Canonry 12/2, 2020 at 23:33
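A minimal sketch of this comparison, using a base-R data.frame instead of data.table so it runs without extra packages (the names d and dmen and the seed are illustrative, not from the question). When model.matrix() is called directly, the row subset still carries both levels of x3, so both calls succeed; as far as I can tell, only the first one records contrasts for x3.

```r
# Sketch: compare the two design matrices built directly with model.matrix().
# d and dmen are hypothetical base-R stand-ins for the question's data.table.
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                x3 = factor(rep(c("men", "women"), each = 50)))
dmen <- d[d$x3 == "men", ]   # row subset; x3 keeps both levels here

mm_minus    <- model.matrix(y ~ . - x3, data = dmen)
mm_explicit <- model.matrix(y ~ x1 + x2, data = dmen)

colnames(mm_minus)                   # "(Intercept)" "x1" "x2"
names(attr(mm_minus, "contrasts"))   # "x3": recorded although x3 is not a term
attr(mm_explicit, "contrasts")       # NULL: no factor was involved
```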

I was trying to "expand" the . to get a simplified formula with terms(y ~ . -x3, data=dt, simplify=TRUE), but oddly it still retains x3 in the variables attribute, which trips up lm. – Hazardous 12/2, 2020 at 23:36
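Following up on this: the terms object itself keeps x3, but the simplified formula inside it appears to be usable. A minimal sketch, assuming a plain data.frame (d, dmen, and the seed are illustrative stand-ins for the question's data): formula() discards the terms attributes, including the variables list that still holds x3, so lm() only sees y ~ x1 + x2 and re-derives the variables from that.

```r
# Sketch: expand the dot with terms(simplify = TRUE), then pass lm() only the
# rewritten formula. d and dmen are hypothetical base-R stand-ins for dt.
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                x3 = factor(rep(c("men", "women"), each = 50)))
dmen <- d[d$x3 == "men", ]

tt <- terms(y ~ . - x3, data = dmen, simplify = TRUE)

# formula() strips the terms attributes (including "variables", which still
# lists x3), leaving just the simplified formula.
f <- formula(tt)
f                        # y ~ x1 + x2

# lm() re-derives the variables from f, so x3 never enters the model frame.
fit <- lm(f, data = dmen)
coef(fit)
```

This keeps the minus-sign notation in what students write, at the cost of one extra terms()/formula() step.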

@Hazardous - it looks like the unimplemented-in-R neg.out= option might be related. From the S help files for terms, where neg.out= is implemented: flag controlling the treatment of terms entering with "-" sign. If TRUE, terms will be checked for cancellation and otherwise ignored. If FALSE, negative terms will be retained (with negative order). – Drab 12/2, 2020 at 23:50

@MauritsEvers: lm calls model.matrix on a modified version of the data. At the very beginning, lm composes and evaluates the following expression: mf <- stats::model.frame( y ~ . -x3, dt[x3=="men"], drop.unused.levels=TRUE ). This causes x3 to become a single-level factor. model.matrix() is then called on mf, not the original data, resulting in the error we're observing. – Millenary 13/2, 2020 at 16:37
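The two steps described in this comment can be replayed by hand. A sketch, again with an illustrative base-R data.frame in place of dt (names and seed are assumptions); the final call is wrapped in tryCatch() so the outcome is captured rather than assumed, since error text can differ between R versions.

```r
# Sketch: replay what lm() does internally, step by step.
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                x3 = factor(rep(c("men", "women"), each = 50)))
dmen <- d[d$x3 == "men", ]
levels(dmen$x3)          # "men" "women": a plain row subset keeps both levels

# Step 1: lm() builds the model frame with drop.unused.levels = TRUE.
mf <- model.frame(y ~ . - x3, data = dmen, drop.unused.levels = TRUE)
levels(mf$x3)            # "men": x3 is now a one-level factor

# Step 2: lm() then calls model.matrix() on that frame. Capture the
# outcome instead of stopping; this is where the question's error appears.
res <- tryCatch({
  model.matrix(attr(mf, "terms"), mf)
  "no error"
}, error = function(e) conditionMessage(e))
res
```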

@ArtemSokolov but the -x3 in the formula should exclude x3 from the data frame, so it shouldn't matter whether it's single-level or not. Why doesn't it exclude it? – Holdall 14/6, 2022 at 19:51
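One way to see why: the minus sign removes x3 from the model's terms, but x3 stays in the formula's list of variables, and model.frame() creates a column for every variable. A sketch on the same kind of illustrative base-R data (d, dmen, and the seed are not from the thread):

```r
# Sketch: the minus sign edits the model terms, not the list of variables.
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                x3 = factor(rep(c("men", "women"), each = 50)))
dmen <- d[d$x3 == "men", ]

tt <- terms(y ~ . - x3, data = dmen)
attr(tt, "term.labels")            # "x1" "x2": x3 is no longer a term
all.vars(attr(tt, "variables"))    # "y" "x1" "x2" "x3": still a variable

# model.frame() keeps one column per variable, so x3 is carried along;
# the explicit formula never mentions x3 and so never pulls it in.
names(model.frame(y ~ . - x3, data = dmen))    # "y" "x1" "x2" "x3"
names(model.frame(y ~ x1 + x2, data = dmen))   # "y" "x1" "x2"
```

As far as I can tell from the source of stats::model.matrix.default, it then assigns default contrasts to every factor column of the model frame, not only to factors used in the remaining terms, which is where the one-level x3 fails.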

You get the error because x3 is still carried into the model, where it has only one level, "men" (see the comment below from @Artem Sokolov).

One way to solve it is to subset ahead of time:

library(data.table)
dt = data.table(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = as.factor(c(rep('men',50), rep('women',50)))) # sample data

dmen <- dt[x3 == 'men'] # create a new subsetted dataset with just men

lm( y ~ ., dmen[,-"x3"]) # now drop the x3 column from the dataset (just for the model)

Or you can do both in the same step:

lm( y ~ ., dt[x3 == 'men',-"x3"])
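For readers not using data.table, a rough base-R counterpart of the same idea is subset() with a negative select. A sketch with an illustrative data.frame d (the name and seed are assumptions, not from the answer):

```r
# Sketch: base-R version of dt[x3 == 'men', -"x3"] followed by lm(y ~ ., ...).
set.seed(1)
d <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                x3 = factor(rep(c("men", "women"), each = 50)))

# Keep the rows for men and drop the x3 column in one step.
dmen_no_x3 <- subset(d, x3 == "men", select = -x3)
names(dmen_no_x3)        # "y" "x1" "x2"

# With x3 gone from the data, the dot expands to x1 + x2 only.
fit <- lm(y ~ ., data = dmen_no_x3)
coef(fit)
```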

Byway answered 12/2, 2020 at 23:33 (2 comments)

Overall, this is a nice solution. One thing to correct is that -x3 in a formula does not cause lm to think that you're trying to subtract the column. The "don't use x3 in the model" intent is communicated correctly, but the issue is that lm calls model.frame( ..., drop.unused.levels=TRUE ) causing x3 to become a single-level factor, leading to downstream problems in model.matrix(). – Millenary 13/2, 2020 at 16:42

Thanks for the clarification, Artem Sokolov; I have taken that incorrect explanation out of my answer. – Byway 13/2, 2020 at 17:26


© 2022 - 2024 — McMap. All rights reserved.