Different NA actions for coefficients and summary of linear model in R

In R, when using lm(), if I set na.action = na.pass inside the call to lm(), then in the summary table there is an NA for any coefficient that cannot be estimated (because of missing cells in this case).

If, however, I extract just the coefficients from the summary object, using either summary(myModel)$coefficients or coef(summary(myModel)), then the rows with NAs are omitted.

I want the NAs to be included when I extract the coefficients, the same way they are included when I print the summary. Is there a way to do this?

Setting options(na.action = na.pass) does not seem to help.

Here is an example:

> set.seed(534)
> myGroup1 <- factor(c("a","a","a","a","b","b"))
> myGroup2 <- factor(c("first","second","first","second","first","first"))
> myDepVar <- rnorm(6, 0, 1)
> myModel <- lm(myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)
> summary(myModel)

Call:
lm(formula = myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)

Residuals:
       1        2        3        4        5        6 
-0.05813  0.55323  0.05813 -0.55323 -0.12192  0.12192 

Coefficients: (1 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         -0.15150    0.23249  -0.652    0.561
myGroup11            0.03927    0.23249   0.169    0.877
myGroup21           -0.37273    0.23249  -1.603    0.207
myGroup11:myGroup21       NA         NA      NA       NA

Residual standard error: 0.465 on 3 degrees of freedom
Multiple R-squared: 0.5605,     Adjusted R-squared: 0.2675 
F-statistic: 1.913 on 2 and 3 DF,  p-value: 0.2914 

> coef(summary(myModel))
               Estimate Std. Error    t value  Pr(>|t|)
(Intercept) -0.15149826  0.2324894 -0.6516352 0.5611052
myGroup11    0.03926774  0.2324894  0.1689012 0.8766203
myGroup21   -0.37273117  0.2324894 -1.6032180 0.2072173

> summary(myModel)$coefficients
               Estimate Std. Error    t value  Pr(>|t|)
(Intercept) -0.15149826  0.2324894 -0.6516352 0.5611052
myGroup11    0.03926774  0.2324894  0.1689012 0.8766203
myGroup21   -0.37273117  0.2324894 -1.6032180 0.2072173
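A minimal sketch that makes the mismatch visible: the object returned by summary() carries an aliased component, a named logical vector flagging the terms that could not be estimated, and those are exactly the rows missing from coef(summary(myModel)). It assumes R's default treatment contrasts, so the term names read like myGroup1b rather than the myGroup11 shown above; the variable names s, n_kept and n_all are only illustrative.

```r
# Rebuild the question's example (R's default treatment contrasts assumed)
set.seed(534)
myGroup1 <- factor(c("a", "a", "a", "a", "b", "b"))
myGroup2 <- factor(c("first", "second", "first", "second", "first", "first"))
myDepVar <- rnorm(6, 0, 1)
myModel <- lm(myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)

s <- summary(myModel)

# Named logical vector: TRUE for each term that could not be estimated
print(s$aliased)

# The summary's coefficient matrix keeps only the non-aliased rows...
n_kept <- nrow(coef(s))
# ...while coef() on the fit lists every term, with NA for aliased ones
n_all <- length(coef(myModel))
cat("rows in coef(summary(myModel)):", n_kept, "\n")
cat("terms in coef(myModel):", n_all, "\n")

# The missing rows are exactly the aliased terms
print(names(which(s$aliased)))
```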
Asked by Courtnay, 7/6, 2012 at 14:55. 1 comment:
Would you agree that this is a bug? (Barytone)

Why don't you just extract the coefficients from the fitted model:

> coef(myModel)
             (Intercept)                myGroup1b 
             -0.48496169              -0.07853547 
          myGroup2second myGroup1b:myGroup2second 
              0.74546233                       NA

That seems the easiest option.

na.action has nothing to do with this. Note that you didn't pass na.action = na.pass in your example.

na.action is a global option (it can also be passed as an argument to lm()) for handling NA in the data passed to a model fit, usually in conjunction with a formula; it is also the name of a function, na.action(). R builds the so-called model frame from the data argument and the symbolic description of the model given in the formula. This is where any NA would be detected, and the default na.action, na.omit(), removes it by dropping every observation that has an NA in any of the model's variables. There are alternatives, most usefully na.exclude(), which also drops those observations during fitting but adds NA back in the correct places in the fitted values, residuals, etc. See ?na.omit and ?na.action for more, and ?options for the global setting.

The NA in your coefficient table has a different cause: there are no observations for the b/second combination, so the interaction coefficient cannot be estimated (it is aliased), regardless of how missing data are handled.
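To illustrate the difference between na.omit() and na.exclude(), here is a small sketch on made-up data (the data frame d and its values are purely illustrative). It also shows that na.action only concerns missing data: both fits produce identical coefficients.

```r
# Made-up data with one missing response value (row 3)
d <- data.frame(x = c(1, 2, 3, 4, 5, 6),
                y = c(1.1, 1.9, NA, 4.2, 4.8, 6.1))

# The global default used when na.action is not given
print(getOption("na.action"))

# na.omit: the incomplete row is dropped and stays dropped
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
# na.exclude: same fit, but NA is padded back into residuals and fitted values
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

print(length(residuals(fit_omit)))  # 5 residuals: row 3 is gone
print(length(residuals(fit_excl)))  # 6 residuals: row 3 comes back as NA

# na.action() reports which rows were removed from the model frame
print(na.action(fit_omit))

# Identical coefficients: na.action only governs missing data, not
# coefficients that are NA because a term is aliased
print(all.equal(coef(fit_omit), coef(fit_excl)))
```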

Answered by Recipe, 7/6, 2012 at 15:24. 4 comments:
Thanks for explaining that the na.action settings aren't relevant to this problem. Extracting the coefficients from the fitted model might work as a last resort, but I wanted to bind a couple of columns for confidence intervals to the summary table. I don't want just the estimates; I want the standard errors, p-values, etc., with confidence intervals attached at the end. I could just make the table from scratch, but I thought there might be some simple setting that needed changing to get coef(summary(myModel)) and confint(myModel) to output the same number of rows in the same order. (Courtnay)
@Jdub, did you figure this out? I have the exact same problem. (Paedogenesis)
Same here! Same problem. (Ogpu)
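@Courtnay isn't this just a matter of indexing the coefficient table by the names in coef(model), e.g. cs <- coef(summary(model)); cs[match(names(coef(model)), rownames(cs)), ], where the NA index that match() returns for an aliased term, passed as the i argument of [, produces an all-NA row? I hope that's what you're asking for, because that's the only output that makes sense to me. Otherwise, you may need to describe a bit better what you're trying to do. (Famous)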
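A sketch of the indexing idea from the comment above, aimed at the stated goal of binding confidence intervals to the coefficient table: pad both coef(summary(myModel)) and confint(myModel) to one row per entry of names(coef(myModel)), using match() so that a missing term gets an NA row index. The helper pad_rows() is hypothetical, written only for this example, and the model is rebuilt with default treatment contrasts. Padding confint() too is a precaution, since whether it already returns a row for the aliased term may depend on the R version.

```r
# Rebuild the question's example (R's default treatment contrasts assumed)
set.seed(534)
myGroup1 <- factor(c("a", "a", "a", "a", "b", "b"))
myGroup2 <- factor(c("first", "second", "first", "second", "first", "first"))
myDepVar <- rnorm(6, 0, 1)
myModel <- lm(myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)

# Every term in model order, aliased ones included (their estimate is NA)
all_terms <- names(coef(myModel))

# Hypothetical helper: one row per term, in model order. match() gives
# NA_integer_ for a term missing from m, and an NA row index selects an all-NA row.
pad_rows <- function(m, terms) {
  out <- m[match(terms, rownames(m)), , drop = FALSE]
  rownames(out) <- terms  # NA indices would otherwise leave NA row names
  out
}

coef_tab <- pad_rows(coef(summary(myModel)), all_terms)
ci_tab   <- pad_rows(confint(myModel), all_terms)

# Same rows in the same order, so the columns can be bound side by side
full_tab <- cbind(coef_tab, ci_tab)
print(full_tab)
```

Because match() is driven by the fitted model's own term list, the same helper should line up any other per-term table against the coefficients.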

The documentation of summary.lm says 'Aliased coefficients are omitted in the return object but restored by the print method'. There seems to be no parameter to control this omission. Besides using coef(myModel), as suggested by @Gavin Simpson, another workaround is to create a matrix of NA rows for the aliased terms:

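s <- summary(myModel)
num_regressors <- length(s$aliased)          # all terms, aliased ones included
nr <- num_regressors - nrow(s$coefficients)  # number of aliased (dropped) terms
nc <- ncol(s$coefficients)                   # 4: Estimate, Std. Error, t value, Pr(>|t|)
rnames <- names(which(s$aliased))
cnames <- colnames(s$coefficients)
mat_na <- matrix(data = NA, nrow = nr, ncol = nc,
                 dimnames = list(rnames, cnames))

and then rbind the two matrices, reordering the rows to follow the model's term order (rbind alone puts the NA rows at the end):

mat_coef <- rbind(s$coefficients, mat_na)
mat_coef <- mat_coef[names(s$aliased), , drop = FALSE]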
Answered by Stubblefield, 12/1, 2017 at 10:21.

You can also convert the summary coefficient table into a data frame (the NA coefficients are still missing from it):

coef_df <- as.data.frame(summary(fit)$coefficients)  # new name, so the model in fit is kept

and then extract values by term name:

coef_df["age", "Pr(>|z|)"]

If "age" has been dropped, this returns NA instead of the p-value for age.
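A sketch of the same by-name lookup on the question's model. One caveat worth checking in ?Extract.data.frame: as far as I know, [ on a data frame partially matches row names, so a missing "age" could silently pick up a row such as "age2". The hypothetical helper lookup_p() below uses match(), which only matches exactly, and returns NA for any absent term. Since this is an lm() fit the column is "Pr(>|t|)"; for a binomial or Poisson glm() fit, whose summary reports "Pr(>|z|)", pass that name as col.

```r
# Rebuild the question's example (R's default treatment contrasts assumed)
set.seed(534)
myGroup1 <- factor(c("a", "a", "a", "a", "b", "b"))
myGroup2 <- factor(c("first", "second", "first", "second", "first", "first"))
myDepVar <- rnorm(6, 0, 1)
myModel <- lm(myDepVar ~ myGroup1 + myGroup2 + myGroup1:myGroup2)

# Keep the table under its own name so the model object is not overwritten
coef_df <- as.data.frame(coef(summary(myModel)))

# Hypothetical helper: exact row lookup by term name; match() never
# partially matches, and an absent term yields an NA row index
lookup_p <- function(tab, term, col = "Pr(>|t|)") {
  tab[match(term, rownames(tab)), col]
}

p_present <- lookup_p(coef_df, "myGroup1b")                 # estimated term
p_aliased <- lookup_p(coef_df, "myGroup1b:myGroup2second")  # aliased, dropped
print(c(present = p_present, aliased = p_aliased))
```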

Answered by Hanlon, 4/7, 2018 at 11:53.
