Extract pvalue from glm
Asked Answered
I

5

53

I'm running many regressions and am only interested in the effect on the coefficient and p-value of one particular variable. So, in my script, I'd like to be able to just extract the p-value from the glm summary (getting the coefficient itself is easy). The only way I know of to view the p-value is using summary(myReg). Is there some other way?

e.g.:

fit <- glm(y ~ x1 + x2, myData)
x1Coeff <- fit$coefficients[2] # only returns coefficient, of course
x1pValue <- ???

I've tried treating fit$coefficients as a matrix, but am still unable to simply extract the p-value.

Is it possible to do this?

Thanks!

Indispose answered 23/5, 2014 at 22:2 Comment(0)
V
74

You want

coef(summary(fit))[,4]

which extracts the column vector of p values from the tabular output shown by summary(fit). The p-values aren't actually computed until you run summary() on the model fit.

By the way, use extractor functions rather than delve into objects if you can:

fit$coefficients[2]

should be

coef(fit)[2]

If there aren't extractor functions, str() is your friend. It allows you to look at the structure of any object, which allows you to see what the object contains and how to extract it:

summ <- summary(fit)

> str(summ, max = 1)
List of 17
 $ call          : language glm(formula = counts ~ outcome + treatment, family = poisson())
 $ terms         :Classes 'terms', 'formula' length 3 counts ~ outcome + treatment
  .. ..- attr(*, "variables")= language list(counts, outcome, treatment)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. ..- attr(*, "term.labels")= chr [1:2] "outcome" "treatment"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(counts, outcome, treatment)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "factor" "factor"
  .. .. ..- attr(*, "names")= chr [1:3] "counts" "outcome" "treatment"
 $ family        :List of 12
  ..- attr(*, "class")= chr "family"
 $ deviance      : num 5.13
 $ aic           : num 56.8
 $ contrasts     :List of 2
 $ df.residual   : int 4
 $ null.deviance : num 10.6
 $ df.null       : int 8
 $ iter          : int 4
 $ deviance.resid: Named num [1:9] -0.671 0.963 -0.17 -0.22 -0.956 ...
  ..- attr(*, "names")= chr [1:9] "1" "2" "3" "4" ...
 $ coefficients  : num [1:5, 1:4] 3.04 -4.54e-01 -2.93e-01 1.34e-15 1.42e-15 ...
  ..- attr(*, "dimnames")=List of 2
 $ aliased       : Named logi [1:5] FALSE FALSE FALSE FALSE FALSE
  ..- attr(*, "names")= chr [1:5] "(Intercept)" "outcome2" "outcome3" "treatment2" ...
 $ dispersion    : num 1
 $ df            : int [1:3] 5 4 5
 $ cov.unscaled  : num [1:5, 1:5] 0.0292 -0.0159 -0.0159 -0.02 -0.02 ...
  ..- attr(*, "dimnames")=List of 2
 $ cov.scaled    : num [1:5, 1:5] 0.0292 -0.0159 -0.0159 -0.02 -0.02 ...
  ..- attr(*, "dimnames")=List of 2
 - attr(*, "class")= chr "summary.glm"

Hence we note the coefficients component which we can extract using coef(), but other components don't have extractors, like null.deviance, which you can extract as summ$null.deviance.

Voluntaryism answered 23/5, 2014 at 22:5 Comment(13)
you beat me while I was looking for dupes (there aren't any perfect duplicates but there are a lot of posts about extracting stuff from [g]lm fits: e.g. #12496868 )Aeroscope
might be worth adding a comment on str() when you don't know what accessors are available and have to try to dig stuff out of the object for yourself.Aeroscope
Thanks @BenBolker. Have added something on str.Voluntaryism
Actually, I was using str() to try to figure out how to get the data, but I don't see where in the str() I can deduce that I need the coef() function to get what I'm looking for. I'm reading you're update and I still don't see that either...Indispose
@Ben I looked a few times for other posts but couldn't find any.Indispose
the way to find out about coef is to do methods(class="lm") or methods(class="summary.lm"). I agree that you can't figure out from looking at str() that you could use coef().Aeroscope
Actually, I can't even see in the methods(class="summary.glm") or methods(class="glm") output that I should be able to use coef() on the object. Could you please elaborate? Thanks!Indispose
I tend to use names(summ) or attributes(summ) to get at the attributes of an object; although this does not tell me the methods it gives me more compact info than str. Sometimes I use the frowned upon summ$coefficients if I'm not sure of the extractor function; I have not yet received an excommunication notice from the Church of R.Lentamente
@Clark Look at class(fit) and you'll see that a glm` fit inherits from class "lm" so you would need to look for methods for that class.Voluntaryism
I've been using this: cbind( coef(summary(model))[, 4], exp(coef(model)), exp(confint(model))) , but it takes a long time "Waiting for profiling to be done..." any suggestions as to how to fix this?Ionization
@Ionization don't use profile likelihood to compute confidence intervals for parameters (which is what the confint() method for GLMs is doing.Voluntaryism
thanks! @GavinSimpson, what should I use instead?Ionization
@Ionization well, there's a reason confint() is using profile likelihood - the usual Gaussian approximation can often give poor confidence intervals in GLMs. Unfortunately profiling the likelihood of the model takes time. You could form intervals on the link scale as beta_i +/- (2 * std_err_i) and then back transform those intervals to the response scale using the inverse of the link function, but that's using the gaussian approximation which can be problematic in some situations. Bootstrapping could also be used. That won't be quick either.Voluntaryism
A
12

Instead of the number you can put directly the name

coef(summary(fit))[,'Pr(>|z|)']

the other ones available from the coefficient summary:

Estimate Std. Error z value Pr(>|z|)

Astrid answered 24/1, 2018 at 9:22 Comment(0)
L
5

I've used this technique in the past to pull out predictor data from summary or from a fitted model object:

coef(summary(m))[grepl("var_i_want$",row.names(coef(summary(m)))), 4]

which lets me easily edit which variable I want to get data on.

Or as pointed out be @Ben, use match or %in%, somewhat cleaner than grepl:

coef(summary(m))[row.names(coef(summary(m))) %in% "var_i_want" , 4]
Lentamente answered 23/5, 2014 at 22:18 Comment(1)
or use match() ... or just index the row appropriatelyAeroscope
M
3

The tidy function from the broom package (part of the Tidyverse, available on CRAN) provides a quick and easy way to convert your GLM summaries into a data frame, which might be useful in a number of situations other than the one you described above.

In this case, your desired output could be obtained with the code:

x1pValue <- broom::tidy(fit)$p.value[2]
Moot answered 16/6, 2021 at 20:51 Comment(0)
G
1

Well, this would be another way, however not the most efficient way to perform it :

a = coeftable(model).cols[4]
pVals = [ a[i].v for i in 1:length(a) ]

This ensures that the extracted values from the glm is not in StatsBase. Therein, you can play around with pVals as per your requirement. Hope it helps, Ebby

Gabrila answered 1/5, 2018 at 8:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.