Extract p-value from gam.check in R

When I run gam.check(my_spline_gam), I get the following output.

Method: GCV   Optimizer: magic
Smoothing parameter selection converged after 9 iterations.
The RMS GCV score gradiant at convergence was 4.785628e-06 .
The Hessian was positive definite.
The estimated model rank was 25 (maximum possible: 25)
Model rank =  25 / 25 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'    edf k-index p-value
s(x) 24.000 22.098   0.849    0.06

My question is whether I can extract this p-value separately to a table.

Morganite answered 20/11, 2018 at 9:28 Comment(7)
Try str(gam.check(my_spline_gam)); the p-value should be somewhere in there.Eyde
That still gives the same output, whereas I want either just the one line of results or just the p-value. Thanks!Morganite
please add the result of dput(gam.check(my_spline_gam)) to your question. Then I can solve it.Eyde
quick look at the code suggests you can use k.check(yourmodel, subsample = 5000, n.rep = 200)Hazard
@AndreElrico: this is the output: dput(gam.check(my_spline_gam)) Method: GCV Optimizer: magic Smoothing parameter selection converged after 9 iterations. The RMS GCV score gradiant at convergence was 4.785628e-06 . The Hessian was positive definite. The estimated model rank was 25 (maximum possible: 25) Model rank = 25 / 25 Basis dimension (k) checking results. Low p-value (k-index<1) may indicate that k is too low, especially if edf is close to k'. k' edf k-index p-value s(x) 24.000 22.098 0.849 0.03 structure(list(mfrow = c(2L, 2L)), .Names = "mfrow")Morganite
@user20650: k.check returns: Error: could not find function "k.check".Morganite
@Morganite ; it is in the mgcv package. I have version ‘1.8.25’ (from packageVersion("mgcv")). See could-not-find-function for troubleshooting.Hazard
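
The "could not find function" error discussed in these comments can be narrowed down with a few checks. This is only a rough sketch using base R introspection; it assumes mgcv is installed, and the variable names has_k_check and is_exported are made up for illustration.

```r
library(mgcv)  # k.check cannot be found if mgcv is not attached

# Installed version; one commenter reports that 1.8.25 provides k.check
print(packageVersion("mgcv"))

# Is there an object called k.check anywhere in the mgcv namespace?
has_k_check <- exists("k.check", envir = asNamespace("mgcv"), inherits = FALSE)

# Is it exported, i.e. callable as k.check() after library(mgcv)?
is_exported <- "k.check" %in% getNamespaceExports("mgcv")

print(c(in_namespace = has_k_check, exported = is_exported))
```

If both values are FALSE, updating mgcv is the likely next step.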

Use capture.output coupled with a little string manipulation -

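# b is a fitted gam object; capture the text that gam.check() prints
gam_obj <- capture.output(gam.check(b, pch = 19, cex = .3))
# Table rows start at line 12 of the captured output here; this index is hard-coded
gam_tbl <- gam_obj[12:length(gam_obj)]
# Return the last space-separated field of a row (the p-value) as a number
str_spl <- function(x) {
  fields <- strsplit(x, " ")[[1]]
  as.numeric(fields[length(fields)])
}
p_values <- data.frame(p_value = sapply(gam_tbl, str_spl))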

Tallulah answered 20/11, 2018 at 10:4 Comment(1)
Thanks! It didn't work exactly as written, but an elaboration of it worked fine. The only thing is that, ideally, I would also like the result to three decimal places. gam_obj <- capture.output(gam.check(my_spline_gam, pch=19, cex=.3)); gam_tbl <- gam_obj[12:length(gam_obj)]; p_str = unlist(strsplit(gam_tbl, " ", fixed=TRUE)); p_value = as.numeric(p_str[8]); p_valueMorganite
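
The hard-coded positions (line 12, element 8) depend on the exact layout of the printed output. A possible variation, sketched below, looks for the table header instead. It assumes the header line starts with k' followed by edf, as in the output shown in the question, and it uses the example model from the mgcv help files; the names hdr, rest, rows, fields, and printed are invented for this sketch. Parsing printed text can only recover the digits that were printed, so it cannot give a third decimal place if the console shows two.

```r
library(mgcv)

# Example model from the mgcv help files
set.seed(0)
dat <- gamSim(1, n = 200)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

out <- capture.output(gam.check(b))

# Locate the table header ("k'  edf k-index p-value") instead of assuming line 12
hdr <- grep("^[[:space:]]*k'[[:space:]]+edf", out)[1]

# Keep lines after the header that look like "<term> <number> ..."
rest <- out[(hdr + 1):length(out)]
rows <- rest[grepl("^[^[:space:]]+[[:space:]]+[0-9]", rest)]

# Split on runs of whitespace; fields are term, k', edf, k-index, p-value
fields <- strsplit(trimws(rows), "[[:space:]]+")
printed <- data.frame(
  term = sapply(fields, `[`, 1),
  # as.numeric() gives NA for entries printed like "<2e-16"
  p_value = suppressWarnings(as.numeric(sapply(fields, `[`, 5)))
)
print(printed)
```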

It looks like you cannot store the result in an object in the usual way. You could use capture.output to store the console output in an object, and then use strsplit to pull out the value. For the example in the help file this would be:

library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
r <- capture.output(gam.check(b))
p <- strsplit(r[12], " ")[[1]][11]

But because the p-value is parsed from the printed text, you only get the rounded value shown on the console (as a string), not the exact p-value.

Edit: user20650's answer will give you the proper output:

r <- k.check(b)
r[,'p-value']
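
To get from there to a table with p-values rounded to three decimals, as asked in the question, something like the following might work. It is a sketch that assumes an mgcv version that exports k.check and reuses the help-file model from above; the names kc and k_tbl and the data frame column names are chosen here for illustration.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 200)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# k.check() returns a numeric matrix with one row per smooth term and
# columns "k'", "edf", "k-index" and "p-value"
kc <- k.check(b)

# Reshape into a plain data frame, rounding the p-value to 3 decimals
k_tbl <- data.frame(
  term      = rownames(kc),
  k_index   = kc[, "k-index"],
  p_value   = round(kc[, "p-value"], 3),
  row.names = NULL
)
print(k_tbl)
```

Because the values come from the matrix and not from printed text, the rounding is under your control.
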
Uninhibited answered 20/11, 2018 at 9:51 Comment(7)
Thanks! k.check does not seem to work. Is it in another package that I should load?Morganite
@Morganite Means you need to do k.check(my_spline_gam) rather than gam.check(my_spline_gam). P-values are very similar and you should be able to use them.Nata
@Nata ; p-values are the same: any differences are due to the calculation being stochastic. So best to use set.seedHazard
@Hazard Both commands yield slightly different p-values. But I'm with you that we can consider both the same due to stochastic calculations within the commands, where you cannot easily set a seed and set.seed() apparently has no effect.Nata
@Nata ; I was about to disagree with you as gam.check explicitly calls k.check so they should give the same results (and you can use the seed as set.seed(1) ; printCoefmat(k.check(b, subsample = 5000, n.rep = 200), digits = 3)). However, a quick look shows there are other random sample calls used in the earlier plot functions in gam.check (not related to the output table), which will move the seed, hence difficult to get the same. But it is the same function doing the work..Hazard
Thank you all, but k.check() does not work for me. Error: could not find function "k.check"Morganite
@Morganite ; k.check exists in mgcv (and is actually called in gam.check). If you can't see the function, or don't have it, it could be because you are using an earlier version of mgcv or R. Can you edit your question with the results of sessionInfo()?Hazard
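
The point about seeds in the comments above can be checked directly: when k.check is called on its own, resetting the seed immediately before each call should reproduce the same p-values. A small sketch, assuming the help-file model and the subsample and n.rep values quoted in the comments (p1 and p2 are illustrative names):

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 200)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# Same seed immediately before each call, so the random draws match
set.seed(1)
p1 <- k.check(b, subsample = 5000, n.rep = 200)[, "p-value"]
set.seed(1)
p2 <- k.check(b, subsample = 5000, n.rep = 200)[, "p-value"]

print(identical(p1, p2))
```

This does not carry over to gam.check(b), since, as noted above, its plotting code makes other random draws before the table is computed.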
