mgcv: How to set number and / or locations of knots for splines
Asked Answered
M

1

21

I want to use function gam in mgcv packages:

 x <- seq(0,60, len =600)
 y <- seq(0,1, len=600) 
 prova <- gam(y ~ s(x, bs='cr')

can I set the number of knots in s()? and then can I know where are the knots that the spline used? Thanks!

Mcgray answered 15/10, 2016 at 7:54 Comment(0)
E
38

While setting k is the correct way to go, fx = TRUE is definitely not right: it will force using pure regression spline without penalization.


locations of knots

For penalized regression spline, the exact locations are not important, as long as:

  • k is adequately big;
  • the spread of knots has good, reasonable coverage.

By default:

  • natural cubic regression spline bs = 'cr' places knots by quantile;
  • B-splines family (bs = 'bs', bs = 'ps', bs = 'ad') place knots evenly.

Compare the following:

library(mgcv)

## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi))  ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e

## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp  ## extract knots locations

## fitting B-spline
bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
bs_knots <- bs_fit$smooth[[1]]$knots  ## extract knots locations

## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "grey", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x, y, col= "grey", main = "B-spline");
lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
abline(v = bs_knots, lty = 2)

enter image description here

You can see the difference in knots placement.


Setting your own knots locations:

You can also provide your customized knots locations via the knots argument of gam() (yes, knots are not fed to s(), but to gam()). For example, you can do evenly spaced knots for cr:

xlim <- range(x)  ## get range of x
myfit <- gam(y ~ s(x, bs = 'cr', k = 20),
         knots = list(x = seq(xlim[1], xlim[2], length = 20)))

Now you can see that:

my_knots <- myfit$smooth[[1]]$xp
plot(x, y, col= "grey", main = "my knots");
lines(x, myfit$linear.predictors, col = 2, lwd = 2)
abline(v = my_knots, lty = 2)

enter image description here

However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. In particular, the number of knots you provide must not conflict with the k in s().

This is a very rich answer. The length of bs_knots is 24. The "dimension" of the spline basis is in bs_fit$smooth[[1]]$bs.dim, which is 20.

Yes, for B-splines family, the number of B-splines does not equal the number of knots. Knots placement for B-splines is a dirty work and error-prone. See https://mcmap.net/q/659777/-fitting-a-smooth-spline-gam-function-in-r-error-in-number-of-knots-required-to-fit-spline-knot-requirements-increasing for a demonstration with B-splines.

Everything answered 18/10, 2016 at 16:6 Comment(5)
This is a very rich answer. The length of bs_knots is 24. The "dimension" of the spline basis is in bs_fit$smooth[[1]]$bs.dim.Steck
@李哲源 Thanks for the detailed answer. Is there a way to extract the knots without calling gam, e.g. by calling smoothCon?Bourne
@Bourne Say x <- runif(100); sm <- smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = NULL); sm[[1]]$xp Note that if you use smoothCon, you can directly set knots (but it needs match k). knots = FALSE will do auto knots placement.Everything
@李哲源 Thank you very much. I see 20 knots in the code you provided. But when I try smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = seq(0,1,length.out=20)) it gives me an error message: Error in data[[txt]] : subscript out of bounds.Bourne
@Bourne smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = list(x = seq(0,1,length.out=20))) Knots need be provided in a list. This makes sense as when constructing multivariate splines like tensor product splines, each variable has a set of knots and they should be given in a list.Everything

© 2022 - 2024 — McMap. All rights reserved.