mgcv: How to set number and / or locations of knots for splines

While setting k is the correct way to go, fx = TRUE is definitely not right: it will force using pure regression spline without penalization.

locations of knots

For penalized regression spline, the exact locations are not important, as long as:

k is adequately big;
the spread of knots has good, reasonable coverage.

By default:

natural cubic regression spline bs = 'cr' places knots by quantile;
B-splines family (bs = 'bs', bs = 'ps', bs = 'ad') place knots evenly.

Compare the following:

library(mgcv)

## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi))  ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e

## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp  ## extract knots locations

## fitting B-spline
bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
bs_knots <- bs_fit$smooth[[1]]$knots  ## extract knots locations

## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "grey", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x, y, col= "grey", main = "B-spline");
lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
abline(v = bs_knots, lty = 2)

You can see the difference in knots placement.

Setting your own knots locations:

You can also provide your customized knots locations via the knots argument of gam() (yes, knots are not fed to s(), but to gam()). For example, you can do evenly spaced knots for cr:

xlim <- range(x)  ## get range of x
myfit <- gam(y ~ s(x, bs = 'cr', k = 20),
         knots = list(x = seq(xlim[1], xlim[2], length = 20)))

Now you can see that:

my_knots <- myfit$smooth[[1]]$xp
plot(x, y, col= "grey", main = "my knots");
lines(x, myfit$linear.predictors, col = 2, lwd = 2)
abline(v = my_knots, lty = 2)

However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. In particular, the number of knots you provide must not conflict with the k in s().

This is a very rich answer. The length of bs_knots is 24. The "dimension" of the spline basis is in bs_fit$smooth[[1]]$bs.dim, which is 20.

Yes, for B-splines family, the number of B-splines does not equal the number of knots. Knots placement for B-splines is a dirty work and error-prone. See https://mcmap.net/q/659777/-fitting-a-smooth-spline-gam-function-in-r-error-in-number-of-knots-required-to-fit-spline-knot-requirements-increasing for a demonstration with B-splines.

Recommended topics

Hot tags