I want to use function gam
in mgcv
packages:
x <- seq(0,60, len =600)
y <- seq(0,1, len=600)
prova <- gam(y ~ s(x, bs='cr')
can I set the number of knots in s()
? and then can I know where are the knots that the spline used? Thanks!
I want to use function gam
in mgcv
packages:
x <- seq(0,60, len =600)
y <- seq(0,1, len=600)
prova <- gam(y ~ s(x, bs='cr')
can I set the number of knots in s()
? and then can I know where are the knots that the spline used? Thanks!
While setting k
is the correct way to go, fx = TRUE
is definitely not right: it will force using pure regression spline without penalization.
locations of knots
For penalized regression spline, the exact locations are not important, as long as:
k
is adequately big;By default:
bs = 'cr'
places knots by quantile;bs = 'bs'
, bs = 'ps'
, bs = 'ad'
) place knots evenly.Compare the following:
library(mgcv)
## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi)) ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e
## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
## fitting B-spline
bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
bs_knots <- bs_fit$smooth[[1]]$knots ## extract knots locations
## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "grey", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x, y, col= "grey", main = "B-spline");
lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
abline(v = bs_knots, lty = 2)
You can see the difference in knots placement.
Setting your own knots locations:
You can also provide your customized knots locations via the knots
argument of gam()
(yes, knots are not fed to s()
, but to gam()
). For example, you can do evenly spaced knots for cr
:
xlim <- range(x) ## get range of x
myfit <- gam(y ~ s(x, bs = 'cr', k = 20),
knots = list(x = seq(xlim[1], xlim[2], length = 20)))
Now you can see that:
my_knots <- myfit$smooth[[1]]$xp
plot(x, y, col= "grey", main = "my knots");
lines(x, myfit$linear.predictors, col = 2, lwd = 2)
abline(v = my_knots, lty = 2)
However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. In particular, the number of knots you provide must not conflict with the k
in s()
.
This is a very rich answer. The length of
bs_knots
is 24. The "dimension" of the spline basis is inbs_fit$smooth[[1]]$bs.dim
, which is 20.
Yes, for B-splines family, the number of B-splines does not equal the number of knots. Knots placement for B-splines is a dirty work and error-prone. See https://mcmap.net/q/659777/-fitting-a-smooth-spline-gam-function-in-r-error-in-number-of-knots-required-to-fit-spline-knot-requirements-increasing for a demonstration with B-splines.
gam
, e.g. by calling smoothCon
? –
Bourne x <- runif(100); sm <- smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = NULL); sm[[1]]$xp
Note that if you use smoothCon
, you can directly set knots
(but it needs match k
). knots = FALSE
will do auto knots placement. –
Everything smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = seq(0,1,length.out=20))
it gives me an error message: Error in data[[txt]] : subscript out of bounds. –
Bourne smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = list(x = seq(0,1,length.out=20)))
Knots need be provided in a list. This makes sense as when constructing multivariate splines like tensor product splines, each variable has a set of knots and they should be given in a list. –
Everything © 2022 - 2024 — McMap. All rights reserved.
bs_knots
is 24. The "dimension" of the spline basis is inbs_fit$smooth[[1]]$bs.dim
. – Steck