I am interested in some disturbing behaviour of the lm
function and the associated predict.lm
function in R. The splines
base package provides the function bs
to generate b-spline expansions, which can then be used to fit a spline model using lm
, a versatile linear model fitting function.
The lm
and predict.lm
functions have a lot of built-in convenience that take advantage of formulas and terms. If the call to bs()
is nested inside the lm
call, then the user can provide univariate data to predict
, and this data will automatically be expanded into the appropriate b-spline basis. This expanded matrix of data will then predicted upon as usual.
library(splines)
x <- sort(runif(50, 0, 10))
y <- x^2
splineModel <- lm(y ~ bs(x, y, degree = 3, knots = c(3, 6)))
newData <- data.frame(x = 4)
prediction <- predict(splineModel, newData) # 16
plot(x, y)
lines(x, splineModel$fitted.values, col = 'blue3')
points(newData$x, prediction, pch = 3, cex = 3, col = 'red3')
legend("topleft", legend = c("Data", "Fitted Values", "Predicted Value"),
pch = c(1, NA, 3), col = c('black', 'blue3', 'red3'), lty = c(NA, 1, NA))
As we see, this works perfectly:
The strangeness happens when one uses the ::
operator to explicitly indicate that the bs
function is exported from the namespace of the splines
package. The following code snippet is identical except for that change:
library(splines)
x <- sort(runif(50, 0, 10))
y <- x^2
splineModel <- lm(y ~ splines::bs(x, y, degree = 3, knots = c(3, 6)))
newData <- data.frame(x = 4)
prediction <- predict(splineModel, newData) # 6.40171
plot(x, y)
lines(x, splineModel$fitted.values, col = 'blue3')
points(newData$x, prediction, pch = 3, cex = 3, col = 'red3')
legend("topleft", legend = c("Data", "Fitted Values", "Predicted Value"),
pch = c(1, NA, 3), col = c('black', 'blue3', 'red3'), lty = c(NA, 1, NA))
The exact same results are produced in the second snippet if the splines
package is never attached using library
in the first place. I cannot think of another situation in which the use of the ::
operator on an already-loaded package changes program behaviour.
The same behaviour arises using other functions from splines
like the natural spline basis implementation ns
. Interestingly, in both cases the "y hat" or fitted values are reasonable and match one another. The fitted model objects are identical except for names of attributes, as far as I can tell.
I have been unable to pin down the source of this behaviour. While this may read like a bug report, my questions are
- Why does this happen? I have been trying to follow through
predict.lm
but cannot pin down where the divergence occurs. - Is this somehow intended behaviour, and if so where can I learn more about it?