I am confused by the way the predict.glm function in R works. According to the help,
The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.
Thus, if my model has the form f(y) = X*beta, then the command
predict(model, X, type='terms')
is expected to return the matrix X with each column multiplied by the corresponding element of beta. For example, if I fit the following model
test.data = data.frame(y = c(0,0,0,1,1,1,1,1,1), x=c(1,2,3,1,2,2,3,3,3))
model = glm(y~(x==1)+(x==2), family = 'binomial', data = test.data)
the resulting coefficients are
beta <- model$coef
The design matrix is
X <- model.matrix(y~(x==1)+(x==2), data = test.data)
(Intercept) x == 1TRUE x == 2TRUE
1 1 1 0
2 1 0 1
3 1 0 0
4 1 1 0
5 1 0 1
6 1 0 1
7 1 0 0
8 1 0 0
9 1 0 0
Multiplied column by column by the coefficients, it should then look like
pred1 <- t(beta * t(X))
(Intercept) x == 1TRUE x == 2TRUE
1 1.098612 -1.098612 0.0000000
2 1.098612 0.000000 -0.4054651
3 1.098612 0.000000 0.0000000
4 1.098612 -1.098612 0.0000000
5 1.098612 0.000000 -0.4054651
6 1.098612 0.000000 -0.4054651
7 1.098612 0.000000 0.0000000
8 1.098612 0.000000 0.0000000
9 1.098612 0.000000 0.0000000
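As a sanity check on this expectation, the per-column contributions should at least add up to the linear predictor. The sketch below rebuilds the same objects and uses sweep() as an alternative to the double transpose; it assumes nothing beyond the data and model defined above.

```r
# Same data and model as above
test.data <- data.frame(y = c(0,0,0,1,1,1,1,1,1), x = c(1,2,3,1,2,2,3,3,3))
model <- glm(y ~ (x == 1) + (x == 2), family = "binomial", data = test.data)

beta <- coef(model)
X <- model.matrix(model)

# sweep() multiplies column j of X by beta[j]; equivalent to t(beta * t(X))
pred1 <- sweep(X, 2, beta, `*`)

# The row sums of the contributions should equal the linear predictor
all.equal(unname(rowSums(pred1)), unname(predict(model, type = "link")))
```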
However, the actual matrix produced by predict.glm seems to be unrelated to this. The following code gives
pred2 <- predict(model, test.data, type = 'terms')
x == 1 x == 2
1 -0.8544762 0.1351550
2 0.2441361 -0.2703101
3 0.2441361 0.1351550
4 -0.8544762 0.1351550
5 0.2441361 -0.2703101
6 0.2441361 -0.2703101
7 0.2441361 0.1351550
8 0.2441361 0.1351550
9 0.2441361 0.1351550
attr(,"constant")
[1] 0.7193212
How does one interpret such results?
With type = 'terms', predict seems to use different contrasts, but none of the built-in ones seem to work. Also, to confirm that the terms plus the constant add up to the linear predictor: all.equal(rowSums(predict(model, test.data, type = 'terms')) + attributes(predict(model, test.data, type = 'terms'))$constant, predict(model, test.data))
– Tove
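For what it is worth, the numbers above look consistent with centering rather than with a different contrast coding: each non-intercept column of the design matrix is shifted by its column mean before being multiplied by its coefficient, and the "constant" attribute collects the intercept plus the mean contributions. The sketch below tries to reproduce pred2 by hand under that assumption; the names Xc, manual.terms and manual.const are only illustrative.

```r
# Same data and model as in the question
test.data <- data.frame(y = c(0,0,0,1,1,1,1,1,1), x = c(1,2,3,1,2,2,3,3,3))
model <- glm(y ~ (x == 1) + (x == 2), family = "binomial", data = test.data)

beta <- coef(model)
X <- model.matrix(model)

# Drop the intercept column and centre the remaining columns at their means
Xc <- scale(X[, -1, drop = FALSE], center = TRUE, scale = FALSE)

# Centred columns times their coefficients (assumed form of the "terms" matrix)
manual.terms <- sweep(Xc, 2, beta[-1], `*`)

# Intercept plus each coefficient times its column mean (assumed "constant")
manual.const <- beta[1] + sum(colMeans(X[, -1, drop = FALSE]) * beta[-1])

pred2 <- predict(model, test.data, type = "terms")

# Compare the hand-built pieces with what predict() returns
all.equal(as.vector(manual.terms), as.vector(pred2))
all.equal(unname(manual.const), attr(pred2, "constant"))

# Terms plus constant should recover the linear predictor
all.equal(unname(rowSums(pred2) + attr(pred2, "constant")),
          unname(predict(model, test.data)))
```

For example, the first entry of the x == 1 column is (1 - 2/9) * (-1.098612), which is about -0.8545, and the constant is 1.098612 + (2/9) * (-1.098612) + (3/9) * (-0.4054651), which is about 0.7193.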