Custom contrasts are very widely used in analyses, e.g.: "Do DV values at level 1 and level 3 of this three-level factor differ significantly?"
Intuitively, this contrast is expressed in terms of cell means as:
c(1,0,-1)
One or more of these contrasts, bound as columns, form a contrast coefficient matrix, e.g.
mat = matrix(ncol = 2, byrow = TRUE, data = c(
1, 0,
0, 1,
-1, -1)
)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
[3,]   -1   -1
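To make the "in terms of cell means" reading concrete, here is a minimal sketch. The cell means are made-up values, and `cell_means`, `single` and `both` are names introduced only for this illustration.

```r
# Hypothetical cell means for the three factor levels (illustrative values only).
cell_means <- c(10, 12, 15)

# Single contrast: level 1 minus level 3.
single <- sum(c(1, 0, -1) * cell_means)

# Coefficient matrix from the text: one comparison per column.
mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# Each column of mat, applied to the cell means, gives one comparison.
both <- drop(t(mat) %*% cell_means)

single  # level 1 vs level 3
both    # level 1 vs level 3, then level 2 vs level 3
```

Read this way, each column of the matrix is simply a set of weights on the cell means.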
However, when it comes to running the contrasts specified by the coefficient matrix, there is a lot of (apparently contradictory) information on the web and in books. My question is: which information is correct?
Claim 1: contrasts(factor) takes a coefficient matrix
In some examples, the user is shown that the intuitive contrast coefficient matrix can be used directly via the contrasts() or C() functions. So it's as simple as:
contrasts(myFactor) <- mat
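A minimal sketch of what Claim 1 does in practice, assuming a small made-up data set (`d`, `y` and `f` are hypothetical names, with cell means 10, 12 and 15). It only shows what `lm` reports when `mat` is assigned directly; it is not meant as a recommendation.

```r
# Coefficient matrix from the question.
mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# Made-up balanced data: cell means are 10, 12 and 15.
d <- data.frame(
  y = c(9, 11, 11, 13, 14, 16),
  f = factor(rep(c("a", "b", "c"), each = 2))
)

# Claim 1: assign the coefficient matrix directly as the factor's coding.
contrasts(d$f) <- mat
fit <- lm(y ~ f, data = d)

coef(fit)     # compare with 10 - 15 and 12 - 15
contr.sum(3)  # built-in sum-to-zero coding, for comparison with mat
```

With these data the two slope coefficients are the deviations of levels 1 and 2 from the mean of the cell means, not the differences from level 3, and `mat` has the same entries as `contr.sum(3)`.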
Claim 2: Transform coefficients to create a coding scheme
Elsewhere (e.g. UCLA stats) we are told that the coefficient matrix (or basis matrix) must be transformed into a contrast matrix before use. This involves taking the inverse of the transpose of the coefficient matrix, (mat')⁻¹, or, in R-ish:
contrasts(myFactor) = solve(t(mat))
Because solve() only inverts square matrices and mat is 3×2, this method requires padding the matrix with an initial column of means for the intercept. To avoid this, some sites recommend using a generalized inverse function that can cope with non-square matrices, i.e., MASS::ginv():
contrasts(myFactor) = MASS::ginv(t(mat))
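The two routes in Claim 2 can be compared side by side. This is a sketch under assumptions: the padding column is taken to be 1/3 per level, the data are made up (cell means 10, 12 and 15), and `padded`, `coding_solve`, `coding_ginv` and `d` are names introduced here.

```r
library(MASS)  # provides ginv()

# Coefficient matrix from the question.
mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# Route A: pad with a column of 1/3 for the intercept (assumed padding),
# invert the transpose, then drop the intercept column again.
padded <- cbind(1/3, mat)
coding_solve <- solve(t(padded))[, -1]

# Route B: generalized inverse of the transpose, no padding needed.
coding_ginv <- ginv(t(mat))

# Made-up balanced data: cell means are 10, 12 and 15.
d <- data.frame(
  y = c(9, 11, 11, 13, 14, 16),
  f = factor(rep(c("a", "b", "c"), each = 2))
)
contrasts(d$f) <- coding_ginv
fit <- lm(y ~ f, data = d)

coding_solve
coding_ginv
coef(fit)  # compare the slopes with 10 - 15 and 12 - 15
```

For this `mat` both routes give the same coding, and the slopes of the fitted model equal the level 1 vs level 3 and level 2 vs level 3 differences in cell means.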
Third option: premultiply by the transpose, take the inverse, and postmultiply by the transpose
Elsewhere again (e.g. a note from SPSS support), we learn that the correct algebra is (mat'mat)⁻¹ mat', which implies to me that the correct way to create the contrasts matrix should be:
x = solve(t(mat)%*% mat)%*% t(mat)
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 1 0 -1
[3,] 0 1 -1
contrasts(myFactor) = x
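As a hedged check on the third option, the sketch below evaluates the same expression for the 3×2 `mat` defined at the top of the question and compares it with the generalized inverse from Claim 2. Nothing here goes beyond base R and `MASS::ginv()`.

```r
library(MASS)  # provides ginv()

# Coefficient matrix from the top of the question (3 levels, 2 comparisons).
mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# Third option: (mat'mat)^-1 mat'
x <- solve(t(mat) %*% mat) %*% t(mat)

dim(x)                         # number of rows, number of columns
all.equal(t(x), ginv(t(mat)))  # is the transpose the same as Claim 2's coding?
```

For this input the result has one row per comparison and one column per level, so it would need transposing before it could be assigned with `contrasts<-`, and the transpose coincides with `ginv(t(mat))`. The printed matrix above presumably came from a different input.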
My question is: which is right (assuming I am interpreting and describing each piece of advice accurately)? How does one specify custom contrasts in R for lm, lme, etc.?
Refs