Custom contrasts in R: contrast coefficient matrix or contrast matrix / coding scheme? And how to get there?

Asked 4/8, 2015 at 19:57 Answered 2/1, 2019 at 17:44

Custom contrasts are very widely used in analyses, e.g.: "Do DV values at level 1 and level 3 of this three-level factor differ significantly?"

Intuitively, this contrast is expressed in terms of cell means as:

c(1,0,-1)

One or more of these contrasts, bound as columns, form a contrast coefficient matrix, e.g.

mat = matrix(ncol = 2, byrow = TRUE, data = c(
    1,  0,
    0,  1,
   -1, -1)
)
     [,1] [,2]
[1,]    1    0
[2,]    0    1
[3,]   -1   -1

However, when it comes to running these contrasts specified by the coefficient matrix, there is a lot of (apparently contradictory) information on the web and in books. My question is which information is correct?

Claim 1: contrasts(factor) takes a coefficient matrix

In some examples, the user is shown that the intuitive contrast coefficient matrix can be used directly via the contrasts() or C() functions. So it's as simple as:

contrasts(myFactor) <- mat

Claim 2: Transform coefficients to create a coding scheme

Elsewhere (e.g. UCLA stats) we are told the coefficient matrix (or basis matrix) must be transformed into a contrast matrix before use. This involves taking the inverse of the transpose of the coefficient matrix: (mat')⁻¹, or, in R-ish:

contrasts(myFactor) = solve(t(mat))

This method requires padding the matrix with an initial column of means for the intercept. To avoid this, some sites recommend using a generalized inverse function which can cope with non-square matrices, i.e., MASS::ginv()

contrasts(myFactor) = ginv(t(mat))
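
As a rough sketch of what this claim amounts to (base R only; the 1/3 padding column and the names `padded`, `coding_a` and `coding_b` are illustrative assumptions, not taken from the UCLA page): padding `mat` with a constant column makes it square so that `solve()` applies, and dropping the intercept column afterwards appears to give the same 3 x 2 coding matrix as a generalized inverse of `t(mat)`.

```r
# Coefficient matrix from the question: columns are the contrasts
# level1 - level3 and level2 - level3.
mat <- matrix(c( 1,  0,
                 0,  1,
                -1, -1), ncol = 2, byrow = TRUE)

# Route A: pad with a constant column (here 1/3, i.e. the mean of the
# cell means), invert the transpose, then drop the intercept column.
padded   <- cbind(1/3, mat)
coding_a <- solve(t(padded))[, -1]

# Route B: Moore-Penrose inverse of t(mat). For a full-column-rank `mat`
# this is mat %*% solve(t(mat) %*% mat), which is what MASS::ginv(t(mat))
# should return up to rounding.
coding_b <- mat %*% solve(crossprod(mat))

# Compare the two coding matrices, ignoring dimnames.
all.equal(coding_a, coding_b, check.attributes = FALSE)
```

The agreement relies on the columns of `mat` summing to zero; a coefficient matrix without that property would need the padded route.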

Third option: premultiply by the transpose, take the inverse, and postmultiply by the transpose

Elsewhere again (e.g. a note from SPSS support), we learn the correct algebra is: (mat'mat)⁻¹ mat'

Implying to me that the correct way to create the contrasts matrix should be:

x = solve(t(mat) %*% mat) %*% t(mat)
           [,1]       [,2]       [,3]
[1,]  0.6666667 -0.3333333 -0.3333333
[2,] -0.3333333  0.6666667 -0.3333333

contrasts(myFactor) = x

My question is: which is right (assuming I am interpreting and describing each piece of advice accurately)? How does one specify custom contrasts in R for lm, lme, etc.?


Ernestoernestus asked 4/8, 2015 at 19:57 Comment(1)

it should be matrix(ncol=2, ...) rather than matrix(col=2, ...) – Estonian 9/11, 2016 at 11:54

Claim 2 is correct (see the answers here and here), and sometimes Claim 1 is too. This is because there are cases in which the generalized inverse of the (transposed) coefficient matrix is equal to the coefficient matrix itself.
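
A minimal sketch of this on simulated data (the data, seed, and object names are made up for illustration; only base R is used, with `mat %*% solve(crossprod(mat))` standing in for `MASS::ginv(t(mat))`, which should agree when `mat` has full column rank):

```r
# Hypothetical data: three groups of 10 with different true means.
set.seed(1)
g  <- factor(rep(c("A", "B", "C"), each = 10))
y  <- rnorm(30, mean = c(10, 12, 15)[g])
mu <- as.numeric(tapply(y, g, mean))   # observed cell means

# Coefficient matrix from the question: level1 - level3, level2 - level3.
mat <- matrix(c(1, 0, 0, 1, -1, -1), ncol = 2, byrow = TRUE)

# Claim 2: code the factor with the generalized inverse of t(mat).
g2 <- g
contrasts(g2) <- mat %*% solve(crossprod(mat))
slopes2 <- unname(coef(lm(y ~ g2))[-1])
all.equal(slopes2, as.vector(t(mat) %*% mu))   # compare with intended contrasts

# Claim 1: code the factor with mat itself. This mat happens to equal
# contr.sum(3), so the slopes are deviations from the mean of cell means.
g1 <- g
contrasts(g1) <- mat
slopes1 <- unname(coef(lm(y ~ g1))[-1])
all.equal(slopes1, (mu - mean(mu))[1:2])

# The two claims coincide when the columns of the coefficient matrix are
# orthonormal, because crossprod() is then the identity matrix.
H <- cbind(c(1, -1, 0) / sqrt(2), c(1, 1, -2) / sqrt(6))
all.equal(H %*% solve(crossprod(H)), H)
```

With merely orthogonal (not normalized) columns, the generalized inverse is a column-rescaled copy of the coefficient matrix, so Claim 1 would give the same tests but rescaled estimates.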

Rubicund answered 2/1, 2019 at 17:44 Comment(0)

For what it's worth....

If you have a factor with 3 levels (levels A, B, and C) and you want to test the following orthogonal contrasts: A vs B, and the avg. of A and B vs C, your contrast codes would be:

Cont1 <- c(1, -1, 0)
Cont2 <- c(.5, .5, -1)

If you do as directed on the UCLA site (transform coefficients to make a coding scheme), as such:

contrasts(Variable) <- solve(t(cbind(c(1, 1, 1), Cont1, Cont2)))[, 2:3]

then your results are IDENTICAL to those you would get if you had created two dummy variables, e.g.:

Dummy1 <- ifelse(Variable == "A", 1, ifelse(Variable == "B", -1, 0))
Dummy2 <- ifelse(Variable == "A", .5, ifelse(Variable == "B", .5, -1))

and entered them both into the regression equation instead of your factor, which makes me inclined to think that this is the correct way.
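
A quick way to check this on made-up data (the seed, group sizes, and response `y` below are assumptions for illustration only). In this sketch the t statistics of the two fits match, while the coefficient estimates agree only up to a constant factor, since these two contrasts are orthogonal but not normalized:

```r
# Hypothetical data: a three-level factor with 8 observations per level.
set.seed(42)
Variable <- factor(rep(c("A", "B", "C"), each = 8))
y <- rnorm(24, mean = c(5, 7, 6)[Variable])

Cont1 <- c(1, -1, 0)     # A vs B
Cont2 <- c(.5, .5, -1)   # average of A and B vs C

# Coding scheme obtained by inverting the padded, transposed coefficients.
contrasts(Variable) <- solve(t(cbind(c(1, 1, 1), Cont1, Cont2)))[, 2:3]
fit_coding <- lm(y ~ Variable)

# Hand-made regressors that use the raw coefficients as codes.
Dummy1 <- ifelse(Variable == "A", 1, ifelse(Variable == "B", -1, 0))
Dummy2 <- ifelse(Variable == "A", .5, ifelse(Variable == "B", .5, -1))
fit_dummy <- lm(y ~ Dummy1 + Dummy2)

# Compare the t statistics of the two non-intercept terms.
t_coding <- unname(summary(fit_coding)$coefficients[-1, "t value"])
t_dummy  <- unname(summary(fit_dummy)$coefficients[-1, "t value"])
all.equal(t_coding, t_dummy)

# Estimates side by side: the coding-scheme fit reports the contrasts of
# cell means directly, the dummy fit reports rescaled versions of them.
cbind(coding = coef(fit_coding)[-1], dummy = coef(fit_dummy)[-1])
```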

PS I don't write the most elegant R code, but it gets the job done. Sorry, I'm sure there are easier ways to recode variables, but you get the gist.

Babb answered 9/6, 2016 at 21:9 Comment(0)

I'm probably missing something, but in each of your three examples, you specify the contrast matrix in the same way, i.e.

## Note: the function name is the plural, contrasts
contrasts(myFactor) = x

The only thing that differs is the value of x.

Using the data from the UCLA website as an example

hsb2 = read.table('http://www.ats.ucla.edu/stat/data/hsb2.csv', header = TRUE, sep = ",")

#creating the factor variable race.f
hsb2$race.f = factor(hsb2$race, labels=c("Hispanic", "Asian", "African-Am", "Caucasian"))

We can specify either the treatment version of the contrasts

contrasts(hsb2$race.f) = contr.treatment(4)
summary(lm(write ~ race.f, hsb2))

or the sum version

contrasts(hsb2$race.f) = contr.sum(4)
summary(lm(write ~ race.f, hsb2))

Alternatively, we can specify a bespoke contrast matrix.

See ?contr.sum for other standard contrasts.
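
One possible sketch of the bespoke case, using simulated data rather than hsb2 (the helper `coding_from_coefs()`, the factor `f`, and the contrast names are all hypothetical, and the helper assumes the coefficient columns each sum to zero and are linearly independent):

```r
# Hypothetical helper: turn a coefficient matrix (one contrast per column)
# into a coding matrix suitable for contrasts<-.
coding_from_coefs <- function(coef_mat) {
  coef_mat %*% solve(crossprod(coef_mat))
}

# Simulated stand-in for a four-level factor with unequal group sizes.
set.seed(7)
f <- factor(rep(c("a", "b", "c", "d"), times = c(15, 20, 25, 20)))
y <- rnorm(80, mean = c(50, 55, 48, 54)[f])

# Desired contrasts of cell means, one per column.
coef_mat <- cbind(a_vs_b   = c(1, -1, 0, 0),
                  c_vs_d   = c(0, 0, 1, -1),
                  ab_vs_cd = c(.5, .5, -.5, -.5))

contrasts(f) <- coding_from_coefs(coef_mat)
fit <- lm(y ~ f)

# The slopes should reproduce the requested contrasts of the cell means.
mu <- as.numeric(tapply(y, f, mean))
all.equal(unname(coef(fit)[-1]), as.vector(t(coef_mat) %*% mu))
summary(fit)$coefficients
```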

Vinegarish answered 4/8, 2015 at 22:1 Comment(4)

Thanks @csgillespie. Sorry if not clear: the question is how to specify custom contrast matrices (not how to get the built-in contrasts). So in terms of your answer, the question is "there's contradictory advice about specifying a bespoke contrast matrix - which is right?" – Ernestoernestus 4/8, 2015 at 22:6

But in each of your three examples, you get a bespoke matrix m, then use contrasts(...) = m to set. – Vinegarish 4/8, 2015 at 22:8

the three examples give different results: 1 sticks the bespoke coefficient matrix into contrasts(myFactor)<-m, the next inserts solve(t(m)) and the final one inserts x = solve(t(m)%*% m)%*% t(m). Are you saying solution 1 is correct, and one simply sets contrasts() to the coefficient matrix? – Ernestoernestus 4/8, 2015 at 22:11

I see what you mean. I'll have to think about this a bit more. I'll delete my answer in the meantime, since you'll be more likely to get another answer – Vinegarish 4/8, 2015 at 22:29
