As an initial note, unless it's been changed, to get the correct results for type iii sum of squares, you need to set the contrast coding for the factor variables. This can be done inside the lm
call or with options
. The example below uses options
.
I would be cautious about using HSD.test
and similar functions with unbalanced designs unless the documentation addresses their use in these situations. The documentation for TukeyHSD
mentions that it adjusts for "mildly unbalanced" designs. I don't know if HSD.test
handles things differently. You'd have to check additional documentation for the package or the original reference cited for the function.
As a side note, enclosing the whole HSD.test
function in parentheses will cause it to print the results. See example below.
In general, I would recommend using the flexible emmeans
(née lsmeans
) or multcomp
packages for all your post-hoc comparison needs. emmeans
is particularly useful for doing mean separations on interactions or for examining contrasts among treatments. [EDIT: Caveat that I am the author of these pages.]
With an unbalanced design, you may want to report the E.M. (or L.S.) means instead of the arithmetic means. See SAEPER: What are least square means?. [EDIT: Caveat that I am the author of this page.] Note in the example below that the marginal means reported by emmeans
are different than those reported by HSD.test
.
Also note that the "Tukey" in glht
has nothing to do with Tukey HSD or Tukey-adjusted comparisons; it just sets up the contrasts for all pairwise tests, as the output says.
However, the adjust="tukey"
in emmeans
functions does mean to use Tukey-adjusted comparisons, as the output says.
The following example is partially adapted from ARCHBS: One-way Anova.
### EDIT: Some code changed to reflect changes to some functions
### in the emmeans package
if(!require(car)){install.packages("car")}
library(car)
data(mtcars)
mtcars$cyl.f = factor(mtcars$cyl)
mtcars$carb.f = factor(mtcars$carb)
options(contrasts = c("contr.sum", "contr.poly"))
model = lm(mpg ~ cyl.f + carb.f, data=mtcars)
library(car)
Anova(model, type="III")
if(!require(agricolae)){install.packages("agricolae")}
library(agricolae)
(HSD.test(model, "cyl")$groups)
if(!require(emmeans)){install.packages("emmeans")}
library(emmeans)
marginal = emmeans(model,
~ cyl.f)
pairs(marginal, adjust="tukey")
if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)
cld(marginal, adjust="tukey", Letters=letters)
if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)
mc = glht(model,
mcp(cyl.f = "Tukey"))
summary(mc, test=adjusted("single-step"))
cld(mc)
HSD.test(mod, group=TRUE, main= "SN Average by ethnicity & gender")
but I'm still getting an error:Error in as.character(x) : 'x' is missing
. Looking at the output, though, it doesn't seem to match the reporting of p-values that you get from TukeyHSD. I'll keep trying and see if I can find out what's going wrong. Thanks! – Eggplant