After struggling with this problem for a while, I am hoping to get some advice here. I am wondering if anyone is aware of an automated method for determining pairwise grouping labels based on significance. The question is independent of the significance test (e.g. Tukey for parametric or Mann-Whitney for non-parametric) - given these pairwise comparisons, some boxplot-type figures often represent these groupings with a sub-script:
I have done this example by hand, which can be quite tedious. I think that the sequence of labeling in the algorithm should be based on the number of levels in each group - e.g. those groups containing single levels that are significantly different from all other levels should be named first, then groups containing 2 levels, then 3, etc., all the while checking that new groupings add a new needed grouping and do not violate and differences.
In the example below, the tricky part is getting the algorithm to recognize that level 1 should be grouped with 3 and 5, but 3 and 5 should not be grouped (i.e. share a label).
Example code:
set.seed(1)
n <- 7
n2 <- 100
mu <- cumsum(runif(n, min=-3, max=3))
sigma <- runif(n, min=1, max=3)
dat <- vector(mode="list", n)
for(i in seq(dat)){
dat[[i]] <- rnorm(n2, mean=mu[i], sd=sigma[i])
}
df <- data.frame(group=as.factor(rep(seq(n), each=n2)), y=unlist(dat))
bp <- boxplot(y ~ group, df, notch=TRUE)
kr <- kruskal.test(y ~ group, df)
kr
mw <- pairwise.wilcox.test(df$y, df$g)
mw
mw$p.value > 0.05 # TRUE means that the levels are not significantly different at the p=0.05 level
# 1 2 3 4 5 6
#2 FALSE NA NA NA NA NA
#3 TRUE FALSE NA NA NA NA
#4 FALSE FALSE FALSE NA NA NA
#5 TRUE FALSE FALSE FALSE NA NA
#6 FALSE FALSE FALSE TRUE FALSE NA
#7 FALSE FALSE FALSE FALSE FALSE FALSE
text(x=1:n, y=bp$stats[4,], labels=c("AB", "C", "A", "D", "B", "D", "E"), col=1, cex=1.5, pos=3, font=2)