Nested ANOVA unique factor levels
Asked Answered
S

1

7

I'm running a nested ANOVA with the following setup: 2 areas, one is reference, one is exposure (column named CI = Control/Impact). Two time periods (before and after impact, column named BA), with 1 year in the before period and 3 years in the after period. The years are nested.

My question is: if I use the original years (in column Time2 in the toy dataset), I get one result. If I rename the years, so that they're just 1 for Before and 1-3 for After, I get a different result.

Questions:

  1. Since the years were unique, should the nesting not account for the correct data structure?
  2. My results are identical for the two models using type 1 or type 2 SS. Why do they not differ between types of SS? I would have expected BA (but not CI) to change.
  3. Results for Type 3 SS using the unique names are the same as for types 1 and 2 SS. But using the renamed values, I get a different result. What is going on?

toy dataset:

toy <- structure(list(BA = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), .Label = c("A", "B"), class = "factor"), Time = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "2", "3"), class = "factor"), 
Time2 = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), .Label = c("11", "12", "13", "15", "16", "17"), class = "factor"), 
Lake = c("Area 1", "Area 1", "Area 1", "Area 1", "Area 1", 
"Area 2", "Area 2", "Area 2", "Area 2", "Area 2", "Area 1", 
"Area 1", "Area 1", "Area 1", "Area 1", "Area 2", "Area 2", 
"Area 2", "Area 2", "Area 2", "Area 1", "Area 1", "Area 1", 
"Area 1", "Area 1", "Area 2", "Area 2", "Area 2", "Area 2", 
"Area 2", "Area 1", "Area 1", "Area 1", "Area 1", "Area 1", 
"Area 2", "Area 2", "Area 2", "Area 2", "Area 2"), CI = structure(c(2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("C", "I"), class = "factor"), 
Response = c(78.3, 75.3, 69.4, 75.1, 71.1, 49.7, 61, 59.6, 
35.3, 26.5, 80.9, 81.4, 67.6, 73.6, 73, 46.4, 73.6, 67.1, 
34, 45.5, 86.6, 78, 68.2, 76.8, 69.6, 52.1, 61.9, 50.8, 39.2, 
49.6, 72, 74, 71, 68, 58, 40, 41, 34, 54, 61)), .Names = c("BA", 
"Time", "Time2", "Lake", "CI", "Response"), row.names = c(NA, 
40L), class = "data.frame")

analysis using type 1 SS:

mod <- lm(Response ~ BA + CI + BA*CI + BA/Time + BA/Time*CI, data = toy)
mod1 <- lm(Response ~ BA + CI + BA*CI + BA/Time2 + BA/Time2*CI, data = toy)
# results are the same
anova(mod)
anova(mod1)

now try with type 2

library(car)
options(contrasts=c("contr.sum", "contr.poly"))
mod <- lm(Response ~ BA + CI + BA*CI + BA/Time + BA/Time*CI, data = toy)
mod1 <- lm(Response ~ BA + CI + BA*CI + BA/Time2 + BA/Time2*CI, data = toy)
Anova(mod, type = "II", singular.ok = TRUE)
Anova(mod1, type = "II", singular.ok = TRUE)

and type 3

Anova(mod, type = "III", singular.ok = TRUE)
Anova(mod1, type = "III", singular.ok = TRUE)
Scrap answered 24/11, 2017 at 13:12 Comment(1)
If you don't get an answer here, consider taking your question to crossvalidated.com (a sister site of Stack Overflow). It seems like you have enough statistical content to make this a useful question over there.Alimentary
A
1

Warning at the outset, bits of this probably aren't exactly right - I'm admittedly rusty on ANOVA at this point.

In a Type-III SS analysis we say that the main effects are qualified by the interaction. In short, that means that in the presence of higher order interactions, lower order interactions and main effects are less interpretable. Tradition pushes us towards Type-III analyses, but they really are a pain for this reason. Anyway...

Let's take a quick peek at your recode. The effect of your recode was that Time2 went from having four distinct values to having three distinct values in Time. You may hold onto some interpretability because the combination of BA at level B is unique to the time values that were previously 13, but are now 1.

Let's go back to your data. Now, BA:Time2 carries all of the same information as BA:CI. How does this look in your Time2 ANOVA results...

> Anova(mod1, type = "III", singular.ok = TRUE)
Anova Table (Type III tests)

Response: Response
            Sum Sq Df  F value    Pr(>F)    
(Intercept)  73945  1 712.5963 < 2.2e-16 ***
BA             246  1   2.3678    0.1337    
CI            2484  1  23.9401 2.713e-05 ***
BA:CI            0  1   0.0046    0.9462    
BA:Time2        95  2   0.4570    0.6372    
BA:CI:Time2     37  2   0.1797    0.8364    
Residuals     3321 32                 

... SS for BA:CI is, as expected (more or less) 0. Contrast this with the Time model...

> Anova(mod, type = "III", singular.ok = TRUE)
Anova Table (Type III tests)

Response: Response
            Sum Sq Df   F value    Pr(>F)    
(Intercept) 107772  1 1038.5835 < 2.2e-16 ***
BA             209  1    2.0099    0.1659    
CI            4220  1   40.6655 3.661e-07 ***
BA:CI            9  1    0.0907    0.7653    
BA:Time         95  2    0.4570    0.6372    
BA:CI:Time      37  2    0.1797    0.8364    
Residuals     3321 32                        

BA:CI got to hold onto a bit of variance... and the rest of the model seemed to have done better as well.

My sense is that you have a poorly specified data set for ANOVA under all coding schemes. Level B of AB is confounded with one level of your time group under both ways of coding things. Also take special note of the comments in package:car for Anova, particularly with regard to the singular.ok arguement. In short it says 'defaults to TRUE for type-II tests, and FALSE for type-III tests (where the tests for models with aliased coefficients will not be straightforwardly interpretable)' ... and here you are with seemingly aliased coefficients.

... nevermind that Fox's description of what he has done with Type II and Type III tests is... challenging to interpret. This is a reminder for me as to why I always used package:ez for ezANOVA() back in the day when I had to tolerate non-type I testing.

Alimentary answered 29/11, 2017 at 5:21 Comment(3)
Any thoughts on why the results are not different between Type I and Type II? And I'm still not sure I understand why the nesting doesn't remain the same - this whole time I've been under the impression that there's no difference between unique (=no repeats in dataset) and non-unique (=repeats between nesting, but not nested, levels) nested identifiers.Scrap
For Type I, I suspect it is because order of hierarchy / order of entry doesn't matter for variance partitioning. For Type II, I'm less certain / I get confused by Fox's description of II vs III and I never worked with traditionally defined Type II SS in the first place.Alimentary
I meant here - why are Type I and Type II results identical? I would expect Type II p values to change for the BA term (since it would be accounting for CI in the Type II setup)Scrap

© 2022 - 2024 — McMap. All rights reserved.