Calculate and compare coefficient estimates from a regression interaction for each group
Asked Answered
T

1

1

A) I am interested in the effects of a continuous variable (Var1) on a continuous dependent variable (DV) conditional on four different groups, which are defined by two bivariate variables (Dummy1 and Dummy2). I thus run a three-way interaction.

Var1 <- sample(0:10, 100, replace = T)
Dummy1 <- sample(c(0,1), 100, replace = T)
Dummy2 <- sample(c(0,1), 100, replace = T)

DV <-2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)

fit <- lm(DV ~ Var1*Dummy1*Dummy2)

I would like to compare coefficients of Var1 between the groups. I believe, this can be achieved by adding up the relevant coefficients.

# Group Dummy1 = 0 & Dummy 2 = 0: 
fit$coefficients[Var1]

# Group Dummy1 = 1 & Dummy 2 = 0: 
fit$coefficients[Var1] + fit$coefficients[Var1:Dummy1]

Yet this seems overly arduous and prone to error. What is a more efficient solution?

My desired output is the estimated effect of Var1 for each possible combination of Dummy1 and Dummy2.

B) Once I know the estimated effect-sizes of Var1 for each group, how can I test if any two are statistically different from each other? I assume the linearHypothesis() function could help, but I can't figure out how. Thanks!

Trilley answered 24/5, 2016 at 15:53 Comment(1)
you need to do fit$coefficients['Var1'] with the quotes.Tarnation
U
5

A fully interacted model is equivalent to running a regression on each subset of the data, so if your intention is indeed:

My desired output is the estimated effect of Var1 for each possible combination of Dummy1 and Dummy2.

Then the following may be helpful:

# get your data
set.seed(42)
Var1 <- sample(0:10, 100, replace = T)
Dummy1 <- sample(c(0,1), 100, replace = T)
Dummy2 <- sample(c(0,1), 100, replace = T)
DV <-2*Var1 + Var1*Dummy1 + 2*Var1*Dummy2 + 10*Var1*Dummy1*Dummy2 + rnorm(100)
df <- data.frame(DV, Var1, Dummy1, Dummy2)

First, note that

fit <- lm(DV ~ Var1*Dummy1*Dummy2)
fit$coefficients["Var1"]
    Var1 
2.049678 
fit$coefficients["Var1"] + fit$coefficients["Var1:Dummy1"]
    Var1 
2.993598 

Now, let us estimate the effects for each group combination:

library(dplyr)
library(broom)

df %>% group_by(Dummy1, Dummy2) %>% do(tidy(lm(DV ~ Var1, data=.)))

Source: local data frame [8 x 7]
Groups: Dummy1, Dummy2 [4]

  Dummy1 Dummy2        term    estimate  std.error    statistic      p.value
   (dbl)  (dbl)       (chr)       (dbl)      (dbl)        (dbl)        (dbl)
1      0      0 (Intercept) -0.03125589 0.33880599  -0.09225307 9.272958e-01
2      0      0        Var1  2.04967796 0.05534155  37.03687553 5.222878e-22
3      0      1 (Intercept) -0.08877431 0.38932340  -0.22802203 8.223492e-01
4      0      1        Var1  3.97771680 0.07046498  56.44955828 8.756108e-21
5      1      0 (Intercept)  0.02582533 0.28189331   0.09161384 9.275272e-01
6      1      0        Var1  2.99359832 0.04622495  64.76153226 4.902771e-38
7      1      1 (Intercept)  0.16562985 0.55143596   0.30036100 7.675439e-01
8      1      1        Var1 14.95581348 0.07582089 197.25189807 5.275462e-30

The intercept here corresponds to the means in each group spanned by the two dummy variables (as opposed to the difference of that mean to the overall mean which you get from the fully interacted regression model), and Var1 corresponds to the slope coefficient in each group, which is the estimated effect of Var1 for each possible combination of Dummy1 and Dummy2.

Note the one-to-one correspondence of the coefficient of Var1 in fit, and the coefficient estimated in row 2, as well as that the value of Var1 in the row 6 corresponds to the value Var1 + Var1:Dummy1. So, you can see that using this approach, you do not need to manually add up variables.

To test if the slope coefficient is identical across all groups, your initial regression model is best suited. You simply check summary(fit) and see if the interaction terms are significant. If they are, there is a difference. If they are not, there is no difference. This would correspond to a sequential test. To carry out a simultaneous test, you can use an F test, as in

library(car)
linearHypothesis(fit, c("Var1:Dummy1", "Var1:Dummy2", "Var1:Dummy1:Dummy2"), 
verbose=T, test="F")
Unnamed answered 24/5, 2016 at 16:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.