Multiple Regression with Interaction

I've come across a somewhat confusing point about the formula syntax for multiple regression with explanatory variables and their interactions. A DataCamp explanation led me to think that:

lm(formula = y ~ r + r:s , data)

...is the same as:

lm(formula = y ~ r + s + r:s , data)

That turns out to be incorrect. I have found that the latter is in fact the same as the shortened version:

lm(formula = y ~ r * s , data)

But the former is certainly different.

What exactly is the difference between these - that is, what does the first model show that the latter two wouldn't?

Thank you.

Barbet answered 17/2, 2022 at 22:11 Comment(0)

Simple Regression

The difference is subtle, but it is real. An easy way to see it is to compare the output of summary() for each formula. I will use the iris dataset since it's already in R. First, a simple linear regression:

# Simple regression:
summary(lm(formula = Sepal.Width ~ Sepal.Length,
           data = iris))

This reports the effect of a single predictor, Sepal.Length, on the response, Sepal.Width:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.41895    0.25356   13.48   <2e-16 ***
Sepal.Length -0.06188    0.04297   -1.44    0.152  

Interaction and Main Effects

For the next model, which uses only the * operator:

# Interaction and main effects:
summary(lm(formula = Sepal.Width ~ Sepal.Length*Petal.Length,
           data = iris))

It gives us the main effect of each predictor as well as the interaction between the two. All three terms are now listed under Coefficients:

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                1.51011    0.64336   2.347 0.020257 *  
Sepal.Length               0.46940    0.12954   3.624 0.000400 ***
Petal.Length              -0.42907    0.11832  -3.626 0.000397 ***
Sepal.Length:Petal.Length  0.01795    0.02186   0.821 0.413063  
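As a quick check that the * shorthand and the spelled-out form from the question fit the same model, here is a minimal sketch that fits both and compares the coefficients. The object names fit_star and fit_long are illustrative and not part of the original answer.

```r
# Fit the shorthand and the spelled-out formula on the built-in iris data.
fit_star <- lm(Sepal.Width ~ Sepal.Length * Petal.Length, data = iris)
fit_long <- lm(Sepal.Width ~ Sepal.Length + Petal.Length + Sepal.Length:Petal.Length,
               data = iris)

# Both formulas expand to the same terms, so the estimates should match.
print(all.equal(coef(fit_star), coef(fit_long)))
```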

Only Interaction

For the : operator, it gives us only the interaction term (plus the intercept) and no main effects:

# Only interaction:
summary(lm(formula = Sepal.Width ~ Sepal.Length:Petal.Length,
           data = iris))

Which you can see below:

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                3.31473    0.06852  48.375  < 2e-16 ***
Sepal.Length:Petal.Length -0.01108    0.00257  -4.312 2.93e-05 ***
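For two numeric predictors, the : term should amount to a single column holding their product. The sketch below checks this by fitting the same model with an explicit product wrapped in I(). The names fit_colon and fit_product are illustrative. With factor predictors, : builds products of dummy variables instead, so this shortcut applies only to the numeric case.

```r
# Interaction written with the formula operator.
fit_colon <- lm(Sepal.Width ~ Sepal.Length:Petal.Length, data = iris)

# Same model written as an explicit product; I() protects the arithmetic '*'.
fit_product <- lm(Sepal.Width ~ I(Sepal.Length * Petal.Length), data = iris)

# The coefficient names differ, so compare the values only.
print(all.equal(unname(coef(fit_colon)), unname(coef(fit_product))))
```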

Manually Adding Both Interactions and Effects

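Finally, if you want an interaction AND only some of the main effects, use the : operator again and add each main effect you want with +. This is the form of the first model in the question (y ~ r + r:s), which leaves out the main effect of the second variable: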

# Only interaction and one main effect:
summary(lm(formula = Sepal.Width ~ Sepal.Length + Sepal.Length:Petal.Length,
           data = iris))

As seen below:

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)               -0.299034   0.422673  -0.707     0.48    
Sepal.Length               0.807410   0.093603   8.626 9.44e-15 ***
Sepal.Length:Petal.Length -0.058626   0.005899  -9.939  < 2e-16 ***
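To see what leaving out a main effect costs, one option is to compare this model against the full * model. Because the reduced model is nested in the full one, anova() can run an F test between them. This is a sketch of one possible comparison, and the names fit_reduced and fit_full are illustrative.

```r
# Reduced model: no main effect for Petal.Length.
fit_reduced <- lm(Sepal.Width ~ Sepal.Length + Sepal.Length:Petal.Length, data = iris)

# Full model: both main effects plus the interaction.
fit_full <- lm(Sepal.Width ~ Sepal.Length * Petal.Length, data = iris)

# The reduced model is nested in the full one, so an F test compares them.
print(anova(fit_reduced, fit_full))
```

In the reduced model the slope for Petal.Length is the interaction coefficient times Sepal.Length, so it is forced to be zero whenever Sepal.Length is zero. The full model removes that constraint.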

Notice that when I combine + and * in the same call, it still gives both main effects and the interaction, even though I never wrote Petal.Length as a separate term:

summary(lm(formula = Sepal.Width ~ Sepal.Length + Sepal.Length*Petal.Length,
           data = iris))

The extra Sepal.Length + is redundant here: * already expands to both main effects plus the interaction, and R drops the duplicated term:

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                1.51011    0.64336   2.347 0.020257 *  
Sepal.Length               0.46940    0.12954   3.624 0.000400 ***
Petal.Length              -0.42907    0.11832  -3.626 0.000397 ***
Sepal.Length:Petal.Length  0.01795    0.02186   0.821 0.413063
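The way R expands each of the formulas above can also be inspected without fitting anything, by asking terms() for the term labels. A minimal sketch follows; the helper term_labels is a hypothetical convenience wrapper, not a base R function.

```r
# Return the terms a formula expands to, without fitting a model.
term_labels <- function(f) attr(terms(f), "term.labels")

star    <- term_labels(Sepal.Width ~ Sepal.Length * Petal.Length)
mixed   <- term_labels(Sepal.Width ~ Sepal.Length + Sepal.Length * Petal.Length)
partial <- term_labels(Sepal.Width ~ Sepal.Length + Sepal.Length:Petal.Length)

print(identical(star, mixed))  # the repeated Sepal.Length term is dropped
print(partial)                 # no standalone Petal.Length term
```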
Rancor answered 17/2, 2022 at 22:33 Comment(0)
