Difference between the interaction : and * term for formulas in StatsModels OLS regression

2

13

Hi, I'm learning statsmodels and can't figure out the difference between : and * (interaction terms) in formulas for statsmodels OLS regression. Could you please give me a hint to figure this out?

Thank you!

The documentation: http://statsmodels.sourceforge.net/devel/example_formulas.html

Grundy answered 10/10, 2015 at 3:58 Comment(1)
The most complete explanation is in the documentation for patsy, which statsmodels uses for formulas: patsy.readthedocs.org/en/latest/formulas.html. This #23672966 also has some explanation of the difference between : and *. - Ekg
28

":" gives a regression with only the interaction term; the variables themselves (the main effects) are not included.

"*" gives a regression with the variables themselves plus the interaction between them.

For example:

a. GLMmodel = glm("y ~ a:b", data=df)

Here you have only one independent variable: the product of "a" and "b" (plus the intercept, which the formula adds by default).

b. GLMmodel = glm("y ~ a*b", data=df)

Here you have three independent variables: "a" itself, "b" itself, and the product of "a" and "b" (again plus the default intercept).
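To make the two cases concrete, here is a minimal sketch that fits both formulas and lists the fitted terms. It uses smf.ols rather than glm because the question is about OLS, and the data frame df with numeric columns "a", "b" and "y" is made-up data for illustration only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (an assumption for this example, not from the question).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
df["y"] = (1.0 + 2.0 * df["a"] - 1.0 * df["b"]
           + 0.5 * df["a"] * df["b"]
           + rng.normal(scale=0.1, size=50))

# ":" keeps only the interaction term (plus the default intercept).
only_interaction = smf.ols("y ~ a:b", data=df).fit()

# "*" keeps both main effects and the interaction.
full = smf.ols("y ~ a*b", data=df).fit()

interaction_terms = list(only_interaction.params.index)
full_terms = list(full.params.index)

print(interaction_terms)  # ['Intercept', 'a:b']
print(full_terms)         # ['Intercept', 'a', 'b', 'a:b']
```

The names in params.index are the columns of the design matrix, so comparing the two lists is a quick way to check what a formula actually expands to.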

Brittenybrittingham answered 22/2, 2016 at 13:2 Comment(0)
6

Using A*B is really just shorthand for A + B + A:B.

A:B specifies the interaction itself. For numeric variables this is literally the product of the two variables. As such, it rarely makes sense to fit a model with only this term, so we almost always fit the main effects, A and B, too (see here for reasons why). Because this is so common, many statistical software packages/platforms offer A*B as shorthand for it.
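The equivalence can be checked directly on the design matrices. This is a small sketch using patsy's dmatrix (patsy being the formula library mentioned in the comment on the question); the data frame and its values are invented for the example.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Toy numeric data (hypothetical values, for illustration only).
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                   "B": [0.5, -1.0, 2.0, 3.0]})

# Build the design matrix for the shorthand and for the spelled-out formula.
short = dmatrix("A*B", df)
long = dmatrix("A + B + A:B", df)

short_cols = short.design_info.column_names
long_cols = long.design_info.column_names
same_matrix = np.allclose(np.asarray(short), np.asarray(long))

# With "- 1" the intercept is dropped, leaving only the A:B column,
# which for numeric variables is the elementwise product A * B.
interaction = np.asarray(dmatrix("A:B - 1", df))[:, 0]
is_product = np.allclose(interaction, df["A"] * df["B"])

print(short_cols)   # ['Intercept', 'A', 'B', 'A:B']
print(same_matrix)  # True
print(is_product)   # True
```

For categorical variables the A:B columns are products of the dummy-coded columns rather than a single product column, so the last check applies to the numeric case only.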

Resentment answered 9/10, 2021 at 9:49 Comment(0)
