I have heard people talk about "modeling on the residuals" when they want to estimate some effect after an a-priori model has been fitted. For example, if we know that two variables, `var_1` and `var_2`, are correlated, we first fit a model with `var_1` and then model the effect of `var_2` afterwards. My problem is that I've never seen this done in practice.
I'm interested in the following:

1. If I'm using a `glm`, how do I account for the link function used?
2. What distribution do I choose when running a second `glm` with `var_2` as the explanatory variable? I assume this is related to 1.
3. Is this at all related to using the first model's prediction as an offset in the second model?
My attempt:

```r
library(data.table)

# I have a hypothesis that `mpg` is a function of both `cyl` and `wt`
dt <- data.table(mtcars)
dt[, cyl := as.factor(cyl)]

# I want to model `cyl` first
model <- stats::glm(mpg ~ cyl, family = Gamma(link = "log"), data = dt)
dt[, pred := stats::predict(model, type = "response", newdata = dt)]
dt[, res := mpg - pred]

# will this approach work?
model2_1 <- stats::glm(mpg ~ wt + offset(pred), family = Gamma(link = "log"), data = dt)
dt[, pred21 := stats::predict(model2_1, type = "response", newdata = dt)]

# or will this approach work?
model2_2 <- stats::glm(res ~ wt, family = gaussian(), data = dt)
dt[, pred22 := stats::predict(model2_2, type = "response", newdata = dt)]
```
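One more variant I considered, based on my (possibly wrong) understanding that `offset()` is added to the linear predictor, i.e. it lives on the link scale, so the offset should come from `predict(..., type = "link")` rather than the response-scale prediction. Self-contained version (it refits the first model so it runs on its own):

```r
library(data.table)

dt <- data.table(mtcars)
dt[, cyl := as.factor(cyl)]
model <- stats::glm(mpg ~ cyl, family = Gamma(link = "log"), data = dt)

# Pass the offset on the link (log) scale: offset() is added to the
# linear predictor, so use type = "link", not type = "response"
dt[, eta := stats::predict(model, type = "link", newdata = dt)]
model2_3 <- stats::glm(mpg ~ wt + offset(eta), family = Gamma(link = "log"), data = dt)
```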
My first suggested approach has convergence issues, but this is how my silly brain would approach this problem. Thanks for any help!