How to add a column of fitted values to a data frame by group?
Asked Answered
B

3

1

Say I have a data frame like this:

X <- data_frame(
  x = rep(seq(from = 1, to = 10, by = 1), 3),
  y = 2*x + rnorm(length(x), sd = 0.5),
  g = rep(LETTERS[1:3], each = length(x)/3))

How can I fit a regression y~x grouped by variable g and add the values from the fitted and resid generic methods to the data frame?

I know I can do:

A <- X[X$g == "A",]
mA <- with(A, lm(y ~ x))
A$fit <- fitted(mA)
A$res <- resid(mA)

B <- X[X$g == "B",]
mB <- with(B, lm(y ~ x))
B$fit <- fitted(mB)
B$res <- resid(mB)

C <- X[X$g == "C",]
mC <- with(B, lm(y ~ x))
C$fit <- fitted(mC)
C$res <- resid(mC)

And then rbind(A, B, C). However, in real life I am not using lm (I'm using rqss in the quantreg package). The method occasionally fails, so I need error handling, where I'd like to place NA all the rows that failed. Also, there are way more than 3 groups, so I don't want to just keep copying and pasting code for each group.

I tried using dplyr with do but didn't make any progress. I was thinking it might be something like:

make_qfits <- function(data) {
  data %>%
    group_by(g) %>%
    do(failwith(NULL, rqss), formula = y ~ qss(x, lambda = 3))
}

Would this be easy to do by that approach? Is there another way in base R?

Beaux answered 30/7, 2015 at 21:51 Comment(2)
The augment.columns function from package broom comes to mind. What did you try in do that wasn't working?Doolittle
@Doolittle I was thinking it might be something like the code I just added.Beaux
C
1

For the lm models you could try

library(nlme)     # lmList to do lm by group
library(ggplot2)  # fortify to get out the fitted/resid data
do.call(rbind, lapply(lmList(y ~ x | g, data=X), fortify))

This gives you the residual and fitted data in ".resid" and ".fitted" columns as well as a bunch of other fit data. By default the rownames will be prefixed with the letters from g.

With the rqss models that might fail

do.call(rbind, lapply(split(X, X$g), function(z) {
    fit <- tryCatch({
        rqss(y ~ x, data=z)
    }, error=function(e) NULL)
    if (is.null(fit)) data.frame(resid=numeric(0), fitted=numeric(0))
    else data.frame(resid=fit$resid, fitted=fitted(fit))
}))
Cassy answered 30/7, 2015 at 21:57 Comment(5)
Thanks, but doesn't that force me to use lme behind the scenes? I'm trying to make non-parametric fits with rqss since my real data is nonlinear.Beaux
@Beaux ok, the fortify wasn't working with rqss so I added a split-apply-combine solutionCassy
@nongkrog Thanks! That looks like something I can work with. But the fit occasionally fails with idObject(.Object) : invalid class "matrix.csr" object: ra and ja don't have equal lengths. How would you do error handling if rqss fails?Beaux
@Beaux i would add in a tryCatch clause, try the edit, I don't have failing data so Im not sureCassy
Hmm, I tried binding the columns together and saw that the number of rows were different. bind_cols(X, Y) > Error: incompatible number of rows (20553, expecting 21561)`Beaux
D
2

You can use do on grouped data for this task, fitting the model in each group in do and putting the model residuals and fitted values into a data.frame. To add these to the original data, just include the . that represents the data going into do in the output data.frame.

In your simple case, this would look like this:

X %>%
    group_by(g) %>%
    do({model = rqss(y ~ qss(x, lambda = 3), data = .)
        data.frame(., residuals = resid.rqss(model), fitted = fitted(model))
            })

Source: local data frame [30 x 5]
Groups: g

    x         y g     residuals    fitted
1   1  1.509760 A -1.368963e-08  1.509760
2   2  3.576973 A -8.915993e-02  3.666133
3   3  6.239950 A  4.174453e-01  5.822505
4   4  7.978878 A  4.130033e-09  7.978878
5   5 10.588367 A  4.833475e-01 10.105020
6   6 11.786445 A -3.807876e-01 12.167232
7   7 14.646221 A  4.167763e-01 14.229445
8   8 15.938253 A -3.534045e-01 16.291658
9   9 19.114927 A  7.610560e-01 18.353871
10 10 19.574449 A -8.416343e-01 20.416083
.. ..       ... .           ...       ...

Things will look more complicated if you need to catch errors. Here is what it would look like using try and filling the residuals and fitted columns with NA if fit attempt for the group results in an error.

X[9:30,] %>%
    group_by(g) %>%
    do({catch = try(rqss(y ~ qss(x, lambda = 3), data = .))
    if(class(catch) == "try-error"){
        data.frame(., residuals = NA, fitted = NA)
    }
    else{
        model = rqss(y ~ qss(x, lambda = 3), data = .)
        data.frame(., residuals = resid.rqss(model), fitted = fitted(model))
        }
    })
Source: local data frame [22 x 5]
Groups: g

    x         y g     residuals    fitted
1   9 19.114927 A            NA        NA
2  10 19.574449 A            NA        NA
3   1  2.026199 B -4.618675e-01  2.488066
4   2  4.399768 B  1.520739e-11  4.399768
5   3  6.167690 B -1.437800e-01  6.311470
6   4  8.642481 B  4.193089e-01  8.223172
7   5 10.255790 B  1.209160e-01 10.134874
8   6 12.875674 B  8.290981e-01 12.046576
9   7 13.958278 B -4.803891e-10 13.958278
10  8 15.691032 B -1.789479e-01 15.869980
.. ..       ... .           ...       ...
Doolittle answered 31/7, 2015 at 19:15 Comment(0)
C
1

For the lm models you could try

library(nlme)     # lmList to do lm by group
library(ggplot2)  # fortify to get out the fitted/resid data
do.call(rbind, lapply(lmList(y ~ x | g, data=X), fortify))

This gives you the residual and fitted data in ".resid" and ".fitted" columns as well as a bunch of other fit data. By default the rownames will be prefixed with the letters from g.

With the rqss models that might fail

do.call(rbind, lapply(split(X, X$g), function(z) {
    fit <- tryCatch({
        rqss(y ~ x, data=z)
    }, error=function(e) NULL)
    if (is.null(fit)) data.frame(resid=numeric(0), fitted=numeric(0))
    else data.frame(resid=fit$resid, fitted=fitted(fit))
}))
Cassy answered 30/7, 2015 at 21:57 Comment(5)
Thanks, but doesn't that force me to use lme behind the scenes? I'm trying to make non-parametric fits with rqss since my real data is nonlinear.Beaux
@Beaux ok, the fortify wasn't working with rqss so I added a split-apply-combine solutionCassy
@nongkrog Thanks! That looks like something I can work with. But the fit occasionally fails with idObject(.Object) : invalid class "matrix.csr" object: ra and ja don't have equal lengths. How would you do error handling if rqss fails?Beaux
@Beaux i would add in a tryCatch clause, try the edit, I don't have failing data so Im not sureCassy
Hmm, I tried binding the columns together and saw that the number of rows were different. bind_cols(X, Y) > Error: incompatible number of rows (20553, expecting 21561)`Beaux
M
0

Here's a version that works with base R:

modelit <- function(df) {
    mB <- with(df, lm(y ~ x, na.action = na.exclude))
    df$fit <- fitted(mB)
    df$res <- resid(mB)
    return(df)
}

dfs.with.preds <- lapply(split(X, as.factor(X$g)), modelit)
output <- Reduce(function(x, y) { rbind(x, y) }, dfs.with.preds)
Mcatee answered 30/7, 2015 at 22:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.