Bind residuals to input dataset with missing values [duplicate]
I am looking for a method to bind lm residuals to an input dataset. The method must add NA for missing residuals and the residuals should correspond to the proper row.

Sample data:

N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

residuals(lm(Y ~ X,data=df,na.action=na.omit))

Residuals should be bound to df.

Petuntse answered 2/12, 2012 at 19:6 Comment(1)
Similar questions here and here. - Buffer
G
0
"[<-"(df, !is.na(df$X), "res", residuals(lm(Y ~ X,data=df,na.action=na.omit)))

will do the trick.

Grizel answered 2/12, 2012 at 19:44 Comment(2)
Can you explain this? What is "[<-"? - Raver
@BrandonBertelsen The function "[<-"(x1, x2, x3, x4) is similar to x1[x2, x3] <- x4, but it leaves x1 unchanged and returns a new object. - Grizel
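To illustrate the comment above, here is a minimal sketch of calling "[<-" as an ordinary function. The small data frame and the names keep and out are made up for the illustration; they are not part of the original answer.

```r
# Toy data frame (hypothetical values, only for illustration)
df <- data.frame(X = c(1, NA, 3), Y = c(2, 4, 6))

# Rows where X is present
keep <- !is.na(df$X)

# Functional form of df[keep, "res"] <- c(0.5, -0.5):
# returns a modified copy, df itself is not changed
out <- `[<-`(df, keep, "res", c(0.5, -0.5))

names(df)   # still only X and Y
out         # has a new column res, with NA in the row where X is missing
```

Because the result is a new object, it has to be assigned (for example df <- ...) if the added column should be kept.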
B
10

Simply change the na.action to na.exclude:

residuals(lm(Y ~ X, data = df, na.action = na.exclude))

na.omit and na.exclude both do casewise deletion: a row is dropped if it has a missing value in any predictor or in the response. They differ only in that, with na.exclude, extractor functions like residuals() or fitted() pad their output with NAs for the omitted cases, so the output has the same length as the input variables.

(this is the best solution found here)
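As a sketch of how this binds back to the data frame from the question: because the padded output has one value per input row, it can be assigned directly as a column. The set.seed() call and the column names res and fit are my additions for reproducibility and illustration.

```r
set.seed(1)  # added only so the example is reproducible
N <- 100
Nrep <- 5
X <- runif(N, 0, 10)
Y <- 6 + 2 * X + rnorm(N, 0, 1)
X[sample(which(Y < 15), Nrep)] <- NA
df <- data.frame(X, Y)

# na.exclude drops incomplete rows for fitting but remembers where they were
fit <- lm(Y ~ X, data = df, na.action = na.exclude)

# residuals() and fitted() are padded with NA, so they line up with df's rows
df$res <- residuals(fit)
df$fit <- fitted(fit)

head(df[is.na(df$X), ])  # rows with missing X have NA residuals
```

The same assignment should also work when Y or further predictors contain NAs, since the padding is based on the rows the model actually dropped.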

Buffer answered 31/7, 2013 at 18:23 Comment(1)
This is the general solution you're looking for: it works with missing values in any number of predictors or in the dependent variable, with both lm and lme4. - Hypomania
R
2

You can use merge() or join():

N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

df$id <- rownames(df)

res <- residuals(lm(Y ~ X,data=df,na.action=na.omit))
tmp <- data.frame(res=res)
tmp$id <- names(res)

merge(df,tmp,by="id",sort=FALSE,all.x=TRUE)

If you need to maintain the row order, use join() from the plyr package:

library(plyr) 
join(df,tmp)
Raver answered 2/12, 2012 at 19:19 Comment(2)
Couldn't this code be simplified by merging by row names? - Buffer
There is a much simpler solution; see my answer. - Buffer
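The row-name idea from the comment could look roughly like this. It is only a sketch on a small made-up data frame, and it relies on residuals() naming its output after the row names of the rows used in the fit.

```r
# Small made-up data with NAs in both X and Y
df <- data.frame(X = c(1, 2, NA, 4, 5, 6),
                 Y = c(2.1, NA, 6.2, 7.9, 10.1, 12.2))

# With na.omit, the residuals are named by the row names of the rows kept
res <- residuals(lm(Y ~ X, data = df, na.action = na.omit))

# Look up each row name in the named vector; rows that were dropped
# have no matching name and therefore get NA
df$res <- unname(res[rownames(df)])

df
```

No id column or merge is needed here, and the original row order is kept because the lookup is done in the order of rownames(df).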
A
0

This could be a solution. First, though, note that you do not need c() in data.frame():

df <- data.frame(X,Y)
df$Res[!is.na(X)]<-residuals(lm(Y ~ X,data=df,na.action=na.omit))
Altricial answered 2/12, 2012 at 19:15 Comment(3)
This duplicates residuals rather than appending NA. - Raver
I've removed the c() in data.frame. - Petuntse
What if Y is NA? What if another predictor variable is NA? This is not robust to those cases, so it is probably not the way to go. - Buffer
T
0
N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

R.all <- rep(NA_real_, length(X))  # numeric vector of NAs, one per input row
res <- residuals(lm(Y ~ X, data = df, na.action = na.omit))
i <- as.numeric(names(res))        # row positions of the non-missing residuals (assumes default row names 1..N)
R.all[i] <- res                    # assign residuals to their correct positions
R.all
Training answered 19/7, 2017 at 18:56 Comment(0)
