Bind residuals to input dataset with missing values [duplicate]

P

5

I am looking for a method to bind lm residuals to an input dataset. The method must add NA for missing residuals and the residuals should correspond to the proper row.

Sample data:

N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

residuals(lm(Y ~ X,data=df,na.action=na.omit))

Residuals should be bound to df.

Petuntse answered 2/12, 2012 at 19:6 Comment(1)

Similar questions here and here. – Buffer 31/7, 2013 at 18:26

G

0

"[<-"(df, !is.na(df$X), "res", residuals(lm(Y ~ X,data=df,na.action=na.omit)))

will do the trick.

Grizel answered 2/12, 2012 at 19:44 Comment(2)

Can you explain this? What is "[<-"? – Raver 2/12, 2012 at 20:8

@BrandonBertelsen The function "[<-"(x1, x2, x3, x4) is similar to x1[x2, x3] <- x4 but leaves x1 unchanged and returns a new object. – Grizel 3/12, 2012 at 7:2
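
To make the comment above concrete, here is a minimal sketch of the functional form of `[<-` on a toy data frame. The names `d` and `d2` and the values are made up for illustration and are not from the question.

```r
# Toy data frame (hypothetical, not the question's data)
d <- data.frame(a = 1:3)

# Functional form of d[c(TRUE, FALSE, TRUE), "b"] <- c(10, 30):
# it returns a modified copy and leaves d itself untouched
d2 <- `[<-`(d, c(TRUE, FALSE, TRUE), "b", c(10, 30))

names(d)   # d still has only column "a"
d2$b       # rows not selected by the logical index are filled with NA
```

The same call with `!is.na(df$X)` as the row index is what places each residual in the row it belongs to, provided X is the only variable with missing values.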

B

10

Simply change the na.action to na.exclude:

residuals(lm(Y ~ X, data = df, na.action = na.exclude))

na.omit and na.exclude both do casewise deletion with respect to the predictors and the response. They differ only in that, with na.exclude, extractor functions such as residuals() or fitted() pad their output with NAs for the omitted cases, so the output has the same length as the input variables.
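
A short sketch of the difference, using data shaped like the question's sample. The seed and the object names `fit_omit` and `fit_exclude` are assumptions added here for illustration.

```r
set.seed(1)  # seed added only so the sketch is reproducible
N <- 100
X <- runif(N, 0, 10)
Y <- 6 + 2 * X + rnorm(N, 0, 1)
X[sample(N, 5)] <- NA
df <- data.frame(X, Y)

fit_omit    <- lm(Y ~ X, data = df, na.action = na.omit)
fit_exclude <- lm(Y ~ X, data = df, na.action = na.exclude)

length(residuals(fit_omit))     # only the complete rows
length(residuals(fit_exclude))  # nrow(df), with NA where a row was dropped

# Because the padded vector lines up with the rows, it can be bound directly
df$res <- residuals(fit_exclude)
```

Both fits estimate the same coefficients; only the behaviour of the extractor functions changes.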

(this is the best solution found here)

Buffer answered 31/7, 2013 at 18:23 Comment(1)

This is the general solution you're looking for, the one that works with missings in any number of predictors or DV, with lm and lme4. – Hypomania 8/12, 2014 at 11:40

R

2

Using merge, or join.

N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

df$id <- rownames(df)

res <- residuals(lm(Y ~ X,data=df,na.action=na.omit))
tmp <- data.frame(res=res)
tmp$id <- names(res)

merge(df,tmp,by="id",sort=FALSE,all.x=TRUE)

If you need to maintain the original row order, use join() from the plyr package:

library(plyr) 
join(df,tmp)

Raver answered 2/12, 2012 at 19:19 Comment(2)

couldn't this code be simplified by merging by row names? – Buffer 31/7, 2013 at 18:8

There is a much simpler solution; see my answer. – Buffer 31/7, 2013 at 18:24
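
One possible reading of the row-names suggestion in the comments, sketched on a small made-up data set. The seed, the sizes, and the name `out` are assumptions for illustration; `by = "row.names"` is a documented option of merge().

```r
set.seed(3)  # seed added only so the sketch is reproducible
df <- data.frame(X = runif(10), Y = rnorm(10))
df$X[c(2, 5)] <- NA

res <- residuals(lm(Y ~ X, data = df, na.action = na.omit))
tmp <- data.frame(res = res, row.names = names(res))

# Match on row names; all.x = TRUE keeps the rows that have no residual
out <- merge(df, tmp, by = "row.names", all.x = TRUE)

# merge() sorts the key as text ("1", "10", "2", ...), so restore row order
out <- out[order(as.integer(out$Row.names)), ]
```

This avoids the explicit id column, at the cost of an extra `Row.names` column and a re-sort.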

A

0

This could be a solution. First, though, note that you do not need c() in data.frame:

df <- data.frame(X,Y)
df$Res[!is.na(X)]<-residuals(lm(Y ~ X,data=df,na.action=na.omit))

Altricial answered 2/12, 2012 at 19:15 Comment(3)

This duplicates residuals rather than appending NA. – Raver 2/12, 2012 at 19:17

I've removed the c() in data.frame – Petuntse 2/12, 2012 at 19:17

What if Y is NA? What if another predictor variable is NA? This is not robust to those cases, so it is probably not the way to go. – Buffer 31/7, 2013 at 18:9
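
The concern raised in the last comment can be sketched as follows. The data, the seed, and the name `keep` are made up for illustration; the idea is to index on complete.cases() over all model variables rather than on `!is.na(X)` alone.

```r
set.seed(2)  # seed added only so the sketch is reproducible
df <- data.frame(X = runif(20), Y = rnorm(20))
df$X[3] <- NA
df$Y[7] <- NA   # a missing response, which !is.na(X) would not catch

res <- residuals(lm(Y ~ X, data = df, na.action = na.omit))
length(res)           # 18 rows were used by lm()
sum(!is.na(df$X))     # 19, so indexing on X alone does not line up

# complete.cases() over the model variables marks exactly the rows lm() kept
keep <- complete.cases(df[, c("X", "Y")])
df$res <- NA_real_
df$res[keep] <- res
```

With more predictors, the column list passed to complete.cases() would need to include each of them.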

T

0

N <- 100 
Nrep <- 5 
X <- runif(N, 0, 10) 
Y <- 6 + 2*X + rnorm(N, 0, 1) 
X[ sample(which(Y < 15), Nrep) ] <- NA
df <- data.frame(X,Y)

R.all <- rep(NA_real_, length(X))   # numeric vector of NAs, one per input row
res <- residuals(lm(Y ~ X, data = df, na.action = na.omit))
i <- as.numeric(names(res))         # row positions of the non-missing residuals
R.all[i] <- res                     # assign residuals to their correct positions
R.all

Training answered 19/7, 2017 at 18:56 Comment(0)
