R: generate a "missing values variable"

I am using R to generate examples of how to deal with missing data for the statistics class I am teaching. One method requires generating a "missing values binary variable", with 0 for cases that contain missing values and 1 for cases with none. For example:

n  X   Y     Z
1  4   300   2
2  8   400   4
3  10  500   7
4  18  NA    10
5  20  50    NA
6  NA  1000  5

I would like to generate a variable m, such that

n m  
1 1  
2 1   
3 1  
4 0  
5 0  
6 0  

It seems this should be simple, given R's ability to handle missing values. The closest I have found is m <- ifelse(is.na(missguns), 0, 1), but all this does is generate an entire new matrix of 0s and 1s marking missingness cell by cell. I just want one variable indicating whether a row contains any missing values.

Darksome answered 26/5, 2013 at 22:45 Comment(0)

complete.cases does exactly what you want.

complete.cases(x)
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

You can coerce to numeric or integer:

as.integer(complete.cases(x))
## [1] 1 1 1 0 0 0
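
For a reproducible check, here is a minimal sketch that rebuilds the table from the question. The data frame name x and the use of row position in place of the n column are assumptions, and the rowSums(is.na(x)) variant is shown only as an equivalent base-R formulation, not as a better one.

```r
# Rebuild the example table from the question (the name `x` is assumed;
# the n column is represented by the row position).
x <- data.frame(
  X = c(4, 8, 10, 18, 20, NA),
  Y = c(300, 400, 500, NA, 50, 1000),
  Z = c(2, 4, 7, 10, NA, 5)
)

# 1 for rows with no NA in any column, 0 otherwise
m <- as.integer(complete.cases(x))

# Equivalent formulation: count the NAs in each row and test for zero
m_alt <- as.integer(rowSums(is.na(x)) == 0)

# Attach the indicator as a new column
x$m <- m
print(x)
```

Both m and m_alt come out as 1 1 1 0 0 0 for this table, matching the m column requested in the question.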
Gerhardt answered 26/5, 2013 at 22:51 Comment(1)
Thanks, that did the trick! Just as an update, I was implementing Rubin's t-test. Here is the code I generated. The dataset is "missguns" (the "guns" dataset, but with missing values added), and one of the variables is "urban".

missing <- as.integer(complete.cases(missguns))
practice <- cbind(missguns, missing)
missing <- practice[practice$missing == 0, ]
complete <- practice[practice$missing == 1, ]
t.test(missing$urban, complete$urban)

Darksome
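
The same comparison can probably be written without splitting the data into two subsets, by passing the indicator to the formula interface of t.test. The sketch below uses a made-up data frame named toy as a stand-in for "missguns"; its values and the income column are hypothetical and exist only to make the example run.

```r
# Hypothetical toy data standing in for "missguns"; all values are made up.
toy <- data.frame(
  urban  = c(10, 12, 11, 14, 13, 20, 22, 21, 19, 23),
  income = c(1, 2, NA, 4, NA, 6, NA, 8, 9, NA)
)

# 0 = row has at least one missing value, 1 = row is complete
toy$missing <- as.integer(complete.cases(toy))

# Compare mean "urban" between incomplete (0) and complete (1) rows.
# t.test() defaults to the Welch test (var.equal = FALSE), the same default
# used by the two-sample call t.test(missing$urban, complete$urban).
res <- t.test(urban ~ missing, data = toy)
print(res)
```

This form also avoids reusing the name missing for both the indicator vector and a data frame.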
