Data
I'm working with a data set resembling the data.frame generated below:
set.seed(1)
dta <- data.frame(observation = 1:20,
                  valueA = runif(n = 20),
                  valueB = runif(n = 20),
                  valueC = runif(n = 20),
                  valueD = runif(n = 20))
dta[2:5,3] <- NA
dta[2:10,4] <- NA
dta[7:20,5] <- NA
The columns contain NA values, with the last column having more than 60% of its observations missing.
> sapply(dta, function(x) {table(is.na(x))})
$observation

FALSE
   20

$valueA

FALSE
   20

$valueB

FALSE  TRUE
   16     4

$valueC

FALSE  TRUE
   11     9

$valueD

FALSE  TRUE
    6    14
Problem
I would like to be able to remove this column within a dplyr pipeline, somehow passing the condition to select.
Attempts
This can be easily done in base R. For example, to select columns with less than 50% NAs I can do:
dta[, colSums(is.na(dta)) < nrow(dta) / 2]
which produces:
> head(dta[, colSums(is.na(dta)) < nrow(dta) / 2], 2)
  observation    valueA    valueB    valueC
1           1 0.2655087 0.9347052 0.8209463
2           2 0.3721239        NA        NA
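The same base R selection can also be phrased as a share of missing values per column, which makes the threshold explicit. The following is only a sketch on the example data above; `max_na_share`, `na_share` and `keep_cols` are names made up for illustration.

```r
# Rebuild the example data so the snippet runs on its own
set.seed(1)
dta <- data.frame(observation = 1:20,
                  valueA = runif(n = 20),
                  valueB = runif(n = 20),
                  valueC = runif(n = 20),
                  valueD = runif(n = 20))
dta[2:5, 3]  <- NA
dta[2:10, 4] <- NA
dta[7:20, 5] <- NA

# Hypothetical threshold: keep columns with less than 50% NAs
max_na_share <- 0.5

# Proportion of NAs in each column (mean of a logical matrix, column-wise)
na_share <- colMeans(is.na(dta))

# Logical vector marking the columns to keep
keep_cols <- na_share < max_na_share

# drop = FALSE keeps a data.frame even if only one column survives
res <- dta[, keep_cols, drop = FALSE]
names(res)
```

Using `colMeans(is.na(dta))` is equivalent to `colSums(is.na(dta)) / nrow(dta)`, so the condition matches the one used above.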
Task
I'm interested in achieving the same flexibility in a dplyr pipeline:
Vectorize(require)(package = c("dplyr",     # Data manipulation
                               "magrittr"), # Reverse pipe
                   character.only = TRUE)
dta %<>%
  # Some transformations I'm doing on the data
  mutate_each(funs(as.numeric)) %>%
  # I want my select to take place here
Filter, i.e. Filter(function(x) sum(is.na(x)) < length(x)/2, dta) – Bezonian
Isn't filter supposed to be dropping the observations? I'm interested in removing columns, not rows. – Chowder
Filter with capital F – Bezonian
?Filter != ?filter :) – Chowder
Regarding the Filter solution, I see that you are passing the dta object. On my real data I'm applying some transformations to the data (like gather and spread), so in effect the object I'm working on does not correspond to the initial dta frame. This is why I added mutate_each(funs(as.numeric)) %>% to my example, to indicate that I'm working on a transformed dta. In effect, I don't really have dta to pass on, just a transformed data.frame after applying a couple of pipes. – Chowder
... summarise_each. Perhaps it helps you. – Bezonian