Remove columns with NA's and/or Zeros Only
Asked Answered
S

4

5

I have a sample dataset which looks something similar to the one below:

d= data.frame(a = c(1,5,56,4,9), 
              b = c(0,0,NA,0,NA), 
              c = c(98,67,NA,3,7), 
              d = c(0,0,0,0,0), 
              e = c(NA,NA,NA,NA,NA))

which would be:


| a  |  b |  c | d | e  |
|----|:--:|---:|---|----|
| 1  |  0 | 98 | 0 | NA |
| 5  |  0 | 67 | 0 | NA |
| 56 | NA | NA | 0 | NA |
| 4  | 0  | 3  | 0 | NA |
| 9  | NA | 7  | 0 | NA |

I need to remove all such columns which have:

1. NA's and Zeros
2. Only Zeros
3. Only NA's

So based on the above dataset, columns b,d and e should be eliminated. So, I first need to find out which columns have such conditions and then delete them.

I went through this link Remove the columns with the colsums=0 but I'm not clear with the solution. Also, it doesn't provide me the desired output.

The final output would be:

| a  |  c |
|----|:--:|
| 1  | 98 |
| 5  | 67 |
| 56 | NA |
| 4  | 3  |
| 9  | 7  |
Showily answered 24/9, 2019 at 20:20 Comment(2)
@akrun I'm sorry. I made the edit. They are numeric.Showily
Thank you. I tested your data and it seems to be working fine for me to return the expected outputBurk
B
4

One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column

d[!colSums(is.na(d)|d ==0) == nrow(d)]
#  a    c
#1  1   98
#2  5   67
#3 56   NA
#4  4    3
#5  9    7

Or another option is to replace all the 0s to NA and then apply is.na

d[colSums(!is.na(replace(d, d == 0, NA))) > 0]

Or more compactly with na_if

d[colSums(!is.na(na_if(d, 0))) > 0]
Burk answered 24/9, 2019 at 20:26 Comment(2)
Well, I have tried this on my real dataset. I don't know why, but it does not remove the columns which have both NA's and zeros. Even in the above sample dataset, it does not delete the column e. The other approach I could think of was to replace all the NA's with zeros and then remove the columns with all zeros, but that too doesn't seem to work.Showily
@Showily I get the expected output based on your example. May be you used the earlier versionBurk
E
2

In base and assuming that we have different type of columns:

as.data.frame(Filter(function(x) !all(x=="NA" | x == "0"), {lapply(d, as.character)}))

Using dplyr:

library(dplyr)

d %>% 
  mutate_all(as.character) %>% 
  select(which(colSums(abs(.), na.rm = T) != 0))

Output:

#>    a  c
#> 1  1 98
#> 2  5 67
#> 3 56 NA
#> 4  4  3
#> 5  9  7
Elodiaelodie answered 24/9, 2019 at 20:27 Comment(2)
This would be true if all the columns are numeric. However, if some of those columns are of different data types such as character and date, how will the above code be modified?Showily
@Showily please edit your dataset and make it representative of your actual dataset #5963769Elodiaelodie
T
2

We can use apply column-wise and remove columns which has all, NA or 0's.

d[!apply(d == 0 | is.na(d), 2, all)]

#   a  c
#1  1 98
#2  5 67
#3 56 NA
#4  4  3
#5  9  7
Telles answered 25/9, 2019 at 1:54 Comment(0)
B
1

Very strange to store NAs and 0 as strings, but there you go...

bad_column <- function(z) {
  all(z %in% c("NA", "0"))
}

d[, !sapply(d, bad_column), drop = FALSE]
Barns answered 24/9, 2019 at 20:28 Comment(1)
The post has been edited. I have changed them to be numeric.Showily

© 2022 - 2024 — McMap. All rights reserved.