Remove columns with NA's and/or Zeros Only

Asked 24/9, 2019 at 20:20 Answered 25/9, 2019 at 1:54

I have a sample dataset which looks something similar to the one below:

d= data.frame(a = c(1,5,56,4,9), 
              b = c(0,0,NA,0,NA), 
              c = c(98,67,NA,3,7), 
              d = c(0,0,0,0,0), 
              e = c(NA,NA,NA,NA,NA))

which would be:


| a  |  b |  c | d | e  |
|----|:--:|---:|---|----|
| 1  |  0 | 98 | 0 | NA |
| 5  |  0 | 67 | 0 | NA |
| 56 | NA | NA | 0 | NA |
| 4  | 0  | 3  | 0 | NA |
| 9  | NA | 7  | 0 | NA |

I need to remove all such columns which have:

1. NA's and Zeros
2. Only Zeros
3. Only NA's

So based on the above dataset, columns b,d and e should be eliminated. So, I first need to find out which columns have such conditions and then delete them.

I went through this link Remove the columns with the colsums=0 but I'm not clear with the solution. Also, it doesn't provide me the desired output.

The final output would be:

| a  |  c |
|----|:--:|
| 1  | 98 |
| 5  | 67 |
| 56 | NA |
| 4  | 3  |
| 9  | 7  |

Showily answered 24/9, 2019 at 20:20 Comment(2)

@akrun I'm sorry. I made the edit. They are numeric. – Showily 24/9, 2019 at 20:43

Thank you. I tested your data and it seems to be working fine for me to return the expected output – Burk 24/9, 2019 at 20:44

One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column

d[!colSums(is.na(d)|d ==0) == nrow(d)]
#  a    c
#1  1   98
#2  5   67
#3 56   NA
#4  4    3
#5  9    7

Or another option is to replace all the 0s to NA and then apply is.na

d[colSums(!is.na(replace(d, d == 0, NA))) > 0]

Or more compactly with na_if

d[colSums(!is.na(na_if(d, 0))) > 0]

Burk answered 24/9, 2019 at 20:26 Comment(2)

Well, I have tried this on my real dataset. I don't know why, but it does not remove the columns which have both NA's and zeros. Even in the above sample dataset, it does not delete the column e. The other approach I could think of was to replace all the NA's with zeros and then remove the columns with all zeros, but that too doesn't seem to work. – Showily 24/9, 2019 at 20:32

@Showily I get the expected output based on your example. May be you used the earlier version – Burk 24/9, 2019 at 20:34

In base and assuming that we have different type of columns:

as.data.frame(Filter(function(x) !all(x=="NA" | x == "0"), {lapply(d, as.character)}))

Using dplyr:

library(dplyr)

d %>% 
  mutate_all(as.character) %>% 
  select(which(colSums(abs(.), na.rm = T) != 0))

Output:

#>    a  c
#> 1  1 98
#> 2  5 67
#> 3 56 NA
#> 4  4  3
#> 5  9  7

Elodiaelodie answered 24/9, 2019 at 20:27 Comment(2)

This would be true if all the columns are numeric. However, if some of those columns are of different data types such as character and date, how will the above code be modified? – Showily 24/9, 2019 at 20:46

@Showily please edit your dataset and make it representative of your actual dataset #5963769 – Elodiaelodie 24/9, 2019 at 20:50

We can use apply column-wise and remove columns which has all, NA or 0's.

d[!apply(d == 0 | is.na(d), 2, all)]

#   a  c
#1  1 98
#2  5 67
#3 56 NA
#4  4  3
#5  9  7

Telles answered 25/9, 2019 at 1:54 Comment(0)

Very strange to store NAs and 0 as strings, but there you go...

bad_column <- function(z) {
  all(z %in% c("NA", "0"))
}

d[, !sapply(d, bad_column), drop = FALSE]

Barns answered 24/9, 2019 at 20:28 Comment(1)

The post has been edited. I have changed them to be numeric. – Showily 24/9, 2019 at 20:43

Recommended topics

Hot tags