How to filter out rows where specific columns are all NA with data.table?

Asked 25/5, 2023 at 20:40 Answered 26/5, 2023 at 6:9

I have the same question like solved here, but I have to work with data.table.

What is the best data.table-way to filter out all rows, where specific / "relevant" columns are all NA, unimportant what other "irrelevant" columns show (NA / or not).

library(data.table)

df <- data.frame('epoch' = c(1,2,3),
               'irrel_2' = c(NA,4,5),
               'rel_1' = c(NA, NA, 8),
               'rel_2' = c(3,NA,7)
               )

df
#>   epoch irrel_2 rel_1 rel_2
#> 1     1      NA    NA     3
#> 2     2       4    NA    NA
#> 3     3       5     8     7

setDT(df)
wrong <- na.omit(df, cols = 3:4)

^{Created on 2023-05-25 with reprex v2.0.2}

I want just row 2 to be filtered out. What would be your solution?

Prorogue answered 25/5, 2023 at 20:40 Comment(0)

We can use patterns if the determination of "relevancy" is clear, or you can use a vector of names just as easily. From there, we can use rowSums(!is.na(.)) to find at least one non-NA value.

library(data.table)
setDT(df)
df[df[, rowSums(!is.na(.SD)) > 0, .SDcols = patterns("^rel")],]
#    epoch irrel_2 rel_1 rel_2
#    <num>   <num> <num> <num>
# 1:     1      NA    NA     3
# 2:     3       5     8     7

Houphouetboigny answered 25/5, 2023 at 20:46 Comment(0)

You could solve problem as follow:

df[!is.na(fcoalesce(rel_1, rel_2)),]
# or
df[!is.na(pmin(rel_1, rel_2, na.rm=TRUE)),]

   epoch irrel_2 rel_1 rel_2
   <num>   <num> <num> <num>
1:     1      NA    NA     3
2:     3       5     8     7

Ambages answered 25/5, 2023 at 22:52 Comment(0)

We could do it this way:

df[!is.na(rel_1) | !is.na(rel_2)]

   epoch irrel_2 rel_1 rel_2
1:     1      NA    NA     3
2:     3       5     8     7

Teamster answered 25/5, 2023 at 20:52 Comment(0)

Using dplyr

library(dplyr)
df %>% 
  filter(!if_all(starts_with("rel"), is.na))
  epoch irrel_2 rel_1 rel_2
1     1      NA    NA     3
2     3       5     8     7

Or with data.table

library(data.table)
setDT(df)[df[,  !Reduce(`&`, lapply(.SD, is.na)), 
  .SDcols = patterns("^rel")]]
   epoch irrel_2 rel_1 rel_2
1:     1      NA    NA     3
2:     3       5     8     7

Sharleensharlene answered 26/5, 2023 at 6:9 Comment(0)

Recommended topics

Hot tags