Select groups which have at least one of a certain value
Asked Answered
E

3

42

How to select groups based on a condition on the individual rows, say keep all groups that contain at least one (ANY) of a certain value, e.g. 4, (or any other condition that is TRUE at least once). Or phrased the other way around: if a group does not have any rows where condition is true, the entire group should be removed.

Let's take a very simple data, with two groups, and I want to select the group that has at least one row with a Value of 4, (i.e. group B here)

library(dplyr)
df <- data.frame(Group = LETTERS[c(1,1,1,2,2,2)], Value=c(1:5, 4))

df
#   Group Value
# 1     A     1 # Group A has no values == 4 ~~> remove entire group 
# 2     A     2
# 3     B     3
# 4     B     4 # Group B has at least one 4 ~~> keep the whole group

Doing group_by() and then filter (as in this post) will only select individual rows that contains a value of 4, not the whole group:

df %>%
  group_by(Group) %>%
  filter(Value == 4)
#    Group Value
#   <fctr> <int>
# 1      B     4
Evalynevan answered 27/11, 2016 at 1:53 Comment(1)
In base R, df[with(df, ave(Value == 4, Group, FUN = any)), ]Tabbatha
E
66

This turns out to be pretty easy: you just need to use the any() function in the filter call. Indeed, it appears that:

  • filter(any(...)) evaluates at the group_by() level,

  • filter(...) evaluates at the rowwise() level, even when preceded by group_by().

Hence use:

 df %>%
    group_by(Group) %>%
    filter(any(Value==4)) 

Group Value
 <fctr> <int>
1      B     3
2      B     4

Interestingly, the same appear with mutate, compare:

df %>%
group_by(Group) %>%
mutate(check1=any(Value==4), 
       check2=Value==4) 

   Group Value check1 check2
  <fctr> <int>  <lgl>  <lgl>
1      A     1  FALSE  FALSE
2      A     2  FALSE  FALSE
3      B     3   TRUE  FALSE
4      B     4   TRUE   TRUE
Evalynevan answered 27/11, 2016 at 1:53 Comment(2)
When I run exactly the same code above, for check1, I get all as TRUE. anybody know why?Febrific
That's surprising indeed! Are you doing the group_by ? Which version of dplyr do you have?Evalynevan
M
8

A data.table option is

library(data.table)
setDT(df)[, if(any(Value==4)) .SD, by = Group]
#   Group Value
#1:     B     4
#2:     B     5
#3:     B     4
Manganin answered 27/11, 2016 at 2:30 Comment(0)
E
4

In base R, without performing any grouping operation we can do :

subset(df, Group %in% unique(Group[Value == 4]))

#  Group Value
#4     B     4
#5     B     5
#6     B     4
Evilminded answered 18/11, 2020 at 8:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.