Filter by multiple patterns with filter() and str_detect()
Asked Answered
D

3

16

I would like to filter a dataframe using filter() and str_detect() matching for multiple patterns without multiple str_detect() function calls. In the example below I would like to filter the dataframe df to show only rows containing the letters a f and o.

df <- data.frame(numbers = 1:52, letters = letters)
df %>%
    filter(
        str_detect(.$letters, "a")|
        str_detect(.$letters, "f")| 
        str_detect(.$letters, "o")
    )
#  numbers letters
#1       1       a
#2       6       f
#3      15       o
#4      27       a
#5      32       f
#6      41       o

I have attempted the following

df %>%
    filter(
        str_detect(.$letters, c("a", "f", "o"))
     )
#  numbers letters
#1       1       a
#2      15       o
#3      32       f

and receive the following error

Warning message: In stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length

Delinquency answered 26/6, 2017 at 11:56 Comment(0)
D
50

The correct syntax to accomplish this with filter() and str_detect() would be

df %>%
  filter(
      str_detect(letters, "a|f|o")
  )
#  numbers letters
#1       1       a
#2       6       f
#3      15       o
#4      27       a
#5      32       f
#6      41       o
Delinquency answered 26/6, 2017 at 11:56 Comment(2)
Corrected, thank you. Always learning. I have been using case_when() and became confusedDelinquency
If this is the correct syntax, it strikes me that str_detect() is intended for more limited situations than I had supposed. What function would be recommended if you were looking for partial matching against a longer list of longer strings? Back to grep?Benfield
I
1

To synthesize the accepted answer even further, one could also define a vector with search patterns of interest and concatenate those with paste using its collapse argument where the search criterion 'or' is defined as '|' and the search criterion 'and' as '&'.

This could be useful, for example, when the search patterns are automatically generated somewhere else in the script or read from a source.

#' Changing the column name of the letters column to `lttrs`
#' to avoid confusion with the built-in vector `letters`
df <- data.frame(numbers = 1:52, lttrs = letters)

search_vec <- c('a','f','o')
df %>% 
    filter(str_detect(lttrs, pattern = paste(search_vec, collapse = '|')))

#  numbers letters
#1       1       a
#2       6       f
#3      15       o
#4      27       a
#5      32       f
#6      41       o
Istle answered 10/11, 2023 at 9:2 Comment(0)
M
0

Is this possible with an "&" rather an "|" (sorry dont have enough rep for comment)

Misfeasance answered 2/11, 2022 at 15:49 Comment(1)
This does not really answer the question. If you have a different question, you can ask it by clicking Ask Question. To get notified when this question gets new answers, you can follow this question. Once you have enough reputation, you can also add a bounty to draw more attention to this question. - From ReviewBozen

© 2022 - 2024 — McMap. All rights reserved.