Delete rows containing specific strings in R
C

7

83

I would like to exclude rows containing the string "REVERSE". The values in my rows do not match the word exactly; they just contain it.

My input data frame:

   Value   Name 
    55     REVERSE223   
    22     GENJJS
    33     REVERSE456
    44     GENJKI

My expected output:

   Value   Name 
    22     GENJJS
    44     GENJKI
Crotchet answered 7/3, 2014 at 12:8 Comment(0)
M
130

This should do the trick:

df[- grep("REVERSE", df$Name),]

Or a safer version, which still returns every row when nothing matches:

df[!grepl("REVERSE", df$Name),]
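
To see why the second form is safer, here is a small sketch using the data from the question. The pattern "FORWARD" is made up for illustration; it stands for any search string that matches no row.

```r
# Data frame from the question
df <- data.frame(
  Value = c(55, 22, 33, 44),
  Name  = c("REVERSE223", "GENJJS", "REVERSE456", "GENJKI"),
  stringsAsFactors = FALSE
)

# Both forms agree when at least one row matches
by_index   <- df[-grep("REVERSE", df$Name), ]
by_logical <- df[!grepl("REVERSE", df$Name), ]

# "FORWARD" (hypothetical pattern) matches no row: grep() returns integer(0),
# and df[-integer(0), ] selects zero rows instead of all of them
none_index   <- df[-grep("FORWARD", df$Name), ]
none_logical <- df[!grepl("FORWARD", df$Name), ]

nrow(none_index)    # 0: every row was dropped
nrow(none_logical)  # 4: no row was dropped
```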
Md answered 7/3, 2014 at 12:15 Comment(7)
What do you mean by "safer"? - Telega
What if I want to delete the rows containing a "("? The following does not seem to work: df[!grepl("(", df$Name),] - Deformity
@Deformity The grepl function uses regular expressions for the match, which have a syntax where ( is meaningful. If you set the named parameter fixed = TRUE, then grepl will perform a literal match without using regular expressions, which should work for your use case. - Chlorella
@JasonMeloHall The minus (-) operator uses negative indexing, while the negation (!) operator uses logical indexing. When nothing matches, grep() returns an empty index vector and the negative-index form returns no rows at all, so the negation form is safer. - Fairleigh
How could you modify this to also delete the row above the row that contains the matching string? - Justiceship
Thanks a lot. I googled for about an hour before I found your answer. Works brilliantly. - Drawbridge
Hi guys, I tried this code while building my contingency table. Although I save the filtered data frame under a new name, the removed rows are still kept as factor levels with "0" entries. How do I delete those rows completely from the new data frame? - Pieter
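
Two of the comments above can be illustrated with one short sketch: literal matching with fixed = TRUE, and removing leftover factor levels with droplevels(). The data frame and the name "GEN(1)" are made up for illustration, and Name is assumed to be a factor, which is what would produce the "0" entries in a table.

```r
# Hypothetical data: Name is a factor and one value contains "("
df <- data.frame(
  Value = c(55, 22, 33),
  Name  = factor(c("GEN(1)", "GENJJS", "REVERSE456"))
)

# fixed = TRUE treats "(" as a literal character rather than regex syntax
# (escaping it as "\\(" in a regular expression is an alternative)
no_paren <- df[!grepl("(", df$Name, fixed = TRUE), ]

# Subsetting keeps unused factor levels, so "GEN(1)" is still a level here
levels_before <- levels(no_paren$Name)

# droplevels() removes levels that no longer occur in the data
no_paren <- droplevels(no_paren)
levels_after <- levels(no_paren$Name)

table(no_paren$Name)  # only the levels that are still present are counted
```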
S
32

You could use dplyr::filter() and negate a grepl() match:

library(dplyr)

df %>% 
  filter(!grepl('REVERSE', Name))

Or with dplyr::filter() and negating a stringr::str_detect() match:

library(stringr)

df %>% 
  filter(!str_detect(Name, 'REVERSE'))
Stalactite answered 29/3, 2018 at 15:46 Comment(3)
This question asks for many strings. So what happens if you want to remove multiple strings, i.e. remove.list <- c("REVERSE", "FOO", "BAR", "JJ")? - Occupy
Sure, you can create the pattern like this: remove.list <- paste(c("REVERSE", "FOO", "BAR", "JJ"), collapse = '|') and then filter like this: df %>% filter(!grepl(remove.list, Name)) or df %>% filter(!str_detect(Name, remove.list)) - Stalactite
How do I drop rows based on strings present in two columns? I tried filter(!grepl('REVERSE', Name & 'BAR', Name)) and got the following error: ! operations are possible only for numeric, logical or complex types - Parsnip
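
For the two-column case in the last comment, one possible approach is to give each column its own grepl() call and combine the logical results, since & only works on logical values and not on the pattern strings themselves. The sketch below uses base R so it runs without extra packages; the second column Tag and its values are assumptions made for illustration.

```r
df <- data.frame(
  Value = c(55, 22, 33, 44),
  Name  = c("REVERSE223", "GENJJS", "REVERSE456", "GENJKI"),
  Tag   = c("FOO", "BAR", "BAZ", "QUX"),  # hypothetical second column
  stringsAsFactors = FALSE
)

# Several strings in one column: join them into a single alternation pattern
remove.list <- paste(c("REVERSE", "FOO"), collapse = "|")
drop_name <- grepl(remove.list, df$Name)

# A condition on another column gets its own grepl() call
drop_tag <- grepl("BAR", df$Tag)

# Drop a row when either condition holds
result <- df[!(drop_name | drop_tag), ]
```

The same logic should carry over to dplyr as filter(!grepl('REVERSE', Name), !grepl('BAR', Tag)), because comma-separated conditions in filter() are combined with AND.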
R
21

Actually I would use:

df[ grep("REVERSE", df$Name, invert = TRUE) , ]

This will avoid deleting all of the records if the desired search word is not contained in any of the rows.
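
A brief sketch of that behaviour, using the data from the question; "FORWARD" is a made-up pattern that matches nothing.

```r
df <- data.frame(
  Value = c(55, 22, 33, 44),
  Name  = c("REVERSE223", "GENJJS", "REVERSE456", "GENJKI"),
  stringsAsFactors = FALSE
)

# invert = TRUE returns the indices of the elements that do NOT match,
# so a pattern with no matches yields every row index
keep <- grep("FORWARD", df$Name, invert = TRUE)
all_rows <- df[keep, ]

# With a pattern that does match, only the non-matching rows remain
reverse_free <- df[grep("REVERSE", df$Name, invert = TRUE), ]
```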

Rosie answered 7/11, 2014 at 22:53 Comment(0)
D
6

You can use the stri_detect_fixed() function from the stringi package, which matches the string literally rather than as a regular expression:

stri_detect_fixed(c("REVERSE223","GENJJS"),"REVERSE")
[1]  TRUE FALSE
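
To turn that detection into the row filter the question asks for, the logical vector can be negated and used as a row index. This is a sketch under the assumption that stringi may or may not be installed; the fallback to grepl() with fixed = TRUE is an addition for illustration, not part of the original answer.

```r
df <- data.frame(
  Value = c(55, 22, 33, 44),
  Name  = c("REVERSE223", "GENJJS", "REVERSE456", "GENJKI"),
  stringsAsFactors = FALSE
)

# Literal (non-regex) detection; fall back to base R if stringi is unavailable
is_reverse <- if (requireNamespace("stringi", quietly = TRUE)) {
  stringi::stri_detect_fixed(df$Name, "REVERSE")
} else {
  grepl("REVERSE", df$Name, fixed = TRUE)
}

# Keep only the rows where the string was not detected
result <- df[!is_reverse, ]
```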
Depress answered 13/3, 2014 at 11:45 Comment(0)
S
4

If you need to match multiple strings, separate them with | in the pattern:

df[!grepl("REVERSE|GENJJS", df$Name),]

Snapp answered 23/7, 2020 at 8:48 Comment(0)
S
3

You can apply it to the same data frame (df) using the previously provided code:

df[!grepl("REVERSE", df$Name),]

or you can assign the result to a new data frame using this code:

df1<-df[!grepl("REVERSE", df$Name),]
Steal answered 4/10, 2019 at 13:6 Comment(0)
A
2

A late answer building on BobD59's and hidden-layer's responses.

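This removes multiple specific strings while still returning every row if none of the search words is contained in any of the rows. Because !grepl() uses logical indexing, no invert argument is needed; invert = TRUE is an argument of grep(), not part of the data frame index.

df1 <- df[!grepl("REVERSE|GENJJS", df$Name), ]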
Aftersensation answered 22/6, 2022 at 9:25 Comment(0)
