Replace entire expression that contains a specific string
I have a data frame with a column containing a large number of file names, like:

d <- c("harry11_scott80_norm.avi","harry11_norm.avi","harry11_scott80_lpf.avi", 
       "joel51_lpf.avi","rich82_joel51_lpf.avi")

I want R to replace every element that contains two people's names, like harry11_scott80_norm.avi, with "incongruent", and every element with a single person's name, like harry11_norm.avi, with "congruent". I could use gsub to do that:

dd <- gsub("harry11_scott80_norm.avi", "incongruent", d) 

but I have a lot of those names, so that would be a very clunky solution. Ideally I want to replace the ENTIRE element that contains a string like _scott80_ with "incongruent". I thought that gsub could do this, but when I run:

dd <- gsub("_scott80_", "incongruent", d)

it returns harry11incongruentnorm.avi, obviously because it simply replaces the exact matched string. I reckon there is some way to tell gsub to replace the entire element that contains the selected string, but I can't find it.

There was a question, "In R, how do I replace a string that contains a certain pattern with another string?", but I am not sure how to use agrep in this context.


EDIT: Side bonus question - based on @GSee's answer, is there any function that lets you pass a list of strings that you want to replace? For example, gsub(c(".*_scott80_.*", ".*_harry11_.*"), "incongruent", d) won't work.

Murvyn answered 7/11, 2012 at 18:12 Comment(0)

Here's one way

> gsub(".*_scott80_.*", "incongruent", d)
[1] "incongruent"           "harry11_norm.avi"      "incongruent"          
[4] "joel51_lpf.avi"        "rich82_joel51_lpf.avi"

Or with grep

> d[grep("_scott80_", d)] <- "incongruent"
> d
[1] "incongruent"           "harry11_norm.avi"      "incongruent"          
[4] "joel51_lpf.avi"        "rich82_joel51_lpf.avi"

To address your edit, I believe this will do it (using | to mean "or"):

gsub(".*(_scott80_|_harry11_).*", "incongruent", d)

Of course, you don't have any strings in d that match "_harry11_", because harry11 only ever appears at the start of a name, with no underscore before it.
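If the strings live in a character vector, one way to build that alternation without typing it by hand is to collapse the vector with paste. This is only a sketch: the vector name `targets` is hypothetical, and it assumes the strings contain no regex metacharacters that would need escaping.

```r
d <- c("harry11_scott80_norm.avi", "harry11_norm.avi", "harry11_scott80_lpf.avi",
       "joel51_lpf.avi", "rich82_joel51_lpf.avi")

# Strings to look for; `targets` is an illustrative name, not from the question
targets <- c("_scott80_", "_harry11_")

# Collapse the vector into a single alternation: "_scott80_|_harry11_"
alternation <- paste(targets, collapse = "|")

# Wrap it so the whole element is matched, not just the substring
pattern <- paste0(".*(", alternation, ").*")

dd <- gsub(pattern, "incongruent", d)
```

Here `pattern` ends up as the same regular expression as the hand-written one above, so adding more names only means extending `targets`.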

Roping answered 7/11, 2012 at 18:14 Comment(1)
Wow, that was VERY quick, thanks @GSee. The one that uses gsub fits my purpose better because it allows me to create a separate column with the new expressions. – Murvyn

If your file names all follow the same format, that is, those with two names (e.g. harry11_scott80_norm.avi) always have two underscores and those with one name (e.g. harry11_norm.avi) always have one, you can quickly relabel them with something like this. Run the two-underscore replacement first, because the pattern .*_.* would also match the names that contain two underscores:

> d <- gsub(".*_.*_.*", "incongruent", d)
> d
[1] "incongruent"      "harry11_norm.avi" "incongruent"      "joel51_lpf.avi"  
[5] "incongruent"

> d <- gsub(".*_.*", "congruent", d)
> d
[1] "incongruent" "congruent"   "incongruent" "congruent"   "incongruent"
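
The same idea can probably be done in one pass by counting the underscores in each name, which avoids depending on the order of two gsub calls. This is a rough sketch under the same assumption about the naming scheme (two or more underscores means two names); `n_underscores` and `label` are names chosen here for illustration.

```r
d <- c("harry11_scott80_norm.avi", "harry11_norm.avi", "harry11_scott80_lpf.avi",
       "joel51_lpf.avi", "rich82_joel51_lpf.avi")

# gregexpr() locates every "_" in each element, regmatches() extracts the matches,
# and lengths() counts how many were found per element
n_underscores <- lengths(regmatches(d, gregexpr("_", d, fixed = TRUE)))

# Assumption: two or more underscores means two names, otherwise one name
label <- ifelse(n_underscores >= 2, "incongruent", "congruent")
```

Because `label` is a new vector, d keeps the original file names.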
Tendentious answered 7/11, 2012 at 21:30 Comment(0)
