Replace set of words with another set of words in R
G

5

5

This is a simple question. I have a list of country names and want to replace a few of them with the correct names. I have two more vectors: one with the names to be changed, and a second with the correct names. See the example:

#country names (names are repetitive in the list)
cn <- c("I", "A", "B", "C", "A", "C", "D", "P")

change <- c("A", "B")
tochange <- c("X", "Y")

Expected Output

cn <- c("I", "X", "Y", "C", "X", "C", "D", "P")

Thanks

Grained answered 28/7 at 9:38 Comment(0)
G
5

Using stringi::stri_replace_all_fixed.

> stringi::stri_replace_all_fixed(cn, change, tochange, vectorize_all=FALSE)
[1] "I" "X" "Y" "C" "X" "C" "D" "P"
Gardening answered 28/7 at 9:43 Comment(4)
I have one question: does it work if the mapping is "A" --> "B" and "B" --> "A"? It seems it cannot give the desired mapping. - Fifteen
@Fifteen We probably need vectorize_all=TRUE (the default) in this case: x <- rep_len(LETTERS[1:2], 10); stringi::stri_replace_all_fixed(x, c('A', 'B'), c('B', 'A')). - Gardening
Yes, that did work, thanks! What is the benefit of vectorize_all=FALSE in your solution (regardless of the edge case here)? Is it for speed or something else? - Fifteen
@Fifteen It's the recycling fallacy. We need x <- c("A", "B", "C"); stringi::stri_replace_all_fixed(x, c('A', 'B', 'C'), c('B', 'A', 'C'), vectorize_all=TRUE). Without adding C=C it fails. To get the same result with vectorize_all=FALSE we would need x <- c("A", "B", "C"); x <- stringi::stri_replace_all_fixed(x, "A", "TMP", vectorize_all=FALSE); x <- stringi::stri_replace_all_fixed(x, "B", "A", vectorize_all=FALSE); stringi::stri_replace_all_fixed(x, "TMP", "B", vectorize_all=FALSE). - Gardening
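To make the pairing behaviour discussed in these comments concrete, here is a rough base R emulation of element-wise pairing, where the i-th string is only compared with the i-th (recycled) pattern. pairwise_replace is a hypothetical helper written for illustration; it is not part of stringi and only mimics the pairing, not the library's implementation.

```r
# Hypothetical helper: emulate element-wise pairing of strings and patterns.
# The i-th string is matched only against the i-th pattern (after recycling).
pairwise_replace <- function(x, pattern, replacement) {
  p <- rep_len(pattern, length(x))      # recycle patterns to the length of x
  r <- rep_len(replacement, length(x))  # recycle replacements the same way
  unname(mapply(function(s, pi, ri) gsub(pi, ri, s, fixed = TRUE), x, p, r))
}

# Input order lines up with the recycled patterns, so the swap looks correct.
aligned <- pairwise_replace(rep_len(c("A", "B"), 6), c("A", "B"), c("B", "A"))

# Input order does not line up: only the third element is changed.
misaligned <- pairwise_replace(c("B", "A", "A"), c("A", "B"), c("B", "A"))

print(aligned)     # "B" "A" "B" "A" "B" "A"
print(misaligned)  # "B" "A" "B"
```

Under this pairing the swap depends on the position of each element in the input, not only on its value.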
F
4

You can try replace + match like below

> d <- tochange[match(cn, change)]

> replace(cn, !is.na(d), na.omit(d))
[1] "I" "X" "Y" "C" "X" "C" "D" "P"
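The same match-based idea can be wrapped in a small function. This is only a sketch: map_values is a hypothetical name, not a base R function. Because every element is looked up once against the original values, a swap such as "A" --> "B" and "B" --> "A" is handled in a single pass.

```r
# Hypothetical helper (not from the thread): one-pass lookup via match().
map_values <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)     # position of each element in `from`, NA if absent
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]    # overwrite only the matched positions
  x
}

cn <- c("I", "A", "B", "C", "A", "C", "D", "P")
res <- map_values(cn, c("A", "B"), c("X", "Y"))
swapped <- map_values(c("A", "B", "A"), c("A", "B"), c("B", "A"))

print(res)      # "I" "X" "Y" "C" "X" "C" "D" "P"
print(swapped)  # "B" "A" "B"
```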
Fifteen answered 28/7 at 11:27 Comment(0)
T
3

Here are some alternatives

1) gsubfn. gsubfn is a generalization of gsub in which the second argument can be not only a character string but also a named list, which we use here (or a function or proto object). The list names are the strings to match and the list values are their replacements.

library(gsubfn)
gsubfn("^.*$", setNames(as.list(tochange), change), cn)
## [1] "I" "X" "Y" "C" "X" "C" "D" "P"

2) Reduce. A base R solution is to use Reduce, applying one replacement per step:

dict <- setNames(tochange, change)
Reduce(\(x, nm) replace(x, x == nm, dict[[nm]]), names(dict), init = cn)
## [1] "I" "X" "Y" "C" "X" "C" "D" "P"

3) chartr. If the names are single characters, as in the question, then base R's chartr can be used. Its first argument holds the old characters and its second the new ones:

chartr(paste0(change, collapse = ""), paste0(tochange, collapse = ""), cn)
## [1] "I" "X" "Y" "C" "X" "C" "D" "P"

or hard coding the names

chartr("AB", "XY", cn)
## [1] "I" "X" "Y" "C" "X" "C" "D" "P"
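Two properties of chartr are worth checking before relying on it. The short sketch below uses made-up inputs: chartr translates all characters in one pass, so a swap is safe, but it acts on every character of every string, so names longer than one character are altered as well.

```r
# A swap works because chartr translates all characters simultaneously.
swap <- chartr("AB", "BA", c("A", "B", "C"))

# chartr works per character, so longer strings are changed too.
long <- chartr("AB", "XY", c("ABBA", "CAB"))

print(swap)  # "B" "A" "C"
print(long)  # "XYYX" "CXY"
```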

Circularity

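Although it seems unlikely that the problem here would exhibit circularity, such as A -> B -> A, we can test for it if you think it is possible. The check below is conservative: it stops whenever any two pairs share a name, which covers chains such as A -> B -> C as well as true cycles.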

library(igraph)

cnt <- cbind(change, tochange) |>
  graph_from_edgelist() |>
  count_components()

if (cnt != length(change)) stop("circularity found")
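If adding igraph as a dependency is not wanted, a comparable (and similarly conservative) check can be sketched in base R. mapping_is_order_safe is a hypothetical helper name; it assumes the mapping is order-independent when no source name is repeated and no replacement is itself a source name.

```r
# Hypothetical helper: TRUE when sequential replacement cannot interfere.
mapping_is_order_safe <- function(from, to) {
  stopifnot(length(from) == length(to))
  no_dup_source <- anyDuplicated(from) == 0       # each source name appears once
  no_overlap <- length(intersect(from, to)) == 0  # no replacement is also a source
  no_dup_source && no_overlap
}

ok    <- mapping_is_order_safe(c("A", "B"), c("X", "Y"))  # the question's mapping
cycle <- mapping_is_order_safe(c("A", "B"), c("B", "A"))  # A -> B -> A
chain <- mapping_is_order_safe(c("A", "B"), c("B", "C"))  # A -> B -> C

print(c(ok = ok, cycle = cycle, chain = chain))
```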

Note

Inputs used

cn <- c("I", "A", "B", "C", "A", "C", "D", "P")

change <- c("A", "B")
tochange <- c("X", "Y")
Tolley answered 28/7 at 12:44 Comment(0)
M
2

You can use the ifelse function in R

cn <- c("I", "A", "B", "C", "A", "C", "D", "P")
cn <- ifelse(cn == "A", "X", ifelse(cn == "B", "Y", cn))

print(cn)

Alternatively, you can use recode from the dplyr package for a more readable solution

library(dplyr)

cn <- c("I", "A", "B", "C", "A", "C", "D", "P")

cn <- cn %>% recode("A" = "X", "B" = "Y")

print(cn)

OUTPUT:

[1] "I" "X" "Y" "C" "X" "C" "D" "P"
Maw answered 28/7 at 9:47 Comment(2)
I cannot do it like this. I have more than 30 words that need to be replaced. Thanks - Grained
The ifelse solution won't work if you have a replacement mapping like "A" --> "X" and "X" --> "A". - Fifteen
D
2

As a basic for loop:

cn.new <- cn

for (i in seq_along(change)) {
    cn.new[cn.new == change[i]] <- tochange[i]
}

cn
# [1] "I" "A" "B" "C" "A" "C" "D" "P"
cn.new
# [1] "I" "X" "Y" "C" "X" "C" "D" "P"
Dystrophy answered 28/7 at 10:14 Comment(4)
It won't work if you have a replacement mapping like "A" --> "X" and "X" --> "A". - Fifteen
@ThomasIsCoding: Depends on what you mean by "won't work"; the results might be desired, but in any case the OP doesn't have mappings like these. - Dystrophy
In the OP's example your method did work well. My concern is just about robustness, that is, how it deals with edge cases like the one in my comment. - Fifteen
I think that circularity is unlikely to be a problem with this question, but my answer shows how to check for it if you are really worried about it. - Tolley
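The edge case debated in these comments can be reproduced with a small made-up input. The sketch below contrasts the sequential loop with a variant that always tests against the untouched input; the vectors x, from and to are illustrative and not taken from the question.

```r
x    <- c("A", "X", "A")
from <- c("A", "X")
to   <- c("X", "A")

# Sequential version: each step sees the result of the previous step,
# so values replaced in step 1 are replaced again in step 2.
sequential <- x
for (i in seq_along(from)) {
  sequential[sequential == from[i]] <- to[i]
}

# Variant: positions are always selected from the original vector `x`.
from_original <- x
for (i in seq_along(from)) {
  from_original[x == from[i]] <- to[i]
}

print(sequential)     # "A" "A" "A"
print(from_original)  # "X" "A" "X"
```

Whether the sequential result is a bug or the desired behaviour depends on the intent, as the comments note.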
