Replace multiple letters with accents with gsub

11

73

Of course I could replace specific characters like this:

    mydata=c("á","é","ó")
    mydata=gsub("á","a",mydata)
    mydata=gsub("é","e",mydata)
    mydata=gsub("ó","o",mydata)
    mydata

But surely there is an easier way to do this all in one line, right? I don't find the gsub help to be very comprehensive on this.

Labiodental answered 6/3, 2013 at 17:23 Comment(6)
If you wanted to replace different patterns with the same thing, it should be possible with lapply, but as you want to replace different patterns with different strings, I think you will still have to specify these one way or another...Wriggler
You might be able to use chartr to do this.Odontograph
The gsubfn function in the gsubfn package is a generalization of gsub that can do that in one call: gsubfn(".", list("á"="a", "é"="e", "ó"="o"), c("á","é","ó"))Emend
@G.Grothendieck. That's great, and it also works for all types of characters. Very valuable comment. Thank you!Labiodental
For people searching for a more general solution to this question, here is a more helpful answer: https://mcmap.net/q/136101/-grep-using-a-character-vector-with-multiple-patternsFeudatory
@G.Grothendieck would you also post this as an answer so that future visitors see it as such?Baily
A
84

Use the character translation function

chartr("áéó", "aeo", mydata)
Ayotte answered 6/3, 2013 at 17:41 Comment(2)
That's cool for characters... But does this also work with special characters, e.g. underscores, points, etc.? It's not within the question, but it would still be interesting to know something for this case too...Labiodental
@Joschi, your question doesn't talk about it. I think you'll have to escape them because they are special characters...Davina
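On the follow-up about underscores and points: as far as I know, chartr() takes its old and new arguments as literal characters rather than as a regular expression, so no escaping should be needed there (the one exception I am aware of is that a hyphen between two characters denotes a range). With gsub() the dot is a regex metacharacter, so it has to be escaped or matched with fixed = TRUE. A small sketch, with made-up input:

```r
x <- c("first_name", "v1.2.3")   # hypothetical input, not from the question

# chartr(): characters are taken literally and mapped one-to-one
res_chartr <- chartr("_.", "  ", x)
res_chartr
# [1] "first name" "v1 2 3"

# gsub(): "." matches any character unless fixed = TRUE (or it is escaped)
res_fixed <- gsub(".", "-", x, fixed = TRUE)
res_fixed
# [1] "first_name" "v1-2-3"

# the escaped regex form gives the same result
res_escaped <- gsub("\\.", "-", x)
```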
M
33

An interesting question! I think the simplest option is to devise a special function, something like a "multi" gsub():

mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern)!=length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

Which gives me:

> mydata <- c("á","é","ó")
> mgsub(c("á","é","ó"), c("a","e","o"), mydata)
[1] "a" "e" "o"
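Because mgsub() forwards its extra arguments to gsub(), options such as fixed = TRUE can be passed through when the patterns contain regex metacharacters. A minimal sketch (the function is repeated so the snippet runs on its own, and the input string is made up). Keep in mind that the replacements are applied one after another, so a later pattern can match text produced by an earlier replacement.

```r
mgsub <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  result <- x
  for (i in seq_along(pattern)) {
    # each pass works on the output of the previous one
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

# Without fixed = TRUE the pattern "." would match every character
res <- mgsub(c(".", "_"), c("", " "), "a.b_c", fixed = TRUE)
res
# [1] "ab c"
```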
Melnick answered 6/3, 2013 at 17:40 Comment(0)
S
29

Maybe this can be useful:

iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT")
[1] "aeoAEOca"
Saddlebag answered 6/3, 2013 at 19:49 Comment(2)
On the most current version of R that I'm using the call iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT") returns "'a'e'o'A'E'Oc~a". Did the behavior change across R versions, or does this have to do with my default encoding?Incarnadine
@Aaron: I don't know if it is an encoding problem. I tried it here on R 3.3.1 and it worked as expected.Saddlebag
T
20

You can use the stringi package to replace these characters.

> library(stringi)
> stri_trans_general(c("á","é","ó"), "latin-ascii")

[1] "a" "e" "o"
Thompkins answered 3/7, 2016 at 15:51 Comment(0)
S
11

This is very similar to @kith's answer, but in function form, and with the most common diacritic cases:

removeDiscritics <- function(string) {
  chartr(
     "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
    ,"SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
    , string
  )
}


removeDiscritics("test áéíóú")

"test aeiou"

Slue answered 30/10, 2016 at 16:16 Comment(0)
V
7

Another mgsub implementation using Reduce

mystring = 'This is good'
myrepl = list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring){
  gsub2 <- function(l, x){
   do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}
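The answer defines mgsub2 but does not show a call. Here is a sketch of how it might be used with the values above (the definition is repeated so the snippet is self-contained, and calling gsub directly instead of through do.call is my simplification). With right = TRUE, Reduce walks the list from the last pair to the first, so 'i' -> 'n' is applied before 'o' -> 'a'.

```r
mgsub2 <- function(myrepl, mystring) {
  # l is one c(pattern, replacement) pair, x is the string so far
  gsub2 <- function(l, x) {
    gsub(pattern = l[1], replacement = l[2], x = x)
  }
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}

mystring <- "This is good"
myrepl <- list(c("o", "a"), c("i", "n"))

res <- mgsub2(myrepl, mystring)
res
# [1] "Thns ns gaad"
```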
Verbena answered 6/3, 2013 at 17:51 Comment(0)
K
7

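A problem with some of the implementations above (e.g., Theodore Lytras's) is that when patterns are longer than one character they may conflict if one pattern is a substring of another, because a later pattern can match text produced by an earlier replacement. One way to solve this is to copy the object, match every pattern against the original, and write the replacements into the copy. Note that the version below replaces each matching element as a whole rather than a substring within it. This is implemented in my package bayesbio, available on CRAN.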

mgsub <- function(pattern, replacement, x, ...) {
  n = length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result = x
  for (i in 1:n) {
    result[grep(pattern[i], x, ...)] = replacement[i]
  }
  return(result)
}

Here is a test case:

  asdf = c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)

  res = mgsub(c("0", "1", "2"), c("10", "11", "12"), asdf)
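For reference, this is what the test case should return, next to what a sequential gsub loop does with the same input. It is a sketch only: the function is repeated so the snippet runs standalone, and the names seq_res, pats and reps are mine.

```r
mgsub <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  result <- x
  for (i in seq_along(pattern)) {
    # match against the original x, overwrite the whole element in the copy
    result[grep(pattern[i], x, ...)] <- replacement[i]
  }
  result
}

asdf <- c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)
pats <- c("0", "1", "2")
reps <- c("10", "11", "12")

res <- mgsub(pats, reps, asdf)
res
# [1] "4"  "10" "11" "11" "3"  "10" "12" "10" "11" "11"

# Sequential gsub for comparison: "0" becomes "10", then "110"
seq_res <- as.character(asdf)
for (i in seq_along(pats)) {
  seq_res <- gsub(pats[i], reps[i], seq_res)
}
seq_res
# [1] "4"   "110" "11"  "11"  "3"   "110" "12"  "110" "11"  "11"
```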
Kallick answered 26/5, 2016 at 14:3 Comment(0)
Y
3

Not so elegant, but it works and does what you want

> diag(sapply(1:length(mydata), function(i, x, y) {
+   gsub(x[i],y[i], x=x)
+ }, x=mydata, y=c('a', 'b', 'c')))
[1] "a" "b" "c"
Yuan answered 6/3, 2013 at 17:44 Comment(0)
G
3

Related to Justin's answer:

> m <- c("á"="a", "é"="e", "ó"="o")
> m[mydata]
  á   é   ó 
"a" "e" "o" 

And you can get rid of the names with names(*) <- NULL if you want.
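A short sketch of the same lookup with the names dropped via unname(), which I believe is equivalent to setting the names to NULL. One thing to watch for: values that have no entry in the lookup vector come back as NA rather than unchanged.

```r
mydata <- c("á", "é", "ó")
m <- c("á" = "a", "é" = "e", "ó" = "o")

res <- unname(m[mydata])
res
# [1] "a" "e" "o"

# "x" has no entry in m, so the lookup yields NA for it
res_missing <- unname(m[c("á", "x")])
res_missing
# [1] "a" NA
```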

Georgetta answered 12/3, 2017 at 4:57 Comment(0)
D
1

You can use the match function. match(x, y) returns, for each element of x, the position of its first match in y. You can then use the returned indices to subset another vector (say z) that contains the replacements for the values of x, in the same order as y. In your case:

mydata <- c("á","é","ó")
desired <- c('a', 'e', 'o')

desired[match(mydata, mydata)]

In a simpler example, consider the situation below, where I was trying to substitute 'alpha' for 'a', 'beta' for 'b' and so forth.

x <- c('a', 'a', 'b', 'c', 'b', 'c', 'e', 'e', 'd')

y <- c('a', 'b', 'c', 'd', 'e')
z <- c('alpha', 'beta', 'gamma', 'delta', 'epsilon')

z[match(x, y)]
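For completeness, here is the result of that lookup, plus one possible way to keep values that have no match (match() returns NA for those). The fallback with ifelse() and the names x2 and idx are my additions, not part of the approach described above.

```r
x <- c('a', 'a', 'b', 'c', 'b', 'c', 'e', 'e', 'd')
y <- c('a', 'b', 'c', 'd', 'e')
z <- c('alpha', 'beta', 'gamma', 'delta', 'epsilon')

res <- z[match(x, y)]
res
# [1] "alpha"   "alpha"   "beta"    "gamma"   "beta"    "gamma"   "epsilon"
# [8] "epsilon" "delta"

# Keep the original value where there is no match
x2 <- c('a', 'q')              # 'q' is not in y
idx <- match(x2, y)
res2 <- ifelse(is.na(idx), x2, z[idx])
res2
# [1] "alpha" "q"
```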
Dink answered 21/2, 2017 at 23:48 Comment(0)
W
0

You can also nest the gsub calls:

mydata <- gsub("á","a", gsub("é","e", gsub("í","i", gsub("ó","o", gsub("ú", "u", mydata)))))

Wehner answered 25/1, 2019 at 14:0 Comment(0)
