Reverse only alphabetical patterns in a string in R

Asked 13/4, 2017 at 13:59 Answered 13/4, 2017 at 14:27

I'm trying to learn R, and a sample problem asks me to reverse only the parts of a string that are in alphabetical order:

String: "abctextdefgtext"    
StringNew: "cbatextgfedtext"

Is there a way to identify alphabetical patterns to do this?

Daegal answered 13/4, 2017 at 13:59 Comment(7)

Welcome to SO! What have you tried so far? Please edit your question: stackoverflow.com/posts/43394297/edit – Bilodeau 13/4, 2017 at 14:1

Fyi, R is a language for statistics, where strings are mostly/always static data. – Dorcasdorcea 13/4, 2017 at 14:7

How do you identify this "part of the string"? – Ophiology 13/4, 2017 at 14:16

@Ophiology presumably it is a length 2+ "intersection" with abcdefghijklmnopqrstuvwxyz – Dorcasdorcea 13/4, 2017 at 14:18

@RichScriven I'm not quite following. The first four chars abct are in alphabetical order, but just abc get reversed. – Ophiology 13/4, 2017 at 14:20

@Ophiology - Because abc is the only part of abct that is in sequential alphabetical order (if that's a thing - for lack of a better term). t is not the next letter after c – Angioma 13/4, 2017 at 14:21

@RichScriven Yes, but I guess you are making an inference which might not be what OP wants. At a first read, I thought that the parts of the string were a given. You are implying that the task is to find them. You are probably right after all, but the description is pretty poor, since only the alphabetical order is mentioned, not the sequential part. – Ophiology 13/4, 2017 at 14:26

Here is one approach with base R, based on the pattern shown in the example. We split the string into individual characters ('v1'), use match to find each character's position in the alphabet (letters), and take the differences between adjacent positions. Wherever a difference is not equal to 1, a new group starts; the cumulative sum of those breaks gives the grouping variable ('i1'). We then reverse (rev) the characters within each group using ave and, finally, paste the characters together to get the expected output:

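# Split the string into single characters
v1 <- strsplit(str1, "")[[1]]
# Start a new group wherever the alphabet position does not increase by exactly 1
i1 <- cumsum(c(TRUE, diff(match(v1, letters)) != 1L))
# Reverse the characters within each group and paste them back together
paste(ave(v1, i1, FUN = rev), collapse = "")
#[1] "cbatextgfedtext"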
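To see why the grouping works, it can help to print the intermediate vectors. The following is a minimal sketch of the same steps, split up for inspection; the names pos, brk, grp and result are only illustrative and are not part of the answer.

```r
str1 <- "abctextdefgtext"

v1  <- strsplit(str1, "")[[1]]   # individual characters
pos <- match(v1, letters)        # alphabet position of each character
brk <- c(TRUE, diff(pos) != 1L)  # TRUE where a new run starts
grp <- cumsum(brk)               # run id for every character

# Characters, positions and run ids side by side
print(rbind(char = v1, pos = pos, grp = grp))

# Reverse within each run and glue the characters back together
result <- paste(ave(v1, grp, FUN = rev), collapse = "")
print(result)
```

Characters that share a run id (for example "d", "e", "f", "g") are the ones that get reversed together; every other character forms a run of length one and is left where it is.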

Or, as @alexislaz mentioned in the comments:

# Same idea, applied to the raw byte values instead of alphabet positions
v1 <- as.integer(charToRaw(str1))
rawToChar(as.raw(ave(v1, cumsum(c(TRUE, diff(v1) != 1L)), FUN = rev)))
#[1] "cbatextgfedtext"
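One thing to keep in mind with the byte-based version: it treats any two adjacent byte values as consecutive, so digits or punctuation would be reversed too (for instance "12" would come back as "21"). If only lowercase letters should count, something along these lines might work. This is a sketch under that assumption, and reverse_alpha_runs is a made-up helper name, not something from the answer or from base R.

```r
# Hypothetical helper: reverse only runs of consecutive lowercase letters (a-z).
reverse_alpha_runs <- function(s) {
  codes <- utf8ToInt(s)                  # code points of the string
  if (length(codes) < 2L) return(s)      # nothing to reverse
  lower <- codes >= utf8ToInt("a") & codes <= utf8ToInt("z")
  # A step continues a run only if both neighbours are a-z and differ by 1
  same_run <- diff(codes) == 1L & head(lower, -1L) & tail(lower, -1L)
  grp <- cumsum(c(TRUE, !same_run))      # run id per character
  intToUtf8(ave(codes, grp, FUN = rev))  # reverse within each run
}

print(reverse_alpha_runs("abctextdefgtext"))
print(reverse_alpha_runs("ab12yz{"))
```

In the second call the digits and the trailing "{" stay in place, even though their code points are adjacent to their neighbours, because they fall outside a-z.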

EDIT:

1) A mistake was corrected based on @alexislaz's comments

2) Updated with another method suggested by @alexislaz in the comments

data

str1 <- "abctextdefgtext"

Profusion answered 13/4, 2017 at 14:9 Comment(2)

Building on the same approach, an alternative could be v1 = as.integer(charToRaw(str1)); rawToChar(as.raw(ave(v1, cumsum(c(TRUE, diff(v1) != 1L)), FUN = rev))). btw, it seems that the "defg" sequence is not recognized correctly in the above approach – Leptospirosis 14/4, 2017 at 12:49

@Leptospirosis Thank you very much for spotting the mistake and showing another great method (learned a lot). I didn't know it was not matching. – Profusion 14/4, 2017 at 14:49

You could do this in base R:

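# Alphabet position of every character
vec <- match(unlist(strsplit(s, "")), letters)
# End index of every run of consecutive positions (0 marks the start)
x <- c(0, which(diff(vec) != 1), length(vec))
# Reverse each run in turn, then map the positions back to letters
newvec <- unlist(sapply(seq(length(x) - 1), function(i) rev(vec[(x[i]+1):x[i+1]])))
paste0(letters[newvec], collapse = "")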

#[1] "cbatextgfedtext"

Where s <- "abctextdefgtext"

First you find the positions of each letter in the sequence of letters ([1] 1 2 3 20 5 24 20 4 5 6 7 20 5 24 20)
Having the positions in hand, you look for consecutive numbers and, when found, reverse that sequence. ([1] 3 2 1 20 5 24 20 7 6 5 4 20 5 24 20)
Finally, you get the letters back in the last line.
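The three steps above can be made a bit more visible by listing where each run starts and ends. This is only a sketch of the same idea; the runs data frame and the name out are illustrative additions, not part of the answer.

```r
s <- "abctextdefgtext"

vec <- match(strsplit(s, "")[[1]], letters)      # alphabet positions
x   <- c(0, which(diff(vec) != 1), length(vec))  # end index of every run

# Start and end index of each run, side by side
runs <- data.frame(start = head(x, -1) + 1, end = tail(x, -1))
print(runs)

# Reverse each run, then map the positions back to letters
newvec <- unlist(lapply(seq_len(nrow(runs)), function(i) {
  rev(vec[runs$start[i]:runs$end[i]])
}))
out <- paste0(letters[newvec], collapse = "")
print(out)
```

Here lapply is used so that the pieces are always a plain list before unlist, whatever the run lengths turn out to be. Like the answer, this assumes the string contains only lowercase letters; any other character would have no match in letters.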

Luana answered 13/4, 2017 at 14:27 Comment(0)
