Replace a range of numbers with single numbers in a character string
S

3

4

Is there any way to replace a range of numbers with the individual numbers in a character string? The ranges have the form n-m, most probably within 1-15; 4-10 is also possible.

The range could be indicated a) with a hyphen:

a <- "I would like to buy 1-3 cats"

or b) with a word, for example: to, bis, jusqu'à

b <- "I would like to buy 1 jusqu'à 3 cats"

The result should look like:

"I would like to buy 1,2,3 cats"

I found this: "Replace range of numbers with certain number", but could not really use it in R.

Silvia answered 18/3, 2018 at 3:12 Comment(4)
What regexes have you tried? If nothing, I'd look at gregexpr and regmatches. – Centum
@Centum This one is tricky, because the replacement uses functions of capture groups, rather than capture groups themselves. – Shornick
@Sathish What about to and bis? – Shornick
There are a lot of tricky little things that can throw a wrench into whether a given regex solution will work. Writing a good regex involves first describing all the kinds of inputs you want to match, and just as importantly, all the kinds of inputs you DON'T want to match. – Bearwood
C
7

gsubfn in the gsubfn package is like gsub, but instead of replacing the match with a replacement string it lets the user specify a function (possibly in formula notation, as done here). It passes the matches to the capture groups in the regular expression, i.e. the matches to the parenthesized parts of the regular expression, to that function as separate arguments and replaces the entire match with the function's output. Here we match "(\\d+)(-| to | bis | jusqu'à )(\\d+)", which has three capture groups, so the function receives three arguments. In the function we call seq with the first and third of these. Note that seq accepts character arguments and interprets them as numeric, so we do not have to convert the arguments ourselves.

Thus we get this one-liner:

library(gsubfn)
s <- c(a, b) # test input strings

gsubfn("(\\d+)(-| to | bis | jusqu'à )(\\d+)", ~ paste(seq(..1, ..3), collapse = ","), s)

giving:

[1] "I would like to buy 1,2,3 cats" "I would like to buy 1,2,3 cats"
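The remark that seq accepts character arguments can be checked in base R without gsubfn. Below is a minimal sketch; the helper name expand_range is hypothetical (it is not part of gsubfn), and the explicit as.integer conversion is optional and only there to make the intent visible.

```r
# Hypothetical helper (not part of gsubfn): expand "from".."to" into "from,...,to".
expand_range <- function(from, to) {
  # Explicit conversion; seq() would also coerce character input itself.
  paste(seq(as.integer(from), as.integer(to)), collapse = ",")
}

# seq() called directly on character arguments, as relied on in the one-liner.
implicit <- paste(seq("1", "3"), collapse = ",")

# The same expansion with the explicit conversion.
explicit <- expand_range("1", "3")
```

If preferred, such a helper could presumably be handed to gsubfn as an ordinary function of three arguments, e.g. function(x, sep, y) expand_range(x, y), in place of the formula.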
Conduit answered 18/3, 2018 at 4:9 Comment(1)
Note that when gsubfn is applied to a vector that contains NA values, these are converted to character "NA". – Sharpwitted
B
2

This is, in fact, a little tricky, unless someone has already written a package that does this (that I'm not aware of).

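a <- "I would like to buy 1-3 cats"
# Start positions of each run of digits followed by non-digits ("1-" and "3 cats")
pos <- unlist(gregexpr("\\d+\\D+", a))
a_split <- unlist(strsplit(a, ""))
# Note: indexing single characters means this only handles one-digit numbers
replacement <- paste(seq.int(a_split[pos[1]], a_split[pos[2]]), collapse = ",")
gsub("\\d+\\D+\\d+", replacement, a)
# [1] "I would like to buy 1,2,3 cats"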

EDIT: To show that the same solution works for arbitrary non-digit characters between two numbers:

b <- "I would like to buy 1 jusqu'à 3 cats"
pos_b <- unlist(gregexpr("\\d+\\D+", b))
b_split <- unlist(strsplit(b, ""))
replacement <- paste(seq.int(b_split[pos_b[1]], b_split[pos_b[2]]), collapse = ",")
gsub("\\d+\\D+\\d+", replacement, b)
# [1] "I would like to buy 1,2,3 cats"

You can add arbitrary requirements for the run of non-digit characters if you'd like. If you need help with that, just share what limits apply to the words or symbols between the numbers!
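As one way of adding such requirements, here is a minimal base R sketch that only accepts a fixed list of separators and reads the numbers through capture groups, so multi-digit numbers work too. The names seps and expand_one are hypothetical, the separator list is an assumption taken from the question, and the function only handles the first range in a single string.

```r
# Assumed list of allowed separators; adjust to the words actually expected.
# "\u00e0" is "à", written as an escape so the pattern does not depend on file encoding.
seps <- c("-", " to ", " bis ", " jusqu'\u00e0 ")
pattern <- paste0("(\\d+)(", paste(seps, collapse = "|"), ")(\\d+)")

# Hypothetical helper: expand the first range found in a single string x.
expand_one <- function(x) {
  # m is c(full match, first number, separator, second number), or empty if no match
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) return(x)  # no range found, return input unchanged
  replacement <- paste(seq(as.integer(m[2]), as.integer(m[4])), collapse = ",")
  # Replace the literal matched text (fixed = TRUE avoids re-interpreting it as a regex)
  sub(m[1], replacement, x, fixed = TRUE)
}

res_hyphen <- expand_one("I would like to buy 8-12 cats")
res_word   <- expand_one("I would like to buy 1 to 3 cats")
```

Text between the numbers that is not in seps (for example "2 cats and 3 dogs") is left alone, which is the main difference from matching any run of non-digits.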

Bearwood answered 18/3, 2018 at 3:41 Comment(0)
C
2

Not the most efficient, but ...

s <- c("I would like to buy 1-3 cats",
       "I would like to buy 1 jusqu'à 3 cats",
       "foo 22-33",
       "quux 11-3 bar")

gre <- gregexpr("([0-9]+(-| to | bis | jusqu'à )[0-9]+)", s)
gre2 <- gregexpr('[0-9]+', regmatches(s, gre))

regmatches(s, gre) <- lapply(regmatches(regmatches(s, gre), gre2),
                             function(a) paste(do.call(seq, as.list(as.integer(a))), collapse = ","))
s
# [1] "I would like to buy 1,2,3 cats"          "I would like to buy 1,2,3 cats"         
# [3] "foo 22,23,24,25,26,27,28,29,30,31,32,33" "quux 11,10,9,8,7,6,5,4,3 bar"           
Centum answered 18/3, 2018 at 3:41 Comment(1)
Does not work for the general case of multiple matches, where capture groups would come in handy, e.g. s <- c("I would like to buy 1-3 cats or 4-6 dogs"). – Sharpwitted
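For the multiple-match case raised in the comment, a base R sketch along the same lines might expand each matched range separately instead of running a second gregexpr over the whole list of matches. The function name expand_ranges is hypothetical, and the separator list is the one assumed in the answers above.

```r
# "\u00e0" is "à", written as an escape so the pattern does not depend on file encoding.
pattern <- "[0-9]+(-| to | bis | jusqu'\u00e0 )[0-9]+"

# Hypothetical helper: expand every range in every element of s.
expand_ranges <- function(s) {
  m <- gregexpr(pattern, s)
  regmatches(s, m) <- lapply(regmatches(s, m), function(found) {
    # 'found' holds all ranges of one string, e.g. c("1-3", "4-6"), or character(0)
    vapply(found, function(r) {
      # Pull the two endpoints out of a single range such as "4-6"
      ends <- as.integer(regmatches(r, gregexpr("[0-9]+", r))[[1]])
      paste(seq(ends[1], ends[2]), collapse = ",")
    }, character(1), USE.NAMES = FALSE)
  })
  s
}

res_multi <- expand_ranges("I would like to buy 1-3 cats or 4-6 dogs")
```

Strings without any range pass through unchanged, and descending ranges such as 11-9 are expanded downwards, as in the original output.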
