Replace the digits beyond the first 5 in long numbers within a text

a <- c("this is a number 9999333333 and i got 12344")

How can I mask numbers longer than 5 digits so that every digit after the fifth is replaced with "X"?

Expected Output:

"this is a number 99993XXXXX and i got 12344"

Code I tried:

gsub("(.{5}).*", "X", a)
Execrable answered 7/9, 2020 at 12:18

You can use gsub with a PCRE regex:

(?:\G(?!^)|(?<!\d)\d{5})\K\d

See the regex demo. Details:

  • (?:\G(?!^)|(?<!\d)\d{5}) - either the end of the previous successful match (\G(?!^), where (?!^) excludes the start of the string) or (|) a position not preceded by a digit ((?<!\d)) followed by exactly five digits
  • \K - the match reset operator, which discards all text matched so far so that only the next digit is replaced
  • \d - a digit.

See the R demo:

a <- c("this is a number 9999333333 and i got 12344")
gsub("(?:\\G(?!^)|(?<!\\d)\\d{5})\\K\\d", "X", a, perl=TRUE)
## => [1] "this is a number 99993XXXXX and i got 12344"
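
To see how the \G branch and the (?<!\d) lookbehind behave beyond the single example, here is a minimal sketch that runs the same pattern over a few extra strings. The inputs are hypothetical test cases, not taken from the question.

```r
# Pattern from the answer above, written as an R string literal.
pattern <- "(?:\\G(?!^)|(?<!\\d)\\d{5})\\K\\d"

# Hypothetical test inputs (not from the original question).
inputs <- c(
  "id 123456 and 1234567",  # two long numbers in one string
  "12345",                  # exactly five digits: nothing to mask
  "1234567 at the start"    # long number at the very start of the string
)

# gsub is vectorized over its input, so each string is masked independently.
masked <- gsub(pattern, "X", inputs, perl = TRUE)
print(masked)
## => [1] "id 12345X and 12345XX" "12345"                 "12345XX at the start"
```

Each number restarts the count because the second alternative can only match at a position not preceded by a digit, while \G only continues from the digit that was just replaced.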
Banerjee answered 7/9, 2020 at 12:29 Comment(2)
Thanks. If I want to replace everything after the first 2 digits with X in numbers longer than 5 digits, how could I change it? E.g.: "this is a number 99XXXXXXXX and i got 12344" – Execrable
@Execrable Then use gsub("(?:\\G(?!^)|(?<!\\d)\\d{2}(?=\\d{4}))\\K\\d", "X", a, perl=TRUE). The (?=\d{4}) positive lookahead requires four more digits to appear immediately after the first two. – Thermobarograph
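
The two patterns above differ only in how many digits are kept and how many more the lookahead requires, so they can be generated from two numbers. The following is a sketch only: mask_digits, keep and min_len are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): keep the first `keep`
# digits of every number that has at least `min_len` digits, mask the rest.
mask_digits <- function(x, keep = 5L, min_len = keep + 1L) {
  stopifnot(keep >= 1, min_len > keep)
  # The lookahead (?=\d{n}) demands min_len - keep further digits after the
  # kept prefix, mirroring the (?=\d{4}) used in the comment above.
  pattern <- sprintf(
    "(?:\\G(?!^)|(?<!\\d)\\d{%d}(?=\\d{%d}))\\K\\d",
    as.integer(keep), as.integer(min_len - keep)
  )
  gsub(pattern, "X", x, perl = TRUE)
}

a <- "this is a number 9999333333 and i got 12344"
mask_digits(a)
## => [1] "this is a number 99993XXXXX and i got 12344"
mask_digits(a, keep = 2, min_len = 6)
## => [1] "this is a number 99XXXXXXXX and i got 12344"
```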

gsubfn in the gsubfn package is like gsub except the replacement string can be a function which inputs the capture groups and outputs a replacement to the match. The function can optionally be expressed in a formula notation as we do here.

The regular expression (\d{5}) matches and captures 5 digits and (\d+) matches and captures the remaining digits. The two capture groups are passed to the function, which pastes them back together after replacing each character of the second group with X. r"{...}" is the raw string literal notation introduced in R 4.0; it removes the need for double backslashes to denote a backslash within a string literal.

library(gsubfn)

gsubfn(r"{(\d{5})(\d+)}", ~ paste0(x, gsub(".", "X", y)), a)
## [1] "this is a number 99993XXXXX and i got 12344"

If we replace the first argument with the regular expression r"{(\d{2})(\d{4,})}" then it will replace all but the first two digits provided there are at least 6 digits.
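
If installing gsubfn is not an option, the same idea of computing each replacement with a function can be approximated in base R with gregexpr and regmatches<-. This is a swapped-in technique rather than what the answer uses, and mask_base, keep and min_len are hypothetical names chosen for this sketch.

```r
# Hypothetical base-R helper (no gsubfn dependency).
mask_base <- function(x, keep = 5L, min_len = keep + 1L) {
  # Locate every run of at least `min_len` digits in each element of x.
  m <- gregexpr(sprintf("[0-9]{%d,}", as.integer(min_len)), x)
  # Rebuild each matched number: kept prefix plus one "X" per remaining digit.
  regmatches(x, m) <- lapply(regmatches(x, m), function(nums) {
    paste0(substr(nums, 1, keep), strrep("X", nchar(nums) - keep))
  })
  x
}

a <- "this is a number 9999333333 and i got 12344"
mask_base(a)
## [1] "this is a number 99993XXXXX and i got 12344"
mask_base(a, keep = 2, min_len = 6)
## [1] "this is a number 99XXXXXXXX and i got 12344"
```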

Drill answered 7/9, 2020 at 13:24

An alternative that avoids gsub is to split the string on spaces with strsplit, test whether each piece consists only of digits, and rebuild the digit-only pieces with substr and strrep. Note that this only masks numbers that are separated from the surrounding text by spaces:

paste(lapply(strsplit(a, " ")[[1]], function(x) {
  if(!grepl("\\D", x)) {
    paste0(substr(x, 1, 5), strrep("X", pmax(0, nchar(x)-5)))
  } else {x}}), collapse = " ")
#[1] "this is a number 99993XXXXX and i got 12344"

To replace everything after the first 2 digits with X in numbers longer than 5 digits:

paste(lapply(strsplit(a, " ")[[1]], function(x) {
  if(!grepl("\\D", x) && nchar(x) > 5) {
    paste0(substr(x, 1, 2), strrep("X", pmax(0, nchar(x)-2)))
  } else {x}}), collapse = " ")
#[1] "this is a number 99XXXXXXXX and i got 12344"
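
The two snippets above can be folded into one function. The sketch below assumes a single input string and single-space separators, as in the answer; mask_tokens, keep and min_len are hypothetical names, not part of the original.

```r
# Hypothetical wrapper around the split-and-rebuild approach above.
mask_tokens <- function(s, keep = 5L, min_len = keep + 1L) {
  tokens <- strsplit(s, " ", fixed = TRUE)[[1]]
  masked <- vapply(tokens, function(tok) {
    # Only tokens made up entirely of digits and long enough are masked.
    if (!grepl("\\D", tok) && nchar(tok) >= min_len) {
      paste0(substr(tok, 1, keep), strrep("X", nchar(tok) - keep))
    } else {
      tok
    }
  }, character(1), USE.NAMES = FALSE)
  paste(masked, collapse = " ")
}

a <- "this is a number 9999333333 and i got 12344"
mask_tokens(a)
#[1] "this is a number 99993XXXXX and i got 12344"
mask_tokens(a, keep = 2, min_len = 6)
#[1] "this is a number 99XXXXXXXX and i got 12344"

# Limitation: a number with attached punctuation is not a digit-only token,
# so it is left unchanged.
mask_tokens("n 12345678, ok")
#[1] "n 12345678, ok"
```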
Lisette answered 7/9, 2020 at 14:34
