a <- c("this is a number 9999333333 and i got 12344")
How can I replace the digits beyond the first five of any number longer than 5 digits with "X"?
Expected output:
"this is a number 99993XXXXX and i got 12344"
Code I tried:
gsub("(.{5}).*", "X", a)
You can use gsub with a PCRE regex:
(?:\G(?!^)|(?<!\d)\d{5})\K\d
See the regex demo. Details:
- (?:\G(?!^)|(?<!\d)\d{5}) - either the end of the previous successful match (\G(?!^)) or (|) a location not preceded by a digit ((?<!\d)) followed by any five digits
- \K - match reset operator, discarding all text matched so far
- \d - a digit.
See the R demo:
a <- c("this is a number 9999333333 and i got 12344")
gsub("(?:\\G(?!^)|(?<!\\d)\\d{5})\\K\\d", "X", a, perl=TRUE)
## => [1] "this is a number 99993XXXXX and i got 12344"
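To see how the pattern treats boundary cases, here is a small sketch that runs the same regex over a few inputs. The wrapper name mask_extra_digits and the sample strings are made up for illustration and are not part of the answer.

```r
# Hypothetical helper wrapping the regex from the answer above.
mask_extra_digits <- function(x) {
  gsub("(?:\\G(?!^)|(?<!\\d)\\d{5})\\K\\d", "X", x, perl = TRUE)
}

# Illustrative inputs: exactly six digits, exactly five digits,
# a long and a short number in one string, and no digits at all.
samples <- c("123456", "12345", "id 1234567 and 89", "no digits")
mask_extra_digits(samples)
# "123456"            becomes "12345X"
# "12345"             stays   "12345"
# "id 1234567 and 89" becomes "id 12345XX and 89"
# "no digits"         stays   "no digits"
```

A number with exactly five digits is left alone because the final \d has nothing left to match after \d{5}.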
To keep only the first two digits of numbers that have at least six digits, use gsub("(?:\\G(?!^)|(?<!\\d)\\d{2}(?=\\d{4}))\\K\\d", "X", a, perl=TRUE). The (?=\d{4}) positive lookahead requires four more digits to appear immediately after the first two. – Thermobarograph
gsubfn in the gsubfn package is like gsub except the replacement string can be a function which inputs the capture groups and outputs a replacement to the match. The function can optionally be expressed in a formula notation as we do here.
The regular expression (\d{5}) matches and captures 5 digits and (\d+) matches and captures the remaining digits. The two capture groups are fed into the function and are pasted back together except each character in the second is replaced with X. r"{...}" is the notation for raw string literals introduced in R 4.0 which eliminates having to use double backslashes to denote a backslash within a string literal.
library(gsubfn)
gsubfn(r"{(\d{5})(\d+)}", ~ paste0(x, gsub(".", "X", y)), a)
## [1] "this is a number 99993XXXXX and i got 12344"
If we replace the first argument with the regular expression r"{(\d{2})(\d{4,})}" then it will replace all but the first two digits, provided there are at least 6 digits.
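If installing gsubfn is not an option, the same capture-and-rebuild idea can be sketched in base R with gregexpr and regmatches<-. This is a different technique from the gsubfn call above, and mask_tail, keep and min_len are hypothetical names chosen for this example.

```r
# Base R sketch; mask_tail(), keep and min_len are hypothetical names.
mask_tail <- function(x, keep = 5, min_len = keep + 1) {
  stopifnot(min_len > keep)
  m <- gregexpr("[0-9]+", x)            # locate every run of digits
  regmatches(x, m) <- lapply(regmatches(x, m), function(nums) {
    long <- nchar(nums) >= min_len      # only mask runs that are long enough
    nums[long] <- paste0(
      substr(nums[long], 1, keep),                 # digits to keep
      strrep("X", nchar(nums[long]) - keep)        # one X per masked digit
    )
    nums
  })
  x
}

a <- c("this is a number 9999333333 and i got 12344")
mask_tail(a)
## [1] "this is a number 99993XXXXX and i got 12344"
mask_tail(a, keep = 2, min_len = 6)
## [1] "this is a number 99XXXXXXXX and i got 12344"
```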
An alternative that does not use gsub to replace numbers longer than 5 digits is to split the string with strsplit, test whether each piece contains only digits, and combine substr and strrep:
paste(lapply(strsplit(a, " ")[[1]], function(x) {
  # Mask everything past the fifth character of all-digit tokens
  if (!grepl("\\D", x)) {
    paste0(substr(x, 1, 5), strrep("X", pmax(0, nchar(x) - 5)))
  } else {
    x
  }
}), collapse = " ")
#[1] "this is a number 99993XXXXX and i got 12344"
To replace everything after the first 2 digits with X, but only in numbers longer than 5 digits:
paste(lapply(strsplit(a, " ")[[1]], function(x) {
  # Only all-digit tokens with more than 5 characters are masked
  if (!grepl("\\D", x) && nchar(x) > 5) {
    paste0(substr(x, 1, 2), strrep("X", pmax(0, nchar(x) - 2)))
  } else {
    x
  }
}), collapse = " ")
#[1] "this is a number 99XXXXXXXX and i got 12344"
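The two snippets above handle a single string. A possible way to fold them into one function that also accepts a character vector is sketched below. The name mask_tokens and the parameters keep and min_len are assumptions made for this example, not part of the answer.

```r
# Sketch only: mask_tokens(), keep and min_len are hypothetical names.
mask_tokens <- function(x, keep = 5, min_len = keep + 1) {
  stopifnot(min_len > keep)
  vapply(strsplit(x, " ", fixed = TRUE), function(tokens) {
    # A token qualifies if it has no non-digit character and is long enough
    is_long_number <- !grepl("\\D", tokens) & nchar(tokens) >= min_len
    tokens[is_long_number] <- paste0(
      substr(tokens[is_long_number], 1, keep),
      strrep("X", nchar(tokens[is_long_number]) - keep)
    )
    paste(tokens, collapse = " ")
  }, character(1))
}

a <- c("this is a number 9999333333 and i got 12344")
mask_tokens(c(a, "call 1234567 now"))
## [1] "this is a number 99993XXXXX and i got 12344"
## [2] "call 12345XX now"
mask_tokens(a, keep = 2, min_len = 6)
## [1] "this is a number 99XXXXXXXX and i got 12344"
```

Because the split is on single spaces, a number that touches punctuation (for example "1234567,") is not treated as all digits and is left unchanged. The regex-based answers do not have this limitation.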