I'm trying to use the stringi package to split on a delimiter (the delimiter may be repeated) yet keep the delimiter. This is similar to this question I asked moons ago: R split on delimiter (split) keep the delimiter (split), but here the delimiter can be repeated. I don't think base strsplit can handle this type of regex. The stringi package can, but I can't figure out how to format the regex so that it splits on the delimiter even when it repeats and also does not leave an empty string at the end of the string.
Base R, stringr, stringi, etc. solutions are all welcome.
The latter problem occurs because I use a greedy * on the \\s, but the space isn't guaranteed, so I could only think to leave it in:
MWE
text.var <- c("I want to split here.But also||Why?",
"See! Split at end but no empty.",
"a third string. It has two sentences"
)
library(stringi)
stri_split_regex(text.var, "(?<=([?.!|]{1,10}))\\s*")
# Outcome
## [[1]]
## [1] "I want to split here." "But also|" "|" "Why?"
## [5] ""
##
## [[2]]
## [1] "See!" "Split at end but no empty." ""
##
## [[3]]
## [1] "a third string." "It has two sentences"
# Desired Outcome
## [[1]]
## [1] "I want to split here." "But also||" "Why?"
##
## [[2]]
## [1] "See!" "Split at end but no empty."
##
## [[3]]
## [1] "a third string." "It has two sentences"
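For reference, the desired outcome can be reproduced by extracting rather than splitting. This is only a sketch under the assumption that every piece starts with at least one non-delimiter character; the names pieces and result are made up for illustration, and it is not necessarily the best approach. It grabs each run of non-delimiters together with any delimiters that follow, then trims the leftover whitespace:

```r
library(stringi)

text.var <- c("I want to split here.But also||Why?",
              "See! Split at end but no empty.",
              "a third string. It has two sentences")

# Match one or more non-delimiter characters followed by zero or more
# delimiters, so repeated delimiters (e.g. "||") stay attached to their piece.
pieces <- stri_extract_all_regex(text.var, "[^?.!|]+[?.!|]*")

# Pieces that followed a space still carry it at the front; trim it off.
result <- lapply(pieces, stri_trim_both)
result
```

Because nothing is split, there is no zero-length match at the end of the string and therefore no trailing empty string. A string that begins with a delimiter would lose that leading delimiter under this pattern.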
*SKIP and *F are, and what roles they play in the regex? – Workwoman