Note that this question is in the context of Julia, and therefore (to my knowledge) PCRE.
Suppose that you had a string like this:
"sssppaaasspaapppssss"
and you wanted to match, individually, the repeating characters at the end of the string (in the case of our string, the four "s" characters - that is, so that matchall gives ["s","s","s","s"], not ["ssss"]). This is easy:
r"(.)(?=\1*$)"
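matchall no longer exists in Julia 1.x. A minimal sketch, assuming a current Julia version, collects the same individual matches from eachmatch instead:

```julia
# Collect the .match field of every RegexMatch; this replaces the old matchall(r, s).
# Each trailing "s" is a separate match because the lookahead (?=\1*$) only
# checks that the rest of the string repeats the captured character.
str = "sssppaaasspaapppssss"
trailing = [m.match for m in eachmatch(r"(.)(?=\1*$)", str)]
println(String.(trailing))  # ["s", "s", "s", "s"]
```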
It's practically trivial, and easily used: replace("hell",r"(.)(?=\1*$)","k") will give "hekk", while replace("hello",r"(.)(?=\1*$)","k") will give "hellk".
And it can be generalised for repeating patterns by switching out the dot for something more complex:
r"(\S+)(?=( \1)*$)"
which will, for instance, independently match the last three instances of "abc" in "abc abc defg abc h abc abc abc".
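In current Julia (1.x), replace takes the pattern and replacement as a pair (replace(str, pattern => replacement)) rather than as separate arguments. A short sketch, assuming Julia 1.x, checks both the character and the word version of the end-of-string regex:

```julia
# Character version: replace every character in the trailing run.
end_char = r"(.)(?=\1*$)"
println(replace("hell", end_char => "k"))   # hekk
println(replace("hello", end_char => "k"))  # hellk

# Word version: each "abc" in the trailing run is a separate match.
end_word = r"(\S+)(?=( \1)*$)"
ms = collect(eachmatch(end_word, "abc abc defg abc h abc abc abc"))
println(String[m.match for m in ms])  # ["abc", "abc", "abc"]
println([m.offset for m in ms])       # [20, 24, 28]  (byte offsets of the last three words)
```

As the "hello" case shows, the final character is always matched, because \1* is allowed to repeat zero times.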
Which then leads to the question... how would you match the repeating character or pattern at the start of the string, instead? Specifically, using regex in the way it's used above.
The obvious approach would be to reverse the direction of the above regex as r"(?<=^\1*)(.)", but PCRE (and therefore Julia) doesn't allow variable-length lookbehinds (except where each top-level alternative has a fixed length, like (?<=ab|cde)), and thus throws an error. The next thought is to use "\K", along the lines of r"^\1*\K(.)", but this only manages to match the first character: after the first match, the search resumes past it, where the caret can no longer match (and since \1 has not been captured yet when \1* is tried, \1* can only match the empty string).
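The two attempts above can be checked in current Julia. The last part of this sketch swaps in a different technique that is not from the question: PCRE's \G anchor, which only matches at the position where the previous match ended, combined with a one-character lookbehind that compares each character with the one before it. It is a hedged sketch for the single-character case only; the lookbehind has to be fixed-length, so it does not carry over to the word version. The name leading_char is just an illustrative variable.

```julia
str = "sssppaaasspaapppssss"

# Attempt 1: variable-length lookbehind. Built with Regex(...) rather than r"..."
# so that the compile error is raised at run time and can be caught.
lookbehind_rejected = try
    Regex("(?<=^\\1*)(.)")
    false
catch
    true
end

# Attempt 2: \K. Only the first character is found, since ^ cannot match
# where the second search starts.
k_matches = [m.match for m in eachmatch(r"^\1*\K(.)", str)]
println(String.(k_matches))  # ["s"]

# Swapped-in technique: ^. takes the first character; after that, \G chains each
# match to the end of the previous one, and (?<=(.))\1 requires the current
# character to equal the character just before it.
leading_char = r"\G(?:^.|(?<=(.))\1)"
println(String.([m.match for m in eachmatch(leading_char, str)]))  # ["s", "s", "s"]
println(replace("hhello", leading_char => "k"))  # kkello
println(replace("hello", leading_char => "k"))   # kello
```

Mirroring the end-of-string version, the first character is always replaced, even when it is not repeated.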
For clarity: I'm seeking a regex that will, for instance, result in
replace("abc abc defg abc h abc abc abc",<regex here>,"hello")
producing
"hello hello defg abc h abc abc abc"
As you can see, it's replacing each "abc" from the start with "hello", but only until the first non-match. The reverse one I provide above does this at the other end of the string:
replace("abc abc defg abc h abc abc abc",r"(\S+)(?=( \1)*$)","hello")
produces
"abc abc defg abc h hello hello hello"
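The same end-of-string replacement in Julia 1.x syntax, as a minimal sketch. The second call shows that the final word is replaced even when it does not repeat.

```julia
str = "abc abc defg abc h abc abc abc"
end_word = r"(\S+)(?=( \1)*$)"
println(replace(str, end_word => "hello"))  # abc abc defg abc h hello hello hello

# ( \1)* may repeat zero times, so the last word always matches.
println(replace("ab bc cd", end_word => "hello"))  # ab bc hello
```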
^(\S+)(?:\s+\1)* and then split on the space character. – Haydenhaydn
r"(.)(?=\1*$)" (or more generally r"(\S+)(?=( \1)*$)"). – Knobkerrie
sho in each part of the string "shorts shoes shop shovel shortstuff shoplifter shopshop", and not the second sho in shopshop, you need to look in both directions - you're basically figuring out which part isn't the delimiter, in a sense) – Knobkerrie
(\S+)((?= \1)|.*) with "hello$2" (JavaScript for example), but unfortunately you need to be able to reference the submatches in the replacement string, which AFAIK you can't do in Julia :-( – Valerievalerio
(\S+ )((?=\1)|.*) with "hello $2", since the space could be considered part of the repeat, depending on your specification. – Valerievalerio
repeating characters/words, but (\S+)(?=( \1)*$) will always match the last "word", e.g. cd in ab bc cd, regardless of repetition. Further, it seems you also want the "word version" to work without any separator, but in your sample (\S+)(?=( \1)*$) there is a space separator. However, I found it very interesting :] – Apnea
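A sketch of the two workarounds from the comments, assuming Julia 1.x. Current Julia does accept capture-group references in the replacement through a SubstitutionString literal (s"..."), so Valerievalerio's regexes can be used directly. replace_leading_run is a hypothetical helper name for Haydenhaydn's match-then-split idea, and it assumes single spaces between the repeats.

```julia
str = "abc abc defg abc h abc abc abc"

# Haydenhaydn's idea: match the whole leading run, count its words, rebuild it.
function replace_leading_run(text::String, repl::String)
    m = match(r"^(\S+)(?:\s+\1)*", text)
    m === nothing && return text               # e.g. empty string or leading whitespace
    n = length(split(m.match))                 # number of words in the leading run
    rest = text[ncodeunits(m.match)+1:end]     # everything after the run, unchanged
    return join(fill(repl, n), " ") * rest     # assumes single-space separators
end

println(replace_leading_run(str, "hello"))  # hello hello defg abc h abc abc abc

# Valerievalerio's suggestions: \2 in s"..." refers back to capture group 2.
println(replace(str, r"(\S+)((?= \1)|.*)" => s"hello\2"))   # hello hello defg abc h abc abc abc
println(replace(str, r"(\S+ )((?=\1)|.*)" => s"hello \2"))  # hello hello defg abc h abc abc abc
```

Both Valerievalerio regexes rely on .* swallowing the rest of the string, so they assume it contains no newlines (. does not match a newline by default). Like the end-of-string version, all three replace the first word even when it is not repeated, and \1 can also match the start of a longer word (in "abc abcd" the run is taken as "abc abc"), which is the kind of prefix problem raised in the "sho" comment.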