[EDIT: I have left this post for the information on capture groups but the main solution I gave was not correct.
(?:START)((?:[^S]|S[^T]|ST[^A]|STA[^R]|STAR[^T])*)(?:END)
as pointed out in the comments would not work; I was forgetting that the ignored characters could not be dropped and thus you would need something such as ...|STA(?![^R])|
to still allow that character to be part of END, thus failing on something such as STARTSTAEND; so it's clearly a better choice; the following should show the proper way to use the capture groups...]
The answer given using the 'zero-width negative lookahead' operator "?!", with capture groups, is: (?:START)((?!.*START).*)(?:END)
which captures the inner text using $1 for the replace. If you want to have the START and END tags captured you could do (START)((?!.*START).*)(END)
which gives $1=START $2=text and $3=END or various other permutations by adding/removing ()
s or ?:
s.
That way if you are using it to do search and replace, you can do, something like BEGIN$1FINISH. So, if you started with:
abcSTARTdefSTARTghiENDjkl
you would get ghi
as capture group 1, and replacing with BEGIN$1FINISH would give you the following:
abcSTARTdefBEGINghiFINISHjkl
which would allow you to change your START/END tokens only when paired properly.
Each (x)
is a group, but I have put (?:x)
for each of the ones except the middle which marks it as a non-capturing group; the only one I left without a ?:
was the middle; however, you could also conceivably capture the BEGIN/END tokens as well if you wanted to move them around or what-have-you.
See the Java regex documentation for full details on Java regexes.
abcSTARTabcENDabcSTARTabcENDabc
? Do you want both matches? – Interposition