I'm trying to find (and replace) repeated substrings within a string.
My string can look like this:
Lorem ipsum dolor sit amet sit amet sit amet sit nostrud exercitation amit sit ullamco laboris nisi ut aliquip ex ea commodo consequat.
This should become:
Lorem ipsum dolor sit amet sit nostrud exercitation amit sit ullamco laboris nisi ut aliquip ex ea commodo consequat.
Note how `amit sit` isn't removed, since it's not repeated.
Or the string can be like this:
Lorem ipsum dolor sit amet () sit amet () sit amet () sit nostrud exercitation ullamco laboris nisi ut aliquip aliquip ex ea commodo consequat.
which should become:
Lorem ipsum dolor sit amet () sit nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
So it's not just a-z; the repeated part can also contain other (ASCII) characters. I'd be very happy if someone could help me with this.
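One way to read the two examples above is that a "repeat" is a run of whitespace-separated tokens immediately followed by an identical run. Below is a minimal sketch of that reading in Python. The question names no language, so the language, the function name `collapse_repeats` and the whitespace-based boundary rule are all assumptions, not requirements stated above.

```python
import re

# ASSUMPTION: a "phrase" is one or more whitespace-separated tokens made of
# any non-space characters, so "()" counts as a token. The lookarounds stop a
# match from starting or ending in the middle of a token.
REPEATED_PHRASE = re.compile(r"(?<!\S)(\S+(?:\s+\S+)*?)(?:\s+\1)+(?!\S)")


def collapse_repeats(text: str) -> str:
    """Replace each run of back-to-back identical phrases with one copy."""
    return REPEATED_PHRASE.sub(r"\1", text)


print(collapse_repeats("dolor sit amet sit amet sit amet sit nostrud amit sit ullamco"))
# prints: dolor sit amet sit nostrud amit sit ullamco
print(collapse_repeats("sit amet () sit amet () sit amet () sit ut aliquip aliquip ex"))
# prints: sit amet () sit ut aliquip ex
```

The lazy `*?` makes the pattern try the shortest phrase first, and the backreference `\1` requires each repeat to match character for character, including the whitespace inside the phrase. The pattern backtracks heavily, so it is likely to be slow on very long inputs.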
The next step would be to match (and replace) something like this:
2 questions 3 questions 4 questions 5 questions
which would become:
2 questions
The number in the final output can be any of the numbers (2, 3, 4, and so on); it doesn't matter which. Only the numbers differ in this last example; the words are always the same.
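This numbered variant could be sketched as follows, again in Python and again under assumptions: each item is taken to be an integer followed by one or more whitespace-separated words, and the first number is the one kept. The name `collapse_numbered` is hypothetical.

```python
import re

# ASSUMPTION: an item looks like "<digits> <words>", and a run is the same
# words repeated after different (or equal) numbers.
# Group 1: the first number. Group 2: the shared words, with leading whitespace.
NUMBERED_REPEAT = re.compile(r"(?<!\S)(\d+)((?:\s+\S+)+?)(?:\s+\d+\2)+(?!\S)")


def collapse_numbered(text: str) -> str:
    """Keep only the first "<number> <words>" item of a run with identical words."""
    return NUMBERED_REPEAT.sub(r"\1\2", text)


print(collapse_numbered("2 questions 3 questions 4 questions 5 questions"))
# prints: 2 questions
print(collapse_numbered("2 apples 3 pears"))
# prints: 2 apples 3 pears
```

Keeping the first number is an arbitrary choice made here because the text says any of the numbers is acceptable.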
Why is `sit` not removed in the first paragraph? It is still a repeat of the first `sit`. How are we able to determine the word boundaries correctly? – Pour
In `one two one`, `one` is not repeated, but in `one one two` it is. Does that answer your question? – Schramke
`()` clearly isn't. And I quote tandu above, "how are we able to determine the word boundaries correctly?" What result would you want for each of these examples: `foo foo.`, `foo foobar`, `foo foo-foo`, `foofoofoo`, `#¤% #¤% #¤%`, `#¤%#¤%#¤%`. – Landreth
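One possible answer to the boundary question is to treat whitespace as the only delimiter and compare whole tokens. The Python sketch below makes that rule explicit so its effect on the examples from the comments can be checked. The rule and the name `collapse_token_repeats` are assumptions, not something the asker has confirmed.

```python
def collapse_token_repeats(text: str) -> str:
    """Collapse back-to-back repeats of whole whitespace-delimited token runs."""
    tokens = text.split()  # ASSUMPTION: whitespace is the only boundary
    out = []
    i = 0
    while i < len(tokens):
        # Try the shortest candidate phrase first (1 token, then 2, ...).
        for n in range(1, (len(tokens) - i) // 2 + 1):
            phrase = tokens[i:i + n]
            j = i + n
            while tokens[j:j + n] == phrase:
                j += n
            if j > i + n:  # at least one immediate repeat was found
                out.extend(phrase)
                i = j
                break
        else:  # no phrase starting at this token repeats
            out.append(tokens[i])
            i += 1
    return " ".join(out)


for sample in ["foo foo.", "foo foobar", "foo foo-foo", "foofoofoo",
               "#¤% #¤% #¤%", "#¤%#¤%#¤%", "one two one", "one one two"]:
    print(repr(sample), "->", repr(collapse_token_repeats(sample)))
```

Under this rule only `#¤% #¤% #¤%` (to `#¤%`) and `one one two` (to `one two`) change; the other samples are returned as they are, because `foo.`, `foobar` and `foo-foo` are different tokens from `foo`, and the samples without spaces are single tokens. Note that `split()` followed by `" ".join(...)` also normalizes all whitespace to single spaces.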