regex - how to match group of unique characters of certain length
G

3

5

I'm looking for a regex that will match ad-hoc groups of characters of a certain length, but only if all the characters in the group are unique.

For the given string example:

123132213231312321112122121111222333211221331

123, 132, 213, 231, 312, 321 are matched and 112, 122, 121, 111, 313, 322, 221, 323, 131, etc. are not matched.

I tried (?:([0-9])(?!.{3}\1)){3} but it's completely wrong.

Good answered 25/10, 2013 at 16:34 Comment(7)
As a hint, this website can help you visualize what your regular expression is matching on. – Nun
Can you elaborate a little more? – Truckle
Any particular reason for a regex? It won't be pretty... – Carrie
I don't think you even need a regex for that. Does it need to be a regex? – Gomulka
I have MBs of nucleotide sequences, so I guess it needs to be a regex. – Good
@Good What about overlapping sequences? For instance, if you had 123121, would you want to get 123, or 123, 231, 312? – Fulfillment
@Good Hmm, do you want overlapping sequences too? – Fulfillment
R
4

Iterate over the input string: on each iteration, find a match of this expression, then chop off everything up to and including the first character of that match, and repeat until there is no match:

((\d)((?!\2)\d)((?!\2)(?!\3)\d))

You could do a findAll, but then you won't detect overlapping matches, such as those in "12321". You'd only find the first: "123".
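A minimal sketch of that loop in Python. Instead of literally chopping the string, it restarts the search one character after the start of the previous match, which should amount to the same thing. The names `PATTERN` and `overlapping_matches` are made up for illustration:

```python
import re

# The three-digit expression from this answer; group 1 is the whole match.
PATTERN = re.compile(r"((\d)((?!\2)\d)((?!\2)(?!\3)\d))")

def overlapping_matches(text):
    """Collect every match, including ones that overlap earlier matches."""
    results = []
    pos = 0
    while True:
        match = PATTERN.search(text, pos)
        if match is None:
            break
        results.append(match.group(1))
        # Resume just past the first character of this match,
        # so a match starting inside it can still be found.
        pos = match.start() + 1
    return results

print(overlapping_matches("12321"))
```

For "12321" this finds both "123" and "321", whereas a plain findall stops after "123".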

Of course, this only works for digits. If you want to match word characters also, you could do:

((\w)((?!\2)\w)((?!\2)(?!\3)\w))

If you want a longer length, just follow the pattern when building a regex:

((\w)((?!\2)\w)((?!\2)(?!\3)\w)((?!\2)(?!\3)(?!\4)\w))

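So, here is some hopefully correct Python code that builds the regex for any length:

max_len = 3  # your arbitrary length
regex = "((\\w)"  # group 1 wraps the whole match, group 2 is the first character
for i in range(1, max_len):
    regex += "("
    # the i-th following character must differ from groups 2 .. i+1
    for j in range(2, i + 2):
        regex += "(?!\\" + str(j) + ")"
    regex += "\\w)"
regex += ")"  # close group 1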

Phew

Radius answered 25/10, 2013 at 16:49 Comment(0)
F
2

It seems that you are using Python. Regex is not a silver bullet, and it is definitely not the straightforward solution to your problem (especially because the expression changes with the length that you want to analyze). Writing a little code would be simpler and would probably perform better.

Here is an example in Scala that solves the problem:

"123132213231312321112122121111222333211221331".sliding(3).map(_.distinct).filter(_.size == 3).mkString("-")

output:

123-231-132-213-132-231-312-123-321-321-213
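
Since the question seems to be about Python, here is a rough Python equivalent of the Scala one-liner, as a sketch only. The function name `unique_windows` and its parameters are my own, not from any library:

```python
def unique_windows(text, size):
    """Return every overlapping window of `size` characters with no repeats."""
    return [
        text[i:i + size]
        for i in range(len(text) - size + 1)
        # A window has all-unique characters when its set is as large as the window.
        if len(set(text[i:i + size])) == size
    ]

sequence = "123132213231312321112122121111222333211221331"
print("-".join(unique_windows(sequence, 3)))
```

This should print the same line as the Scala output above. The window size is an ordinary argument, so nothing has to be regenerated when the length changes.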
Fra answered 25/10, 2013 at 17:6 Comment(0)
L
1

This regex shows the progression from 1 to 10 unique digits; crop it to the length you need.

 ( \d )
 (?! \1 )
 ( \d )
 (?! \1 | \2 )
 ( \d )
 (?! \1 | \2 | \3 )
 ( \d )
 (?! \1 | \2 | \3 | \4 )
 ( \d )
 (?! \1 | \2 | \3 | \4 | \5 )
 ( \d )
 (?! \1 | \2 | \3 | \4 | \5 | \6 )
 ( \d )
 (?! \1 | \2 | \3 | \4 | \5 | \6 | \7 )
 ( \d )
 (?! \1 | \2 | \3 | \4 | \5 | \6 | \7 | \8 )
 ( \d )
 (?! \1 | \2 | \3 | \4 | \5 | \6 | \7 | \8 | \9 )
 \d 
Leede answered 25/10, 2013 at 17:42 Comment(3)
hehe don't forget to add the x modifier (?x). I'm at my vote-limit now, I'll +1 it later – Fulfillment
Can you elaborate a little more please? – Good
@Good - Sure. Each subsequent digit must not be one that was previously captured. Since your example uses digits, the regex above shows the progression from 1 to 10 digits, 10 being the maximum. Crop the regex to the number of digits you wish to find. What is it you're trying to do? – Leede
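
To make the cropping concrete, here is a sketch of the three-digit version in Python, assuming `re.VERBOSE` as the equivalent of the x modifier mentioned in the comments. The names `THREE_UNIQUE_DIGITS` and `find_unique_triples` are hypothetical:

```python
import re

# The answer's progression cropped to three digits. re.VERBOSE makes the
# regex engine ignore the whitespace used to lay the pattern out.
THREE_UNIQUE_DIGITS = re.compile(
    r"""
    ( \d )
    (?! \1 )
    ( \d )
    (?! \1 | \2 )
    \d
    """,
    re.VERBOSE,
)

def find_unique_triples(text):
    """Return non-overlapping runs of three distinct digits."""
    # group(0) is the whole match; the numbered groups are single digits.
    return [m.group(0) for m in THREE_UNIQUE_DIGITS.finditer(text)]

print(find_unique_triples("123112321"))
```

Note that `finditer` only reports non-overlapping matches, so overlapping groups still need a restart-from-the-next-character loop or a sliding window.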
