Regular Expression to detect repetition within a string
Asked Answered
C

5

10

Is it possible to detect repeated number patterns with a regular expression?

So for example, if I had the following string "034503450345", would it be possible to match the repeated sequence 0345? I have a feeling this is beyond the scope of regex, but I thought I would ask here anyway to see if I have missed something.

Coenesthesia answered 3/6, 2009 at 9:45 Comment(4)
what language/platform are you using?Pennyroyal
I'm using C#. All I needed was the regex though, so I've implemented RichieHindle's solution, and verified it against my test data already! I've also learned a lot from Peter Boughton's excellently commented regex. Thanks both of you!Coenesthesia
@MarkWithers I am dealing with same issue. Can you please be more concrete and tell me something more about your solution? Thank youOleate
@Oleate Not really, it was 6 years ago, so I no longer have access to the source code. However, all of the up-voted regex patterns in this question work, so it's just a case of looking up how to do regex in the language of your choiceCoenesthesia
A
11

Yes, you can - here's a Python test case

import re
print re.search(r"(\d+).*\1", "8034503450345").group(1)
# Prints 0345

The regular expression says "find some sequence of digits, then any amount of other stuff, then the same sequence again."

On a barely-related note, here's one of my favourite regular expressions - a prime number detector:

import re
for i in range(2, 100):
    if not re.search(r"^(xx+)\1+$", "x"*i):
        print i
Aileenailene answered 3/6, 2009 at 9:49 Comment(5)
Your prime number detector finds 0 and 1 to be prime :-)Photocell
Any idea why the following example is only matching 8 and not 0345? In [18]: foo = re.search(r"(\d+).*\1", "80345824103452420345") In [19]: foo.groups() Out[19]: ('8',)Fusillade
@balpha: Good pont - fixed. 8-)Aileenailene
@Hank Gay: It'll match the first repeating group it finds, which in that case is "8". Use more "\d"s, or braces, to put a minimum length on it.Aileenailene
@RichieHindle: Thanks. I had somehow convinced myself (at first) that it was matching the longest possible, not the first.Fusillade
G
22

This expression will match one or more repeating groups:

(.+)(?=\1+)


Here is the same expression broken down, (using commenting so it can still be used directly as a regex).

(?x)  # enable regex comment mode
(     # start capturing group
.+    # one or more of any character (excludes newlines by default)
)     # end capturing group
(?=   # begin lookahead
\1+   # match one or more of the first capturing group
)     # end lookahead


To match a specific pattern, change the .+ to that pattern, e.g. \d+ for one or more numbers, or \d{4,} to match 4 or more numbers.

To match a specific number of the pattern, change \1+, e.g to \1{4} for four repetitions.

To allow the repetition to not be next to each other, you can add .*? inside the lookahead.

Goldthread answered 3/6, 2009 at 10:0 Comment(2)
Good example, explained very wellCoenesthesia
Great explanation. Excellent extension. Thanks!! +1Sultana
A
11

Yes, you can - here's a Python test case

import re
print re.search(r"(\d+).*\1", "8034503450345").group(1)
# Prints 0345

The regular expression says "find some sequence of digits, then any amount of other stuff, then the same sequence again."

On a barely-related note, here's one of my favourite regular expressions - a prime number detector:

import re
for i in range(2, 100):
    if not re.search(r"^(xx+)\1+$", "x"*i):
        print i
Aileenailene answered 3/6, 2009 at 9:49 Comment(5)
Your prime number detector finds 0 and 1 to be prime :-)Photocell
Any idea why the following example is only matching 8 and not 0345? In [18]: foo = re.search(r"(\d+).*\1", "80345824103452420345") In [19]: foo.groups() Out[19]: ('8',)Fusillade
@balpha: Good pont - fixed. 8-)Aileenailene
@Hank Gay: It'll match the first repeating group it finds, which in that case is "8". Use more "\d"s, or braces, to put a minimum length on it.Aileenailene
@RichieHindle: Thanks. I had somehow convinced myself (at first) that it was matching the longest possible, not the first.Fusillade
M
9

Just to add a note to the (correct) answer from RichieHindle:

Note that while Python's regexp implementation (and many others, such as Perl's) can do this, this is no longer a regular expression in the narrow sense of the word.

Your example is not a regular language, hence cannot be handled by a pure regular expression. See e.g. the excellent Wikipedia article for details.

While this is mostly only of academic interest, there are some practical consequences. Real regular expressions can make much better guarantees for maximum runtimes than in this case. So you could get performance problems at some point.

Not to say that it's not a good solution, but you should realize that you're at the limit of what regular expressions (even in extended form) are capable of, and might want to consider other solutions in case of problems.

Michelson answered 3/6, 2009 at 10:12 Comment(0)
B
2

This is the C# code, that uses the backreference construct to find repeated digits. It will work with 034503450345, 123034503450345, 034503450345345, 232034503450345423. The regex is much easier and clearer to understand.

/// <summary>
/// Assigns repeated digits to repeatedDigits, if the digitSequence matches the pattern
/// </summary>
/// <returns>true if success, false otherwise</returns>
public static bool TryGetRepeatedDigits(string digitSequence, out string repeatedDigits)
{
    repeatedDigits = null;

    string pattern = @"^\d*(?<repeat>\d+)\k<repeat>+\d*$";

    if (Regex.IsMatch(digitSequence, pattern))
    {
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        repeatedDigits = r.Match(digitSequence).Result("${repeat}");
        return true;
    }
    else
        return false;
}
Branham answered 3/6, 2009 at 10:17 Comment(1)
Very nice! I like the use of the named group. Production quality code, commented and ready to be copied. Thanks very much!Coenesthesia
J
-1

Use regex repetition: bar{2,} looks for text with two or more bar: barbar barbarbar ...

Journalist answered 13/1, 2013 at 17:22 Comment(1)
The question is about finding any sequence, not one in particular.Conservatory

© 2022 - 2024 — McMap. All rights reserved.