Regex: 5 digits in increasing order

Asked 3/7, 2010 at 13:44 Answered 3/7, 2010 at 14:27

I need a regex for 5 digits in increasing order, like 12345, 24579, 34680, and so on.

0 comes after 9.

Bicycle answered 3/7, 2010 at 13:44 Comment(3)

@Lèse, wouldn't that break the "increasing order" constraint? – Odyl 3/7, 2010 at 14:30

@paxdiablo: It's the usual "non-decreasing" vs "increasing" confusion. Sometimes people say one when they really mean the other. Great comment, though, and I've incorporated the concern into my answer. – Annual 3/7, 2010 at 14:43

Order doesn't imply uniqueness. Arranging {1, 2, 1, 4, 6} in "increasing order" gives 11246. Usually when people talk about "order" they're merely referring to a binary relation that specifies precedence; as such there are only 2 directions, so "non-decreasing" would be the same as "increasing". – Retroversion 3/7, 2010 at 15:7

You can try (as seen on rubular.com)

^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$

Explanation

^ and $ are the beginning and end of string anchors respectively
\d{5} is the digit character class \d repeated exactly {5} times
(?=...) is a positive lookahead
? on each digit makes each optional

How it works

First we use lookahead to assert that anchored at the beginning of the string, we can see \d{5} till the end of the string
Now that we know that we have 5 digits, we simply match the digits in the order we want, but making each digit optional
- The assertion ensures that we have the correct number of digits

regular-expressions.info

Anchors, Character Classes, Finite Repetition, Lookarounds, and Optional

Generalizing the technique

Let's say that we need to match strings that consists of:

between 1-3 vowels [aeiou]
and the vowels must appear in order

Then the pattern is (as seen on rubular.com):

^(?=[aeiou]{1,3}$)a?e?i?o?u?$

Again, the way it works is that:

Anchored at the beginning of the string, we first assert (?=[aeiou]{1,3}$)
- So correct alphabet in the string, and correct length
Then we test for each letter, in order, making each optional, until the end of the string

Allowing repetition

If each digit can repeat, e.g. 11223 is a match, then:

instead of ? (zero-or-one) on each digit,
we use * (zero-or-more repetition)

That is, the pattern is (as seen on rubular.com):

^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$

Annual answered 3/7, 2010 at 13:47 Comment(9)

It fails when you do numbers like 2634 - it returns 26 instead of failing. – Zirkle 3/7, 2010 at 13:49

I'm testing with RegexBuddy. He needs 5, not 4 digits. – Intwine 3/7, 2010 at 13:55

@dierre: I will fix on next revision. Others: don't revise for me. I have other materials I'm writing. – Annual 3/7, 2010 at 13:59

No problem, mate. BTW I didn't know you can do something like this in regexp. It's beautiful ;) – Intwine 3/7, 2010 at 14:1

I'd have gone for .{5}$ as the lookahead constraint given that the numeric constraint is checked by the other part of the RE; with at least one RE engine (the Tcl one) it's about 30% faster when matching 24579, and it matches exactly the same language. – Cathiecathleen 3/7, 2010 at 14:21

@Donal: I think there's value in making the assertion more specific with \d instead of . in terms of readability (i.e. intention is more explicit), but I admit I've never really profiled my regex performance. – Annual 3/7, 2010 at 14:27

This is why I hate extensions to regular expressions like this with a passion. It encourages further abuse of an already abused tool. – Hindustani 3/7, 2010 at 14:27

@Alex: Assertion is a wonderful thing. There are many questions/answers on stackoverflow that shows how to use lookahead/lookbehind, positive and negative. – Annual 3/7, 2010 at 14:29

@poly: I was quite surprised at how much faster it was to be honest. I expected a few (1–3) percent, not 30%! (I wonder whether it is different for other RE engines. Not quite enough to do the test myself though. :-)) Mind you, my real reason for preferring my form is that it separates concerns; the lookahead becomes a pure length constraint. REs are complex enough already without trying to do lots of things at once with them. – Cathiecathleen 3/7, 2010 at 14:34

Wrong tool for the job. Just iterate through the characters one by one and check it. How you would do that depends on which language you're using.

Here is how to check using C:

#include <stdio.h>
#define CHR2INT(c) c - '0'

int main(void)
{
    char *str = "12345";
    int i, res = 1;

    for (i = 1; i < 5; ++i) {
        res &= CHR2INT(str[i - 1]) < CHR2INT(str[i]) && str[i] >= '0' && str[i] <= '9';
    }

    printf("%d", res);

    return 0;
}

It is obviously longer than a regex solution, but a regex solution will never be as fast as that.

Stoecker answered 3/7, 2010 at 13:46 Comment(0)

polygenelubricants's suggestion is a great one, but there's a better one and that's to use a simpler lookahead constraint given that the bulk of the RE checks for the numeric-ness of the characters anyway. For why, see this log of an interactive Tcl session:

% set RE1 "^(?=\\d{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$
% set RE2 "^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$
% time {regexp $RE1 24579} 100000
32.80587355 microseconds per iteration
% time {regexp $RE2 24579} 100000
22.598555649999998 microseconds per iteration

As you can see, it's about 30% faster to use the version of the RE with .{5}$ as a lookahead constraint, at least in the Tcl RE engine. (Note that the above log misses some lines where I was stabilizing the compilations of the regular expressions, though I'd anticipate RE2 to be a little faster to compile anyway.) If you're using a different RE engine (e.g., PCRE or Perl) then you should recheck to get your own performance figures.

Cathiecathleen answered 3/7, 2010 at 14:27 Comment(2)

+1; if speed is a concern, one can also test for length outside of regex (likely to be fastest) and/or use the possessive modifier. In fact, since you're already doing this benchmark, you should do the possessive test. I'm curious now. – Annual 3/7, 2010 at 14:56

@poly: This all started as a comment on your (very good) answer, but I switched to doing an answer when I discovered that I needed the space to present the results. – Cathiecathleen 3/7, 2010 at 15:40

This is not something that regular expressions are generally good for. The sort of regex you're going to need to acheive this is likely to be bigger and uglier than simple procedural code to do the same thing.

By all means use a regex to ensure you have five digits in your string but then just use normal coding checks to ensure the order is correct.

You don't bang in nails with a screwdriver (well, not if you're smart), so you shouldn't be trying to use regular expressions for every job either :-)

Odyl answered 3/7, 2010 at 13:47 Comment(0)

Explanation

How it works

regular-expressions.info

Generalizing the technique

Allowing repetition

Recommended topics

Hot tags