I need a regex for 5 digits in increasing order, like 12345, 24579, 34680, and so on. 0 comes after 9.
You can try (as seen on rubular.com):
^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$
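^ and $ are the beginning and end of string anchors, respectively.
\d{5} is the digit character class \d repeated exactly {5} times.
(?=...) is a positive lookahead.
? on each digit makes each optional.
The lookahead asserts that, from the beginning of the string, we can see \d{5} till the end of the string. The rest of the pattern then matches the digits in the required order, each at most once.
Let's say that we need to match strings that consist only of vowels [aeiou], each appearing at most once, in alphabetical order, with a total length between 1 and 3.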
Then the pattern is (as seen on rubular.com):
^(?=[aeiou]{1,3}$)a?e?i?o?u?$
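Again, the way it works is that:
(?=[aeiou]{1,3}$) asserts that the whole string consists of one to three vowels.
a?e?i?o?u? then matches the vowels in order, each at most once.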
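To see the vowel pattern in action, here is a minimal Python sketch. It assumes Python's re module, which supports lookahead; the helper name matches_vowels and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

# The pattern from the answer: the lookahead limits the whole string to
# 1-3 vowels, then each vowel is optional but must appear in order.
# Note: in Python, $ also matches just before a trailing newline;
# swap it for \Z if that matters for your input.
VOWELS = re.compile(r"^(?=[aeiou]{1,3}$)a?e?i?o?u?$")


def matches_vowels(s: str) -> bool:
    """Return True if s is 1-3 distinct vowels in alphabetical order."""
    return VOWELS.match(s) is not None


if __name__ == "__main__":
    for sample in ["a", "aeu", "io", "ea", "aa", "aeio", ""]:
        print(repr(sample), matches_vowels(sample))
```

The out-of-order, repeated, too-long and empty samples are rejected either by the lookahead (length) or by the ordered optional vowels (order and uniqueness).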
If each digit can repeat, e.g. 11223 is a match, then instead of ? (zero-or-one) on each digit, use * (zero-or-more repetition). That is, the pattern is (as seen on rubular.com):
^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
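The two digit patterns can be compared side by side with a short Python sketch. This assumes Python's re module; the names STRICT, REPEATS, is_strict and is_nondecreasing are hypothetical helpers chosen for this example.

```python
import re

# Strictly increasing: each digit at most once, in the order 1..9 then 0.
STRICT = re.compile(r"^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$")
# Non-decreasing: each digit may repeat any number of times.
REPEATS = re.compile(r"^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$")


def is_strict(s: str) -> bool:
    """True if s is 5 digits, strictly increasing, with 0 ranked after 9."""
    return STRICT.match(s) is not None


def is_nondecreasing(s: str) -> bool:
    """True if s is 5 digits in order, with repeats allowed."""
    return REPEATS.match(s) is not None


if __name__ == "__main__":
    for sample in ["12345", "24579", "34680", "11223", "54321", "01234"]:
        print(sample, is_strict(sample), is_nondecreasing(sample))
```

In both patterns the lookahead fixes the length at five, so the ordered part only has to enforce order.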
[...] instead of failing. – Zirkle
[...] .{5}$ as the lookahead constraint given that the numeric constraint is checked by the other part of the RE; with at least one RE engine (the Tcl one) it's about 30% faster when matching 24579, and it matches exactly the same language. – Cathiecathleen
[...] \d instead of . in terms of readability (i.e. intention is more explicit), but I admit I've never really profiled my regex performance. – Annual
Wrong tool for the job. Just iterate through the characters one by one and check them. How you do that depends on which language you're using.
Here is how to check using C:
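#include <stdio.h>
#include <string.h>

/* Rank of a digit in the required order: '1'..'9' map to 1..9, '0' maps to 10. */
#define RANK(c) ((c) == '0' ? 10 : (c) - '0')

int main(void)
{
    const char *str = "12345";
    /* Start by requiring exactly five characters, the first being a digit. */
    int i, res = strlen(str) == 5 && str[0] >= '0' && str[0] <= '9';
    /* Each later character must be a digit that ranks above the previous one. */
    for (i = 1; res && i < 5; ++i) {
        res = str[i] >= '0' && str[i] <= '9' && RANK(str[i - 1]) < RANK(str[i]);
    }
    printf("%d\n", res);
    return 0;
}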
It is obviously longer than a regex solution, but a regex solution will never be as fast as that.
polygenelubricants's suggestion is a great one, but there's a better one and that's to use a simpler lookahead constraint given that the bulk of the RE checks for the numeric-ness of the characters anyway. For why, see this log of an interactive Tcl session:
% set RE1 "^(?=\\d{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$
% set RE2 "^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$"
^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$
% time {regexp $RE1 24579} 100000
32.80587355 microseconds per iteration
% time {regexp $RE2 24579} 100000
22.598555649999998 microseconds per iteration
As you can see, it's about 30% faster to use the version of the RE with .{5}$ as a lookahead constraint, at least in the Tcl RE engine. (Note that the above log misses some lines where I was stabilizing the compilations of the regular expressions, though I'd anticipate RE2 to be a little faster to compile anyway.) If you're using a different RE engine (e.g., PCRE or Perl) then you should recheck to get your own performance figures.
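If you want to recheck the figures on another engine, a rough sketch of the same comparison in Python might look like the following. It assumes Python's re and timeit modules; time_pattern is a hypothetical helper, and nothing here predicts which pattern wins on your machine or engine.

```python
import re
import timeit

# Same two patterns as in the Tcl session, precompiled once.
RE1 = re.compile(r"^(?=\d{5}$)1?2?3?4?5?6?7?8?9?0?$")
RE2 = re.compile(r"^(?=.{5}$)1?2?3?4?5?6?7?8?9?0?$")


def time_pattern(pattern, subject="24579", number=100_000):
    """Return the average time per match attempt, in microseconds."""
    total = timeit.timeit(lambda: pattern.match(subject), number=number)
    return total / number * 1e6


if __name__ == "__main__":
    for name, pattern in [("RE1", RE1), ("RE2", RE2)]:
        print(f"{name}: {time_pattern(pattern):.3f} microseconds per iteration")
```

Run it several times and compare the trend instead of a single reading, since individual timings are noisy.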
This is not something that regular expressions are generally good for. The sort of regex you're going to need to achieve this is likely to be bigger and uglier than simple procedural code to do the same thing.
By all means use a regex to ensure you have five digits in your string but then just use normal coding checks to ensure the order is correct.
You don't bang in nails with a screwdriver (well, not if you're smart), so you shouldn't be trying to use regular expressions for every job either :-)
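A minimal sketch of that split, in Python: a small regex confirms there are exactly five digits, and ordinary code checks the order. The names FIVE_DIGITS, rank and is_increasing are hypothetical, and ranking 0 after 9 follows the rule stated in the question.

```python
import re

# Regex only answers the easy question: is this exactly five ASCII digits?
FIVE_DIGITS = re.compile(r"[0-9]{5}")


def rank(ch: str) -> int:
    """Position of a digit in the required order: 1..9 first, then 0."""
    return 10 if ch == "0" else int(ch)


def is_increasing(s: str) -> bool:
    """True if s is five digits in strictly increasing order (0 after 9)."""
    if FIVE_DIGITS.fullmatch(s) is None:
        return False
    # Compare each digit with its successor using plain procedural logic.
    return all(rank(a) < rank(b) for a, b in zip(s, s[1:]))


if __name__ == "__main__":
    for sample in ["12345", "24579", "34680", "11223", "01234", "12a45"]:
        print(sample, is_increasing(sample))
```

Changing the ordering rule then only means editing rank, with no regex to rewrite.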