Regex to find numbers excluding four digit numbers
Asked Answered
B

3

1

I am trying to figure out how to find numbers that are not years (I'm defining a year as simply a number that is four digits wide.)

For example, I want to pick up

1

12

123

But NOT 1234 in order to avoid dates (4 digits).

if the regex also picked up 12345 that is fine, but not necessary for solving this problem

(Note: these requirements may seem odd. They are part of a larger solution that I am stuck with)

Bots answered 17/1, 2012 at 18:14 Comment(2)
What language do you wan to use? Sorry for repetition.Notorious
Sorry - I should have clarified: its a high level system written in vb .net and c#.net. I haven't looked at the code but its probably plain old: System.Text.RegularExpressionsBots
S
5

If lookbehind and lookahead are available, the following should work:

(?<!\d)(\d{1,3}|\d{5,})(?!\d)

Explanation:

(?<!\d)            # Previous character is not a digit
(\d{1,3}|\d{5,})   # Between 1 and 3, or 5 or more digits, place in group 1
(?!\d)             # Next character is not a digit

If you cannot use lookarounds, the following should work:

\b(\d{1,3}|\d{5,})\b

Explanation:

\b                 # Word boundary
(\d{1,3}|\d{5,})   # Between 1 and 3, or 5 or more digits, place in group 1
\b                 # Word boundary

Python example:

>>> regex = re.compile(r'(?<!\d)(\d{1,3}|\d{5,})(?!\d)')
>>> regex.findall('1 22 333 4444 55555 1234 56789')
['1', '22', '333', '55555', '56789']
Shulem answered 17/1, 2012 at 18:18 Comment(5)
Nice! But ... your lookaroundless regex misses 22 and 666666 in the following input: 1 22 333 4444 55555 666666 7777777.Bister
@MikeClark - Yeah, that is kind of tricky to get around and it is why the lookaround is preferable. If you just did (\d{1,3}|\d{5,}) without the boundary checks you would end up with worse results that split the numbers in the middle.Shulem
Switched the second regex to use word boundaries which seems to work better, only difference now is that the second cannot match when letters and numbers are mixed, 'a333' would match for the first regex but not for the second.Shulem
I went with your first one (the look behind).Bots
@jJack - Glad it worked, you can accept it as the best solution by clicking the outline of the check mark next to my answer.Shulem
F
0

Depending on the regex flavor you use, this might work for you:

(([0-9]{1,3})|([0-9]{5,}))
Foghorn answered 17/1, 2012 at 18:18 Comment(0)
S
-1

(\\d{0,4} | \\d{6,}) in java.

Steffi answered 17/1, 2012 at 18:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.