Why doesn't [01-12] range work as expected?
B

7

128

I'm trying to use the range pattern [01-12] in a regex to match a two-digit month (mm), but it doesn't work as expected.

Burgeon answered 30/6, 2010 at 10:14 Comment(4)
You're matching characters, not character sequences. Basically, you're matching against 0, 1 to 1, and 2 (i.e. 0, 1 and 2). Consider this: [a-z0-9] matches all the lowercase letters and all the digits, but only as a single character. - Brack
fwiw I created a javascript tool that creates a highly optimized regex from two inputs (min/max) github.com/jonschlinkert/to-regex-range - Cacophonous
0[1-9]|1[0-2] -> 0|1|2 -> []s in a regex denote a character class. If no ranges are specified, it implicitly ors every character. - Sumptuary
Do you need to match it with pure regex? If not, you can: 1.) simply use the \d+ pattern, 2.) convert the matched strings to numbers in your code, and then 3.) check the number range, like if(num >= 1 && num <= 12){ /*do something*/ }. It's much faster and more flexible. - Nikolas
M
259

You seem to have misunderstood how character class definitions work in regex.

To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:

0[1-9]|1[0-2]
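
A quick way to convince yourself is to run the pattern against every two-digit string. This is only a sketch in Python's re module (my choice, the question doesn't name a language), and MONTH is just an illustrative name:

```python
import re

# Two-digit month: "0" followed by 1-9, or "1" followed by 0-2.
MONTH = re.compile(r"0[1-9]|1[0-2]")

# fullmatch requires the whole string to match, not just a prefix.
accepted = [f"{n:02d}" for n in range(100) if MONTH.fullmatch(f"{n:02d}")]
print(accepted)  # the twelve strings "01" through "12"
```

Note that without fullmatch (or ^ and $ anchors) the pattern can also match inside a longer string such as 2012.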

Explanation

A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.

The - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.

Beginners often make the mistake of defining things like [this|that]. This doesn't "work". This character class is equivalent to [this|a], i.e. it matches one character from the input against any of the 6 characters t, h, i, s, | or a. More than likely, (this|that) is what was intended.
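
Both points are easy to check. The snippet below is a small sketch using Python's re module (an assumption on my part, any flavor behaves the same way here); the variable names are made up for the example:

```python
import re

digits = "0123456789"

# [01-12] is "0, or the range 1-1, or 2", so it behaves exactly like [012].
odd_class = re.findall(r"[01-12]", digits)
plain_class = re.findall(r"[012]", digits)
print(odd_class, plain_class)

# [this|that] matches ONE character at a time; (this|that) matches a whole word.
char_hits = re.findall(r"[this|that]", "that")
word_hits = re.findall(r"(this|that)", "that")
print(char_hits, word_hits)
```

The character class reports four separate one-character matches for "that", while the group reports the word once.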

How ranges are defined

So it's obvious now that a pattern like between [24-48] hours doesn't "work". The character class in this case is equivalent to [248].

That is, - in a character class definition doesn't define a numeric range in the pattern. Regex engines don't really "understand" numbers in the pattern, with the exception of the finite repetition syntax (e.g. a{3,5} matches between 3 and 5 consecutive a characters).

A range definition instead uses the ASCII/Unicode encoding of the characters to define ranges. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character class [0-9] includes all characters whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, ..., 9.
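
To see the encoding at work, here is a short sketch (Python again, purely as an illustration) that prints the code points of the endpoints and then lists every ASCII character the class accepts:

```python
import re

# Code points of the two endpoints of the range 0-9.
low, high = ord("0"), ord("9")
print(low, high)

# Every ASCII character accepted by the class [0-9].
in_class = [chr(c) for c in range(128) if re.fullmatch(r"[0-9]", chr(c))]
print(in_class)
```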

Another example: A to Z

Let's take a look at another common character class definition, [a-zA-Z].

In ASCII:

  • A = 65, Z = 90
  • a = 97, z = 122

This means that:

  • [a-zA-Z] and [A-Za-z] are equivalent
  • In most flavors, [a-Z] is likely to be an illegal character range
    • because a (97) is "greater than" Z (90)
  • [A-z] is legal, but also includes these six characters:
    • [ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)
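
The list above can be verified with a few lines. This is a sketch assuming Python's re module, where a reversed range is rejected at compile time with re.error; other flavors may report it differently:

```python
import re

# Characters accepted by [A-z] that are not ASCII letters.
extras = [chr(c) for c in range(128)
          if re.fullmatch(r"[A-z]", chr(c)) and not chr(c).isalpha()]
print(extras)

# A reversed range such as a-Z is rejected when the pattern is compiled.
try:
    re.compile(r"[a-Z]")
    reversed_ok = True
except re.error as exc:
    reversed_ok = False
    print("rejected:", exc)
```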

Molech answered 30/6, 2010 at 10:15 Comment(5)
For me, I was looking for months without prefixing with 0 if single digit. And I used this ([1-9]|(1[0-2])) and it works. - Intercolumniation
Important to note: if you came to this page looking for a number range whose single-digit values have no leading zero, 0[1-9]|1[0-2] won't work. Changing it to the logical next step, [1-9]|1[0-2], doesn't work either, for understandable reasons (it matches only the 1 in 10, 11, and 12). I had to use \b(?:[0-9]|1[0-1])\b to prevent that. The \b's make sure the regex matches at word (or in this case number) boundaries (^ and $ didn't); the parentheses group the alternatives so that the \b's apply to both sides of the or (|); and finally ?: avoids creating a submatch through the use of the parentheses. - Winwaloe
@Molech : "1,2,3,4,5,6,7,8,9,10,17,18".match(/^(([1-9]|1[0-7])\,?)+$/g ) Can you please tell me why this JS regex matches numbers above 17? - Unaccompanied
@Unaccompanied - polygenelubricants could, and so could I, but then we'd be answering a questi… wait… is this a question you are asking in a comment? There are rulez on this site ;) Ask a Question if you have a new question. Comments are only for critiquing and asking for clarification, and for responding to those. - Software
@Unaccompanied Oh, I see. You did re-ask it as a question an hour later. That's great! However, it would probably be a good idea to delete your comment here. - Software
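
To illustrate the word-boundary point raised in the comments above, here is a sketch in Python (my choice of language; the sample text is invented, and I use the 1 to 12 month range rather than the commenter's own range). An unanchored alternation tries [1-9] first and is satisfied with a single digit, while \b forces the engine to consume the whole number:

```python
import re

text = "7 10 11 12 13"

# Unanchored: [1-9] wins first, so 10, 11, 12 and 13 are split into digits.
loose = re.findall(r"[1-9]|1[0-2]", text)
print(loose)

# With word boundaries the match must cover a whole number, so 13 is skipped.
bounded = re.findall(r"\b(?:[1-9]|1[0-2])\b", text)
print(bounded)
```
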
B
29

A character class in regular expressions, denoted by the [...] syntax, specifies the rules to match a single character in the input. As such, everything you write between the brackets specifies how to match a single character.

Your pattern, [01-12], is thus broken down as follows:

  • 0 - match the single digit 0
  • or, 1-1, match a single digit in the range of 1 through 1
  • or, 2, match a single digit 2

So basically all you're matching is 0, 1 or 2.

In order to do the matching you want, matching two digits, ranging from 01-12 as numbers, you need to think about how they will look as text.

You have:

  • 01-09 (ie. first digit is 0, second digit is 1-9)
  • 10-12 (ie. first digit is 1, second digit is 0-2)

You will then have to write a regular expression for that, which can look like this:

  +-- a 0 followed by 1-9
  |
  |      +-- a 1 followed by 0-2
  |      |
<-+--> <-+-->
0[1-9]|1[0-2]
      ^
      |
      +-- vertical bar, this roughly means "OR" in this context

Note that trying to combine them in order to get a shorter expression will fail, by giving false positive matches for invalid input.

For instance, the pattern [0-1][0-9] would basically match the numbers 00-19, which is a bit more than what you want.
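
To make the false positives concrete, the following sketch (Python, with a hypothetical helper named accepted) lists what the shorter pattern lets through that the correct one rejects:

```python
import re

def accepted(pattern):
    """Return every two-digit string that the pattern matches in full."""
    return [f"{n:02d}" for n in range(100) if re.fullmatch(pattern, f"{n:02d}")]

loose = accepted(r"[0-1][0-9]")      # the "shorter" attempt
strict = accepted(r"0[1-9]|1[0-2]")  # the pattern from this answer

# Strings the loose pattern accepts but the strict one rejects.
false_positives = [s for s in loose if s not in strict]
print(false_positives)
```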

I tried finding a definite source for more information about character classes, but for now all I can give you is this Google Query for Regex Character Classes. Hopefully you'll be able to find some more information there to help you.

Brack answered 30/6, 2010 at 10:21 Comment(0)
P
10

This also works:

^([1-9]|[0-1][0-2])$

[1-9] matches single digits between 1 and 9

[0-1][0-2] is meant to match the double digits 10 to 12, but as written it matches 00, 01, 02, 10, 11 and 12

There are some good examples here

Puleo answered 30/6, 2010 at 10:27 Comment(2)
To be exact, [0-1][0-2] also matches 00. That said, +1 for the link (which I've used in my answer). - Molech
[0-1][0-2] must be carefully interpreted, as it allows strings like 00, 01, and 02, but it doesn't admit 03 up to 09, admitting finally 10, 11 and 12. A correct regex for that is [1-9]|1[0-2], or even 0*([1-9]|1[0-2]) (this last allowing any number of leading zeros). - Timon
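
The behaviour described in these comments can be checked directly. A sketch in Python (an assumption, the answer doesn't name a language), enumerating what the anchored pattern from this answer accepts:

```python
import re

# Pattern from the answer above, anchored with ^ and $.
PATTERN = re.compile(r"^([1-9]|[0-1][0-2])$")

one_digit = [str(n) for n in range(10) if PATTERN.match(str(n))]
two_digit = [f"{n:02d}" for n in range(100) if PATTERN.match(f"{n:02d}")]
print(one_digit)
print(two_digit)
```

So 1 to 9 and 10 to 12 are accepted, but so are 00, 01 and 02, while 03 to 09 are not.
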
P
2

The []s in a regex denote a character class. If no ranges are specified, it implicitly ors every character within it together. Thus, [abcde] is the same as (a|b|c|d|e), except that it doesn't capture anything; it will match any one of a, b, c, d, or e. All a range indicates is a set of characters; [ac-eg] says "match any one of: a; any character between c and e; or g". Thus, your match says "match any one of: 0; any character between 1 and 1 (i.e., just 1); or 2".

Your goal is evidently to specify a number range: any number between 01 and 12 written with two digits. In this specific case, you can match it with 0[1-9]|1[0-2]: either a 0 followed by any digit between 1 and 9, or a 1 followed by any digit between 0 and 2. In general, you can transform any number range into a valid regex in a similar manner. There may be a better option than regular expressions, however, or an existing function or module which can construct the regex for you. It depends on your language.
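
As one example of the "better option than regular expressions" route, here is a minimal sketch in Python (picked only for illustration) that uses the regex for the shape and ordinary arithmetic for the range. The function name is_month is made up:

```python
import re

def is_month(text):
    """Accept exactly two ASCII digits whose numeric value is 1 through 12."""
    # The regex only checks the shape: exactly two digits.
    if not re.fullmatch(r"[0-9]{2}", text):
        return False
    # The numeric range is checked with plain integer comparison.
    return 1 <= int(text) <= 12

print([s for s in ["00", "01", "07", "12", "13", "7"] if is_month(s)])
```

The upside of this split is that changing the range later means editing two numbers rather than re-deriving a pattern.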

Primitive answered 30/6, 2010 at 10:20 Comment(0)
F
1

Use this:

0?[1-9]|1[012]
  • 07: valid
  • 7: valid
  • 0: no match
  • 00: no match
  • 13: no match
  • 21: no match

To test a value such as 07/2018, use this:

/^(0?[1-9]|1[012])\/([2-9][0-9]{3})$/

(Date range from 01/2000 to 12/9999)
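
For anyone who wants to try it outside JavaScript, here is a rough Python equivalent (a sketch, with MONTH_YEAR as an illustrative name). The \/ escape is only needed inside a JavaScript /.../ literal, so a plain / is used here:

```python
import re

# Month (optional leading zero) / four-digit year from 2000 to 9999.
MONTH_YEAR = re.compile(r"^(0?[1-9]|1[012])/([2-9][0-9]{3})$")

match = MONTH_YEAR.match("07/2018")
print(match.groups())  # the captured month and year as strings

for sample in ["7/2018", "12/9999", "00/2018", "13/2018", "07/1999"]:
    print(sample, bool(MONTH_YEAR.match(sample)))
```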

Frech answered 23/1, 2018 at 7:24 Comment(1)
I've been trying to figure out how to do this but to get the third condition of only a 0 to pass. - Revis
E
0

As polygenelubricants says, yours would look for 0|1-1|2 rather than what you wish for, because character classes (things in []) match characters rather than strings.

Elviaelvie answered 30/6, 2010 at 10:17 Comment(1)
0|1-1|2 - this notation is very misleading. Something like 0|1|2 would be more accurate. - Molech
L
0

My solution to keep mm-yyyy is ^0*([1-9]|1[0-2])-(20[2-4][0-9])$

Longshoreman answered 4/3, 2021 at 13:28 Comment(2)
Probably better ^(0?[1-9]|1[0-2])-… (only a single optional leading 0 in the non-double-digit case) - Trumaine
True, single (?) is better than unlimited (*). - Longshoreman
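
The difference between the two variants shows up with extra leading zeros. A sketch in Python (my assumption, the answer doesn't say which language it targets); the second pattern fills in the elided tail of the comment's suggestion with the year part from the answer, which is my reading of it:

```python
import re

# Answer's version: any number of leading zeros before the month.
UNLIMITED = re.compile(r"^0*([1-9]|1[0-2])-(20[2-4][0-9])$")
# Comment's suggestion, completed with the same year part (assumed).
SINGLE = re.compile(r"^(0?[1-9]|1[0-2])-(20[2-4][0-9])$")

for sample in ["03-2021", "3-2021", "0003-2021", "012-2021", "12-2050"]:
    print(sample, bool(UNLIMITED.match(sample)), bool(SINGLE.match(sample)))
```

Both accept 03-2021 and 3-2021 and reject 12-2050 (the year part stops at 2049); only the 0* version accepts 0003-2021 and 012-2021.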