How to match a fixed number of characters in ANTLR 3?

I want to parse ISO 8601 dates in my ANTLR grammar.

2001-05-03

I have the following entries in my grammar file:

date    : FOUR_DIGIT ('-')? TWO_DIGIT ('-')? TWO_DIGIT ;

FOUR_DIGIT
    : TWO_DIGIT TWO_DIGIT ; 

TWO_DIGIT
    : DIGIT DIGIT ;

DIGIT   : ('0'..'9') ;

I know I can match one or more with DIGIT+ and zero or more with DIGIT*.

While this works, is there a simpler syntax to specify I want to match exactly 2 DIGIT?

Mirador answered 3/5, 2011 at 5:31 Comment(1)
Just for those who drive by in the future: I abandoned parsing dates in my parser and decided it is better to parse the dates directly in code until I can figure out a way to parse dates natively that doesn't conflict with other terms that start with DIGIT patterns. - Mirador
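
For anyone taking the same route of handling dates in code rather than in the grammar, here is a minimal sketch in Python (the thread does not say which language was actually used, so that choice is an assumption). The names `ISO_DATE` and `parse_iso_date` are made up for illustration. The regex uses counted repetition for the fixed-width fields, mirrors the optional hyphens of the `date` rule in the question, and leaves the validity check to `datetime.date`.

```python
import re
from datetime import date

# Exactly 4 digits, then two groups of exactly 2 digits. Each hyphen is
# optional, mirroring the `date` rule in the question. [0-9] is used
# instead of \d so that only ASCII digits are accepted.
ISO_DATE = re.compile(r"([0-9]{4})-?([0-9]{2})-?([0-9]{2})")


def parse_iso_date(text):
    """Parse 'YYYY-MM-DD' or 'YYYYMMDD' into a datetime.date (hypothetical helper)."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a calendar date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() raises ValueError for impossible values such as month 00.
    return date(year, month, day)
```

With this sketch, `parse_iso_date("2001-05-03")` returns `date(2001, 5, 3)`, while `"2001-00-03"` and `"2001-5-3"` both raise `ValueError`.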

I'm pretty sure ANTLR 3 has no quantifiers besides *, + and ?. DIGIT DIGIT DIGIT DIGIT seems like the most reasonable way to get the behavior you want.

See http://www.antlr.org/wiki/display/ANTLR3/Grammars

Battik answered 3/5, 2011 at 5:57 Comment(0)

Jarrod Roberson wrote:

While this works, is there a simpler syntax to specify I want to match exactly 2 DIGIT?

No, DIGIT DIGIT is the only way to match exactly two digits. ANTLR does not support something like DIGIT{2}, unfortunately.

Seafaring answered 3/5, 2011 at 5:58 Comment(4)
I removed month; it didn't buy me anything, since it collides with trying to restrict the days in the same way. Also, 00 would not be a valid month. - Mirador
I see. Yes, that looks better: just match the dates "loosely" and at a later stage check whether it's a valid date or not. - Seafaring
Perhaps you know, but realize that all 4- (and 2-) digit numbers are now tokenized as FOUR_DIGIT (or TWO_DIGIT). So calling DIGIT+ from any of your parser rules can never match four or two successive digits. - Seafaring
@Jarrod, yeah, I figured that much, but wanted to make sure. - Seafaring
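
The tokenization point raised in the comments above can be illustrated with a toy tokenizer in Python. This is only a simplified greedy model that tries the longer digit rules first. It is not ANTLR's actual prediction algorithm, and `TOKEN_SPEC`, `tokenize` and the `DASH` token name are hypothetical. It shows why a run of four digits arrives at the parser as a single FOUR_DIGIT token rather than as four DIGIT tokens.

```python
import re

# Hypothetical token table mirroring the lexer rules in the question,
# ordered so that longer digit runs are tried first.
TOKEN_SPEC = [
    ("FOUR_DIGIT", re.compile(r"[0-9]{4}")),
    ("TWO_DIGIT", re.compile(r"[0-9]{2}")),
    ("DIGIT", re.compile(r"[0-9]")),
    ("DASH", re.compile(r"-")),
]


def tokenize(text):
    """Split text into (token_name, lexeme) pairs using the table above."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched at this position.
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens
```

In this model, `tokenize("1234")` yields one `FOUR_DIGIT` token and `tokenize("05")` yields one `TWO_DIGIT` token, so a parser rule written as `DIGIT+` never sees those inputs as individual digits.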
