ANTLRv4: non-greedy rules

I'm reading The Definitive ANTLR 4 Reference and have a question about one of its examples (p. 76):

STRING: '"' (ESC|.)*? '"';
fragment ESC: '\\"' | '\\\\' ;

The rule matches a typical C++ string: a character sequence enclosed in double quotes, which may itself contain \".

I expected STRING to match the shortest possible string because of the non-greedy construct. So on seeing \" I thought it would match \ with . and " with the closing '"' at the end of the rule, since that yields the shortest possible string. Instead, \" is matched by ESC, and I don't understand why.

What exactly happens here? Does one separate DFA match (ESC|.) first, and then another DFA match STRING using the text already matched by (ESC|.)? I admit I haven't finished reading the book.

Asked by Unrest, 13/9, 2013 at 13:14

ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.
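As a rough illustration of longest-match-wins with rule order as the tie-breaker, here is a small Python sketch that models two hypothetical lexer rules, IF : 'if' ; and ID : [a-z]+ ;, as regular expressions (the rule names and the next_token helper are made up for this example). It is a simplified model under that assumption; a real ANTLR lexer runs all rules together through its ATN/DFA rather than trying them one at a time.

```python
import re

# Hypothetical rules, listed in grammar order: IF : 'if' ;  ID : [a-z]+ ;
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z]+")),
]


def next_token(text, pos=0):
    # Longest match wins; on a tie the rule listed first keeps the token,
    # because a later rule only replaces it when strictly longer.
    best_name, best_text = None, ""
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m and len(m.group()) > len(best_text):
            best_name, best_text = name, m.group()
    return (best_name, best_text) if best_name else None


print(next_token("if"))    # ('IF', 'if')
print(next_token("iffy"))  # ('ID', 'iffy')
```

For "if" both rules match two characters, so the earlier rule IF wins the tie; for "iffy", ID matches four characters and wins no matter where it appears in the grammar.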

The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that moment to the end of the rule, all alternatives within that rule are treated as ordered, and the path through the lowest-numbered (first-listed) alternative wins. This seemingly strange behavior is actually what implements non-greedy handling, because of the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it to use ESC if possible.
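To make the ordering concrete, here is a minimal Python sketch that hand-codes STRING as a prioritized backtracking search: the non-greedy loop first tries to exit (match the closing quote), and only otherwise tries its alternatives in grammar order. This is an assumed, simplified model of ordered-choice semantics for illustration, not how ANTLR's ATN simulator works internally; the function names (esc, any_char, loop_then_quote, match_string) are hypothetical.

```python
# Simplified model of  STRING : '"' (ESC|.)*? '"' ;  as a prioritized
# backtracking search. Illustration only; ANTLR's lexer simulates an ATN.


def esc(text, pos):
    # fragment ESC : '\\"' | '\\\\' ;  (backslash+quote or backslash+backslash)
    if text.startswith('\\"', pos) or text.startswith('\\\\', pos):
        return 2
    return None


def any_char(text, pos):
    # '.' consumes exactly one character of any kind
    return 1 if pos < len(text) else None


def loop_then_quote(text, pos, alts):
    # Non-greedy (...)*?: the exit branch (closing '"' ends the rule) comes first.
    if pos < len(text) and text[pos] == '"':
        return pos + 1
    # Otherwise take loop alternatives strictly in grammar order,
    # backtracking to the next one only if the rest of the rule fails.
    for alt in alts:
        n = alt(text, pos)
        if n is not None:
            end = loop_then_quote(text, pos + n, alts)
            if end is not None:
                return end
    return None


def match_string(text, alts=(esc, any_char)):
    # Returns the STRING lexeme at the start of text, or None.
    if not text.startswith('"'):
        return None
    end = loop_then_quote(text, 1, alts)
    return None if end is None else text[:end]


src = r'"a\"b" + x'
print(match_string(src))                        # "a\"b"   (ESC listed first)
print(match_string(src, alts=(any_char, esc)))  # "a\"     (. listed first)
```

At the backslash the exit branch cannot fire, because the current character is \ rather than ", so one of the loop alternatives must be taken. With ESC listed first it consumes \" as a unit and the string continues; if . were listed first, it would take the \ alone and the following " would end the string early.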

Answered by Attlee, 14/9, 2013 at 2:58
So I take it that if the rule were STRING: '"' (.|ESC)*? '"'; (with the . alternative before the ESC alternative), then what @Unrest described ("if it sees a \" it would map \ to . and " to " at the end of the rule") would indeed happen? (Emplane)
