I'm reading The Definitive ANTLR 4 Reference and have a question regarding one of the examples (p. 76):
STRING: '"' (ESC|.)*? '"';
fragment
ESC: '\\"' | '\\\\' ;
The rule matches a typical C++ string: a character sequence enclosed in "", which can also contain \".
I expected the rule STRING to match the shortest string possible because of the non-greedy construct. So if it sees a \", it would map \ to . and " to the '"' at the end of the rule, since this would result in the shortest possible string. Instead, \" is mapped to ESC. This is not what I expected, so I seem to be misunderstanding something.
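To make the two readings concrete, here is a small sketch using Python's re module. It is only an analogy: re is a backtracking regex engine, not ANTLR's lexer, and the names ESC_FIRST and DOT_FIRST are made up for illustration. The first pattern mirrors (ESC|.)*? from the book; the second is a variant with the alternatives swapped.

```python
import re

# Hypothetical regex analogues of the lexer rule (assumption: this is
# Python's backtracking engine, not ANTLR's lexer machinery).
ESC_FIRST = re.compile(r'"(?:\\"|\\\\|.)*?"')  # mirrors (ESC|.)*?
DOT_FIRST = re.compile(r'"(?:.|\\"|\\\\)*?"')  # mirrors (.|ESC)*?

# Input: a quoted string containing an escaped quote, then trailing text.
text = r'"a\"b" tail'

esc_first = ESC_FIRST.match(text).group(0)
dot_first = DOT_FIRST.match(text).group(0)

print(esc_first)  # "a\"b"  (the escaped quote is consumed as one unit)
print(dot_first)  # "a\"    (the backslash goes to '.', the quote ends the match)
```

In this analogy the lazy loop still tries its alternatives in order, so the second pattern gives the result I expected and the first one does not. Whether ANTLR's lexer gets there by the same mechanism is exactly what I am unsure about.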
What exactly happens here? Is it that a separate DFA matches (ESC|.) first, and another DFA matches STRING using the string already matched by the (ESC|.) construct? I have to admit I haven't read the book to the end.
If the rule were instead
STRING: '"' (.|ESC)*? '"';
(with the . alternative before the ESC alternative), would what @Unrest said indeed happen? > So if it sees a \" it would map \ to . and " to " at the end of the rule – Emplane