I'm implementing a PEG parser generator in Python, and I've had success so far, except with the "cut" feature, of which whomever knows Prolog must know about.
The idea is that after a cut (!
) symbol has been parsed, then no alternative options should be attempted at the same level.
expre = '(' ! list ')' | atom.
Means that after the (
is seen, the parsing must succeed, or fail without trying the second option.
I'm using Python's (very efficient) exception system to force backtracking, so I tried having a special FailedCut
exception that would abort the enclosing choice, but that didn't work.
Any pointers to how this functionality is implemented in other parser generators would be helpful.
Maybe the problem I've had has been lack of locality. The code generated for the left part of the rule would be something like:
cut_seen = False
try:
self.token('(')
cut_seen = True
self.call('list')
self.token(')')
except FailedParse as e:
if cut_seen:
raise FailedCut(e)
raise
Then the code generated for the choice (|
) operator will skip the following choices if it catches a FailedCut
. What I mean by lack of locality is that the choice catching the FailedCut
may be deep up in calls, thus having an effect too-difficult to discern.
Instead of making the code generated for sequences try to inform enclosing choices of cuts, I could make the code generated for choices beware of them. That would make the scope of cuts very local, unlike Prolog's, but good enough for what I want in a PEG parser, which is to commit to an option after a certain token sequence has been seen, so the error reporting is refers to that location in the source, instead of to another location where some other option might have been available.
It just occurred to me that if the code generated for a rule/predicate catches FailedCut
and translates it into a normal FailedParse
exception, then the cuts will have the right scope.
In reference to @false's question, here's a complete example of what I want to work:
start = expre ;
expre = named | term ;
named = word ':' ! term;
term = word ;
In that grammar, word
can be reached through named
or term
, but I would like the parser to commit to the named
branch after it has seen the :
.
The Solution
To be fair, I've published my work so far at https://bitbucket.org/apalala/grako/.
In the final solution, sequences are enclosed with this context manager:
@contextmanager
def _sequence(self):
self._push_cut()
try:
yield
except FailedParse as e:
if self._cut():
self.error(e, FailedCut)
else:
raise
finally:
self._pop_cut()
And options in a choice function are enclosed with this:
@contextmanager
def _option(self):
p = self._pos
try:
self._push_ast()
try:
yield
ast = self.ast
finally:
self._pop_ast()
self.ast.update(ast)
except FailedCut as e:
self._goto(p)
raise e.nested
except FailedParse:
self._goto(p)
Which forces an exit out of the choice instead of a return to try the next option.
The cuts themselves are implemented thus:
def _cut(self):
self._cut_stack[-1] = True
The full source code may be found on Bitbucket.
if-and-and-and-else
s), and faster recovery at the point where recovery can be done. – Drayton