Implementing "cut" in a recursive descent parser
Asked Answered
D

3

7

I'm implementing a PEG parser generator in Python, and I've had success so far, except with the "cut" feature, of which whomever knows Prolog must know about.

The idea is that after a cut (!) symbol has been parsed, then no alternative options should be attempted at the same level.

expre = '(' ! list ')' | atom.

Means that after the ( is seen, the parsing must succeed, or fail without trying the second option.

I'm using Python's (very efficient) exception system to force backtracking, so I tried having a special FailedCut exception that would abort the enclosing choice, but that didn't work.

Any pointers to how this functionality is implemented in other parser generators would be helpful.

Maybe the problem I've had has been lack of locality. The code generated for the left part of the rule would be something like:

cut_seen = False
try:
    self.token('(')
    cut_seen = True 
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
         raise FailedCut(e)
    raise

Then the code generated for the choice (|) operator will skip the following choices if it catches a FailedCut. What I mean by lack of locality is that the choice catching the FailedCut may be deep up in calls, thus having an effect too-difficult to discern.

Instead of making the code generated for sequences try to inform enclosing choices of cuts, I could make the code generated for choices beware of them. That would make the scope of cuts very local, unlike Prolog's, but good enough for what I want in a PEG parser, which is to commit to an option after a certain token sequence has been seen, so the error reporting is refers to that location in the source, instead of to another location where some other option might have been available.

It just occurred to me that if the code generated for a rule/predicate catches FailedCut and translates it into a normal FailedParse exception, then the cuts will have the right scope.

In reference to @false's question, here's a complete example of what I want to work:

start = expre ;

expre = named | term ;

named = word ':' ! term;

term = word ;

In that grammar, word can be reached through named or term, but I would like the parser to commit to the named branch after it has seen the :.

The Solution

To be fair, I've published my work so far at https://bitbucket.org/apalala/grako/.

In the final solution, sequences are enclosed with this context manager:

@contextmanager
def _sequence(self):
    self._push_cut()
    try:
        yield
    except FailedParse as e:
        if self._cut():
            self.error(e, FailedCut)
        else:
            raise
    finally:
        self._pop_cut()

And options in a choice function are enclosed with this:

@contextmanager
def _option(self):
    p = self._pos
    try:
        self._push_ast()
        try:
            yield
            ast = self.ast
        finally:
            self._pop_ast()
        self.ast.update(ast)
    except FailedCut as e:
        self._goto(p)
        raise e.nested
    except FailedParse:
        self._goto(p)

Which forces an exit out of the choice instead of a return to try the next option.

The cuts themselves are implemented thus:

def _cut(self):
    self._cut_stack[-1] = True

The full source code may be found on Bitbucket.

Drayton answered 3/1, 2013 at 18:15 Comment(2)
I don't know how you used exceptions, thus my advice could be useless.... Take a look at cut implementation of YieldProlog. Do you think is relevant to you?Regurgitate
My parser generator uses exceptions in the straight-forward way: a statement either parses the intended input, or it raises an exception. My idea is the approach allows much more straight-forward code (no if-and-and-and-elses), and faster recovery at the point where recovery can be done.Drayton
D
2

The solution proposed at the end of my question worked:

cut_seen = False
try:
    self.token('(')
    cut_seen = True 
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
         raise FailedCut(e)
    raise

Then, any time a choice or optional is evaluated, the code looks like this:

p = self.pos
try:
   # code for the expression
except FailedCut:
    raise
except FailedParse:
    self.goto(p)

Edit

The actual solution required keeping a "cut stack". The source code is int Bitbucket.

Drayton answered 11/1, 2013 at 21:30 Comment(0)
S
3

In a Prolog with ISO Prolog's exception handling (catch/3 and throw/1), a cut could be implemented as:

cut. % Simply succeeds
cut :-
   throw(cut). % on backtracking throws an exception

This would require to catch that exception at appropriate places. For example, each goal (that is non-terminal) of a user defined predicate could now be wrapped with:

catchcut(Goal) :-
   catch(Goal,cut,fail).

This is not the most efficient way to implement cut since it does not free resources upon success of !, but it might be sufficient for your purposes. Also, this method now might interfere with user-defined uses of catch/3. But you probably do not want to emulate the entire Prolog language in any case.

Also, consider to use Prolog's -grammars directly. There is a lot of fine print that is not evident when implementing this in another language.

Scholasticism answered 3/1, 2013 at 19:8 Comment(5)
Remember that I'm working in Python! What you describe was kind of what I attempted: wrap the sequence following the cut in a pass/throw structure, but it didn't work, probably because the generated code was too complicated. All I need is for a choice to fail once an option that was committed through a cut failed.Drayton
@Apalala: This is a clear misunderstanding: Not the "sequence following the cut" is protected with an exception handler, but the non-terminal where the cut occurs in.Scholasticism
If I understand correctly, you're thinking Prolog, and the cut protects the particular clause from backtracking, because the non-terminal may be the left hand side of several clauses. I think I need to post an example.Drayton
@Apalala: You need to have some backtracking mechanism in your implementation. Note that it is usually next-to trivial implementing backtracking with loops in procedural languages-as long as there is no interruption due to cut. With the method I showed you, you can even handle this. Not very efficiently, but probably good enough for parsing.Scholasticism
I understand, but this is is a PEG parser generator. Once A succeeds in (A|B), there's no trying B, even if there's a failure ahead. What I need is a cut failure of A to avoid trying B.Drayton
D
2

The solution proposed at the end of my question worked:

cut_seen = False
try:
    self.token('(')
    cut_seen = True 
    self.call('list')
    self.token(')')
except FailedParse as e:
    if cut_seen:
         raise FailedCut(e)
    raise

Then, any time a choice or optional is evaluated, the code looks like this:

p = self.pos
try:
   # code for the expression
except FailedCut:
    raise
except FailedParse:
    self.goto(p)

Edit

The actual solution required keeping a "cut stack". The source code is int Bitbucket.

Drayton answered 11/1, 2013 at 21:30 Comment(0)
C
1

Just read it.

I'd suggested a deep cut_seen (like with modifying parser's state) and a save and restore state with local variables. This uses the thread's stack as "cut_seen stack".

But you have another solution, and I'm pretty sure you're fine already.

BTW: nice compiler – it's just the opposite of what I'm doing with pyPEG so I can learn alot ;-)

Card answered 21/1, 2013 at 23:33 Comment(7)
I though about having cut_seen be a class attribute, but then I'd loose the desired local effect: a cut+fail should affect only the closest choice in the runtime stack, and a cut+success should eventually clear the cut. I'm updating the question with the code relevant to the solution.Drayton
Grako generates a model of the parser like pyPEG, and the model generates the top-down parser source-code. The model can also parse, but the nodes themselves do the parsing, which is very efficient, unlike in pyPEG, where an external parser visits the model using isinstance() to decide what to do with each node. See this. The reason Grako generates a functional top-down parser is because model-based parsing engines are too difficult to debug.Drayton
The reason why I'm not generating a parser is that I need to modify the grammar while runtime. That would require to regenerate a parser each time it is modified. The debugging is not too difficult, though, because there is a generic parse function only. There is no need to debug generated functions.Card
I understand, but you would gain plenty in performance if you made the grammar model objects parse their own stuff instead of having the parse() function make decisions based on their types.Drayton
You're completely right. But pyPEG has a special purpose: I need it to bootstrap Intrinsic, a language which is requiring me to have pyPEG as an interpreter and not generating out anything. This makes pyPEG extremely flexible and easy, while Grako will be faster without any doubts. I like your idea of having a cut in PEG grammars, though.Card
Want I meant to say is that pyPEG would be much faster if each node created while constructing the grammar did it's own parsing. It would be moving code out of the parse() method, and into parse() methods of the nodes, which already know they're type. It's the Composite Design Pattern.Drayton
Yes. That would require me to implement monkey patching. You remember, the grammar will change while runtime. Because I wanted to avoid self modifying code, I was setting aside compiling at all.Card

© 2022 - 2024 — McMap. All rights reserved.