Priority in grammar using Lark

I have a priority problem in my grammar, and I have run out of ideas for how to fix it.

I'm using Lark.

Here is the thing (I have simplified the problem as much as I can):

from lark import Lark

parser = Lark(r"""
    start: set | set_mul

    set_mul: [nb] set
    set: [nb] "foo"
    nb: INT "x"

   %import common.INT
   %import common.WS
   %ignore WS

   """, start='start')

input = "3xfoo"
p = parser.parse(input)
print(p.pretty())

The output is:

start
  set_mul
    set
      nb    3

But what I want is:

start
  set_mul
    nb    3
    set

I tried assigning priorities to my rules, but that did not work.

Do you have any idea what I would need to change to make it work?

Thanks

Ursa answered 5/4, 2018 at 11:51 Comment(1)
Please edit your code so it is an MCVE, so I can paste it into a file and run it without having to add imports or anything. - Resultant

A simple solution might be to re-write your grammar to remove the ambiguity.

from lark import Lark

parser = Lark(r"""
    start: set | set_mul

    set_mul: nb | nb set | nb nb_set
    set: "foo"
    nb_set: nb set
    nb: INT "x"

   %import common.INT
   %import common.WS
   %ignore WS

   """, start='start')

This way, each of the following inputs has only one possible interpretation:

input = "3xfoo"
p = parser.parse(input)
print(p.pretty())

input = "3x4xfoo"
p = parser.parse(input)
print(p.pretty())         

Result:

start
  set_mul
    nb  3
    set

start
  set_mul
    nb  3
    nb_set
      nb    4
      set
Myrtice answered 5/4, 2018 at 14:54 Comment(1)
Thanks a lot! By the way, I just tried it: set_mul can be just nb set | nb nb_set. There is no need for nb on its own, and dropping it avoids wrong parses. - Ursa

This is not a full answer, but I hope it gets you part of the way. Your problem is that your grammar is ambiguous, and the example you use hits that ambiguity head-on. Lark chooses to disambiguate for you, and you get the result you see.

You can tell Lark not to disambiguate by adding ambiguity='explicit':

import lark

parser = lark.Lark(r"""
    start: set | set_mul

    set_mul: [nb] set
    set: [nb] "foo"
    nb: INT "x"

   %import common.INT
   %import common.WS
   %ignore WS

   """, start='start',ambiguity='explicit')

input = "3xfoo"
p = parser.parse(input)
print(p.pretty())

and you get this output, which includes the tree you want:

_ambig
  start
    set
      nb        3
  start
    set_mul
      set
        nb      3
  start
    set_mul
      nb        3
      set

How can you encourage Lark to disambiguate to your preferred output? Good question.

Resultant answered 5/4, 2018 at 13:8 Comment(2)
Yep, I know about the ambiguity='explicit' option, but the resulting tree is pretty heavy to interpret. As you said, I'm looking for a way to choose between the two, or to change my grammar so that it is not ambiguous. Thanks. - Ursa
The grammar is ambiguous because nb is optional in both set_mul and set. - Resultant
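
To make that last point concrete, here is a small hand-written model of the original grammar in plain Python. It does not use Lark at all; it only counts the ways the optional nb slots in set and set_mul can absorb the leading "Nx" groups of an input. The function name derivations and its return format are invented for this sketch.

```python
from itertools import product


def derivations(nb_count):
    """Enumerate derivations of an input with nb_count leading 'Nx' groups before "foo".

    Models the original grammar:
        start: set | set_mul
        set_mul: [nb] set
        set: [nb] "foo"
    """
    found = []
    # start -> set: the only optional nb slot is the one inside set.
    for set_nb in (0, 1):
        if set_nb == nb_count:
            found.append(("set", {"set": set_nb}))
    # start -> set_mul: one optional slot in set_mul, one in the nested set.
    for mul_nb, set_nb in product((0, 1), repeat=2):
        if mul_nb + set_nb == nb_count:
            found.append(("set_mul", {"set_mul": mul_nb, "set": set_nb}))
    return found


for n in (0, 1, 2):
    print(n, "leading nb ->", len(derivations(n)), "derivation(s)")
```

For one leading nb (the "3xfoo" case) the model finds three derivations, matching the three alternatives under _ambig above. Only an input with two nb groups has a single derivation, which is why making nb mandatory in set_mul and removing it from set resolves the problem.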
