Eliminate Left Recursion on this PEG.js grammar

Asked 19/11, 2012 at 15:1 Answered 1/9, 2016 at 11:30

(Note: I've read other questions like this, but I haven't been able to figure this out).

I wrote this grammar:

start = call

ident = [a-z]+
spaces = [ ]+

call = f:ident spaces g:(call / ident) {
    return f + "(" + g + ")";
}

With this input

a b c d

it returns

"a(b(c(d)))"

And I want

"a(b)(c)(d)"

I think this left recursive rule can give me something like that, but PEG.js doesn't support left recursion.

call = f:(call / ident) spaces g:ident {
    return f + "(" + g + ")";
}

How can I eliminate the left recursion in this case?

PS: You can test this on the online PEG.js demo

Wrapping answered 19/11, 2012 at 15:1 Comment(0)

Good question. Start by separating your first ident from everything else, since it gets special treatment (no parentheses). Next, defer to a rule to handle the spaces ident recursion that will collect the values that go inside parentheses. The loop wraps the ident text and appends any new text that is collected recursively.

Here is a short-hand version of the rules (note that tail is a separate rule):

head: ident tail?;        //the "head" ident is separated
tail: spaces ident tail?; //each "tail" ident is looped over

Here is the PEG script:

start = head

ident = [a-z]+
spaces = [ ]+

head = head:ident tail:tail? {
    return head + tail;
}

tail = spaces next:ident tail:tail? {
    return "(" + next + ")" + tail
}

Edit: Here is an alternative that does the work in one rule and is more similar to yours.

start = head

ident = [a-z]+
spaces = [ ]+

head = head:ident tail:(spaces next:ident{return "(" + next + ")" })* {
    return head + tail.join("")
}

The output for a b c d is "a(b)(c)(d)" for both scripts.

Rosol answered 23/11, 2012 at 3:26 Comment(0)

If I understand correctly, your problem isn't left recursion, it's the structure of the parse tree.

You've eliminated the left recursion properly, but unfortunately, the only way to get rid of left recursion is by eliminating left recursion in the raw parse tree. Most of the theory for this stuff is just about matching the right set of strings. You are still matching the same set of strings, so the theory is happy, but you wanted a left recursive parse tree. More on that problem on wikipedia.

AFAIK, You can't get the raw output of the PEG parser to be left-recursive. You can, however, do whatever you want with the output. So parse it as an array and then postprocess to give it that nice left-structure.

Doing it with a simplified (no spaces, no multicharacter identifiers) grammar:

 start = call

 id = [a-z]

 call
     = arr:id+ {
         var acc = arr[0]
         for (i = 1; i < arr.length; i++) {
            acc = [acc, arr[i]]
         }
         return acc;
     }

That parses abcd to [ [ [ 'a', 'b' ], 'c' ], 'd' ]. I just used + instead of recursion, and then ran through the resulting array building the structure we wanted. Wikipedia has some notes on doing left recursion with a PEG.

That's assuming you wanted the data structure. If you just want the parens, replace the action with this:

var acc = arr[0]
for (i = 1; i < arr.length; i++) {
    acc = acc + '(' + arr[i] + ')'
}
return acc;

Which gives a(b)(c)(d).

To put spaces and multicharacter id's back, you can do this:

 start = call

 id = [a-z]+

 _ = [ ]+

 call
      = a:id as:arg* {
          arr = [a].concat(as)
          var acc = arr[0]
          for (i = 1; i < arr.length; i++) {
              acc = acc + '(' + arr[i] + ')'
          }
          return acc;
      }

 arg = _ a:id {return a}

Negatron answered 13/11, 2013 at 7:42 Comment(0)

You can reformulate the call non-terminal and put its repeating part in a separated rule using the + operator, this way:

start = call

ident = i:[a-z]+ { return i.join(''); }
spaces = [ ]+

call = f:ident g:args+ {
    return f + g.join('');
}

args = spaces a:ident { return "(" + a + ")"; }

Wunderlich answered 1/9, 2016 at 11:30 Comment(0)

Recommended topics

Hot tags