Real-world LR(k > 1) grammars?

Making artificial LR(k) grammars for k > 1 is easy:

Input: A1 B x
Input: A2 B y                   (introduces a reduce-reduce conflict for the terminal a)
A1   : a
A2   : a
B    : b b b ... b              (terminal b occurs k-1 times)
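
For example, with k = 2 the construction gives:

Input: A1 B x
Input: A2 B y
A1   : a
A2   : a
B    : b

After shifting the a, the parser must choose between reducing it to A1 and reducing it to A2. The next token is b either way, so one token of lookahead cannot decide; the token after that (x or y) does, so the grammar is LR(2) but not LR(1).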

However, are there any real-world non-LR(1) computer languages that are LR(k > 1)-parsable?
Or is a language that is not LR(1) necessarily not LR(k) for any k?

Discriminate asked 26/11, 2013 at 2:18

If a language has an LR(k) grammar, then it has an LR(1) grammar which can be generated mechanically from the LR(k) grammar; furthermore, the original parse tree can be recreated from the LR(1) parse tree. A proof of this theorem appears in Section 6.7 of Parsing Theory Vol. II, Sippu & Soisalon-Soininen; they attribute it to "On the complete covering problem for LR(k) grammars" (1976) by M. D. Mickunas, JACM 23:17-30.

So there's no language parseable as LR(k) for k > 1 which is not also parseable as LR(1), and consequently there really isn't such a thing as an LR(k) language for values of k other than 0 and 1.
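
For instance, the k = 2 grammar in the question is LR(2) but not LR(1), yet its (two-string) language clearly has an LR(1) grammar. One hand-made way to get one (just an illustration, not the construction from the paper) is to fold the deciding lookahead into the nonterminal names:

Input: A1_x x
Input: A2_y y
A1_x : a B                      (reducing A1_x tells you the a was an A1)
A2_y : a B                      (reducing A2_y tells you the a was an A2)
B    : b

The reduction that distinguishes A1 from A2 now happens after the B has been read, where one token of lookahead (x or y) is enough, and the original reduction A1 : a or A2 : a can be emitted at that point to rebuild the original parse tree. That "postpone the reduction" idea is what the general construction formalizes.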

However, there are languages for which an LR(2) grammar is much easier to write than any LR(1) grammar. One classic example is the POSIX grammar for yacc itself, which does not require productions to be terminated with a ;. This is unambiguous, because a production must start with an identifier followed by a colon. Written that way, the grammar is clearly LR(2); by the above theorem, an LR(1) grammar exists, but it's nowhere near as clean.
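
A minimal sketch of the troublesome fragment, with made-up token names rather than the ones in the POSIX specification:

rules  : rule
       | rules rule
rule   : ID ':' rhs
rhs    : /* empty */
       | rhs ID

When the parser is at the end of a rhs and the next token is an ID, one token of lookahead cannot tell whether that ID is one more symbol of the current rhs (shift) or the left-hand side of the next rule (reduce the current rule first). The token after the ID settles it: a ':' there means a new rule is starting. Two tokens of lookahead are therefore enough, which is what makes this fragment LR(2).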

Alexander answered 26/11, 2013 at 4:50. Comments (12):
Interesting, I knew you could generate the LR(1) grammar mechanically, but I didn't know you could also recover the parse tree mechanically! +1 – Discriminate
Do you know if there is a simpler description of how to recover the full parse tree somewhere (aside from the paper/book you linked to)? Transforming the grammar is probably doable, but I can't imagine how you can recover the parse tree without effectively making a general LR(k) parser in the process... – Discriminate
@Mehrdad: I have a copy of Sippu & Soisalon-Soininen in front of me, but that doesn't help you much :(. It does, however, mean that I've never bothered to find another source for the theorem. Do you have an academic library nearby, perhaps? – Alexander
Ahh haha I see. Yes, good idea, I'll go check it sometime :) thanks a bunch! – Discriminate
Here's a quick quote: "The idea... is to "shift" the derivation trees in G k symbols to the right; in the parser, this will mean that reduce actions are postponed until 1-symbol lookahead is sufficient... To achieve this, any nonterminal A of G is replaced by a set of nonterminals of the form (x, A, y) where y is a string in FOLLOWk(A) and x is a string in FIRSTk(A)." I think that you can probably work out the transformation from that... – Alexander
Thanks -- just to check, is this the algorithm for converting the LR(k) grammar into an LR(1) grammar, or is it the algorithm for recovering the LR(k) tree from the LR(1) tree? – Discriminate
@Mehrdad: It's the algorithm for making the LR(1) grammar. But from the transformed reduction, you can recover the original reduction (for the nonterminal A, k symbols in the past), and you can just replay the original reductions to build the original parse tree. – Alexander
Oh, the part that confused me wasn't transforming the grammar -- it was recovering the parse tree. I'm not sure what you mean by "replay the original reductions" here -- if we couldn't perform the reductions during the first pass, what makes it easier to do so in the second pass? (Don't we encounter the same ambiguities as before?) – Discriminate
@Mehrdad: Maybe "replay" wasn't the right word. There's no second pass. Every time you do a reduction (in the cover LR(1) grammar), if that reduction corresponds to a reduction in the original grammar, you "do" the original reduction to build the original tree (so you don't actually build the LR(1) tree; you just use the reduction actions). There are no ambiguities because the reductions are effectively delayed k symbols (and the context is accumulated into the non-terminal itself). – Alexander
Oooh interesting... I think that's starting to make sense, thanks :) – Discriminate
@Mehrdad: My quote was a bit misleading; you actually need (x, V, y) where V is a terminal or original non-terminal. You end up with productions of the form (y0, A, ym) -> (y0, X1, y1) (y1, X2, y2) ... (ym-1, Xm, ym); when you do that reduction, you actually apply the reduction A -> X1 X2 ... Xm to the parse tree you are building. You also end up with rules of the form (ax, a, xb) -> b where xb is in FOLLOWk(a); those productions don't correspond to any reduction. – Alexander
Haha okay, I'll have to read it carefully again sometime later... LR parsing itself took me a while to understand, so I can only imagine extending it will be harder :) – Discriminate
