What is the difference between LL and LR parsing?
Asked Answered
W

4

275

Can anyone give me a simple example of LL parsing versus LR parsing?

Wendall answered 12/5, 2011 at 9:8 Comment(0)
C
562

At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol.

An LL parse is a left-to-right, leftmost derivation. That is, we consider the input symbols from the left to the right and attempt to construct a leftmost derivation. This is done by beginning at the start symbol and repeatedly expanding out the leftmost nonterminal until we arrive at the target string. An LR parse is a left-to-right, rightmost derivation, meaning that we scan from the left to right and attempt to construct a rightmost derivation. The parser continuously picks a substring of the input and attempts to reverse it back to a nonterminal.

During an LL parse, the parser continuously chooses between two actions:

  1. Predict: Based on the leftmost nonterminal and some number of lookahead tokens, choose which production ought to be applied to get closer to the input string.
  2. Match: Match the leftmost guessed terminal symbol with the leftmost unconsumed symbol of input.

As an example, given this grammar:

  • S → E
  • E → T + E
  • E → T
  • T → int

Then given the string int + int + int, an LL(2) parser (which uses two tokens of lookahead) would parse the string as follows:

Production       Input              Action
---------------------------------------------------------
S                int + int + int    Predict S -> E
E                int + int + int    Predict E -> T + E
T + E            int + int + int    Predict T -> int
int + E          int + int + int    Match int
+ E              + int + int        Match +
E                int + int          Predict E -> T + E
T + E            int + int          Predict T -> int
int + E          int + int          Match int
+ E              + int              Match +
E                int                Predict E -> T
T                int                Predict T -> int
int              int                Match int
                                    Accept

Notice that in each step we look at the leftmost symbol in our production. If it's a terminal, we match it, and if it's a nonterminal, we predict what it's going to be by choosing one of the rules.

In an LR parser, there are two actions:

  1. Shift: Add the next token of input to a buffer for consideration.
  2. Reduce: Reduce a collection of terminals and nonterminals in this buffer back to some nonterminal by reversing a production.

As an example, an LR(1) parser (with one token of lookahead) might parse that same string as follows:

Workspace        Input              Action
---------------------------------------------------------
                 int + int + int    Shift
int              + int + int        Reduce T -> int
T                + int + int        Shift
T +              int + int          Shift
T + int          + int              Reduce T -> int
T + T            + int              Shift
T + T +          int                Shift
T + T + int                         Reduce T -> int
T + T + T                           Reduce E -> T
T + T + E                           Reduce E -> T + E
T + E                               Reduce E -> T + E
E                                   Reduce S -> E
S                                   Accept

The two parsing algorithms you mentioned (LL and LR) are known to have different characteristics. LL parsers tend to be easier to write by hand, but they are less powerful than LR parsers and accept a much smaller set of grammars than LR parsers do. LR parsers come in many flavors (LR(0), SLR(1), LALR(1), LR(1), IELR(1), GLR(0), etc.) and are far more powerful. They also tend to have much more complex and are almost always generated by tools like yacc or bison. LL parsers also come in many flavors (including LL(*), which is used by the ANTLR tool), though in practice LL(1) is the most-widely used.

As a shameless plug, if you'd like to learn more about LL and LR parsing, I just finished teaching a compilers course and have some handouts and lecture slides on parsing on the course website. I'd be glad to elaborate on any of them if you think it would be useful.

Camaraderie answered 26/7, 2011 at 2:21 Comment(12)
Your lecture slides are phenomenal, easily the most fun explanation that I have seen :) This is the kind of thing that actually sparks interests.Beggarly
I have to comment on the slides as well! Going through all of them now. Helps a lot! Thanks!Erectile
Really enjoying the slides too. I don't suppose you could post the non-Windows version of the project files (and the scanner.l file, for pp2)? :)Ruthanneruthe
The one thing I can contribute to Matt's excellent summary answer is that any grammar which can be parsed by an LL(k) parser (ie, looking ahead "k" terminals to decide upon the next parse action) can be parsed by an LR(1) parser. This gives one a hint at the incredible power of LR parsing over LL parsing. Source: compiler course at UCSC taught by Dr. F. DeRemer, creator of LALR() parsers.Justinn
@templatetypedef: Great answer, thanks for contributing it. I was looking at your lecture slides; on lecture 03 on Top Down Parsing Part I, on the slide titled Leftmost Derivations, one bullet point says "Increases likelihood that we get a prefix of nonterminals". Are you sure you don't mean "Increases likelihood that we get a prefix of terminals"? "Terminals" seems correct to me, and two slides prior you discuss trying to get a prefix of terminals.Perseid
Can I understand the difference between them like that? LL parser hypothesizes the structure it wants, and test the structure of input string against the hypothesized structure, if it is matched, make a left derivation. While LR parser will try to find the appropriate structure of the input string using the rules.Broadway
That's an excellent answer, I'm intrigued to read your slides next! :)Byars
@Camaraderie Slight technical correction. LR is right most derivation in reverse. Note that the derivation of S->E is first in the LL example, but last in the LR example.Isola
This is graceful. Your resources are seemingly better than any other texts available out there (no need to buy a new one). I'll definitely look into yours soon enough. Thanks a million.Spherics
Excellent resource! Thanks for providing slides, handouts, projects as well.Haff
Thank you very much for the example! I had been stuck on this for a couple of days, but it made much more sense after reading it!! The lecture slides are also great! Thanks so much!!Pycno
I understand you put effort into your answer and it's upvoted, but seriously just step back and think about what you're saying -- the words and syntax you use mean nothing to anyone who doesn't already understand the topic --"At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol. -- unclearFootball
D
64

Josh Haberman in his article LL and LR Parsing Demystified claims that LL parsing directly corresponds with the Polish Notation, whereas LR corresponds to Reverse Polish Notation. The difference between PN and RPN is the order of traversing the binary tree of the equation:

binary tree of an equation

+ 1 * 2 3  // Polish (prefix) expression; pre-order traversal.
1 2 3 * +  // Reverse Polish (postfix) expression; post-order traversal.

According to Haberman, this illustrates the main difference between LL and LR parsers:

The primary difference between how LL and LR parsers operate is that an LL parser outputs a pre-order traversal of the parse tree and an LR parser outputs a post-order traversal.

For the in-depth explanation, examples and conclusions check out Haberman's article.

Dolomites answered 14/8, 2013 at 18:37 Comment(0)
S
16

LL parsing is handicapped, when compared to LR. Here is a grammar that is a nightmare for an LL parser generator:

Goal           -> (FunctionDef | FunctionDecl)* <eof>                  

FunctionDef    -> TypeSpec FuncName '(' [Arg/','+] ')' '{' '}'       

FunctionDecl   -> TypeSpec FuncName '(' [Arg/','+] ')' ';'            

TypeSpec       -> int        
               -> char '*' '*'                
               -> long                 
               -> short                   

FuncName       -> IDENTIFIER                

Arg            -> TypeSpec ArgName         

ArgName        -> IDENTIFIER 

A FunctionDef looks exactly like a FunctionDecl until the ';' or '{' is encountered.

An LL parser cannot handle two rules at the same time, so it must chose either FunctionDef or FunctionDecl. But to know which is correct it has to lookahead for a ';' or '{'. At grammar analysis time, the lookahead (k) appears to be infinite. At parsing time it is finite, but could be large.

An LR parser does not have to lookahead, because it can handle two rules at the same time. So LALR(1) parser generators can handle this grammar with ease.

Given the input code:

int main (int na, char** arg); 

int main (int na, char** arg) 
{

}

An LR parser can parse the

int main (int na, char** arg)

without caring what rule is being recognized until it encounters a ';' or a '{'.

An LL parser gets hung up at the 'int' because it needs to know which rule is being recognized. Therefore it must lookahead for a ';' or '{'.

The other nightmare for LL parsers is left recursion in a grammar. Left recursion is a normal thing in grammars, no problem for an LR parser generator, but LL can't handle it.

So you have to write your grammars in an unnatural way with LL.

Salverform answered 8/6, 2019 at 3:15 Comment(1)
This strikes me as an opinionated rant, rather than an answer to the question. While much of what you say is true, the tone is very much derogatory towards LL parsers, rather than simply pointing out what the differences are (even if those differences mean limitations -- sometimes, limitations are a good thing).Sumerian
M
13

The LL uses top-down, while the LR uses bottom-up approach.

If you parse a progamming language:

  • The LL sees a source code, which contains functions, which contains expression.
  • The LR sees expression, which belongs to functions, which results the full source.
Motif answered 28/5, 2018 at 8:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.