What advantages do LL parsers have over LR parsers?

What advantages do LL parsers have over LR parsers to warrant their relative popularity in today's parser generator tools?

According to Wikipedia, LR parsing appears to have advantages over LL:

LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible. This is in contrast to an LL(k) (or even worse, an LL(*) parser) which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.

Note: This is not homework. I was just surprised when I found out that Antlr is an LL parser generator (despite having "LR" in its name!).

Greybeard answered 3/11, 2010 at 22:36 Comment(6)
Talking about parsers (and not grammars): LL(*) can be written in a simple recursive-descent approach. That's a +1 in my book. – Carpenter
@pst: True, I was just hoping that "because they're easier to implement" wasn't the primary advantage. :) – Greybeard
Note that the "LR" in ANTLR just stands for "Language Recognition", not anything about the class of grammar it accepts. – Wedded
Actually, in Terence Parr's mind, it stands for Anti-LR. At least I remember an interview with him where he admitted that this connotation was no accident. He said he created ANTLR specifically because he felt that the near-total focus on LR parsing was a big mistake, and he got fed up telling everybody that LR sucks and nobody listening, so he decided to create the world's best parser generator, crush all the others, and leave only LL standing... or something like that. – Summation
@Jörg: Really? Wow, that's certainly ambitious! It sounds like Terence would certainly have something to add to this question! Perhaps I should email him, asking for his input... – Greybeard
@Jörg: I just emailed him, inviting him to participate in this discussion. He's probably busy, but maybe he will still join in! – Greybeard

GLR is great if you want a parse tree/forest and don't mind black boxes. It lets you type in whatever CFG you want, at the cost of checking for ambiguities at parse time via exhaustive testing instead of resolving LR/LALR conflicts statically. Some say that's a good trade-off. Ira Baxter's DMS tool, or Elkhound, which has a free C++ grammar, are useful for this class of problem.

ANTLR is useful for a large class of language applications too, but uses a top-down approach, generating recursive descent parsers called LL(*) that allow semantic predicates. I will state without proof here that predicates let you parse context-sensitive languages beyond CFGs. Programmers like to insert actions into grammars, like good error handling, and like to single-step debug. LL is good at all three. LL is what we do by hand, so it's easier to understand.

Don't believe the Wikipedia nonsense about LR being better at handling errors. That said, if you backtrack a lot with ANTLR, errors are indeed worse with LL(*) (PEGs have this problem).
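To make the predicate idea concrete, here is a minimal hand-written sketch (hypothetical names, not ANTLR's generated code): an alternative is chosen by an ordinary boolean guard over context that a pure CFG cannot see, such as a symbol table.

```python
# Hypothetical sketch (not ANTLR's generated code): a semantic predicate
# is just a boolean guard that consults context -- here, a symbol table --
# to choose between alternatives a pure CFG cannot distinguish
# (the classic C-style "T x;" vs. "y;" statement ambiguity).
def parse_statement(tokens, pos, declared_types):
    name = tokens[pos]
    # Predicate {name is a declared type}? => declaration, else expression.
    if name in declared_types:
        return ("declaration", name, tokens[pos + 1]), pos + 2
    return ("expression", name), pos + 1

stmt, _ = parse_statement(["T", "x"], 0, declared_types={"T"})
```

With `declared_types={"T"}` the tokens `T x` parse as a declaration; with an empty set the same leading name parses as an expression.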

Re backtracking. GLR speculates (i.e. backtracks) too, just like PEGs, ANTLR, and any other non-deterministic strategy. At any non-deterministic LR state, GLR "forks" sub-parsers to try out any viable path. Anyway, LL has good context for error handling. Where LR knows it's matching an expression, LL knows it's an expression in an assignment or IF-conditional; LR knows it could be in either but isn't sure - and that uncertainty is where it gets its power.
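The "context" point can be illustrated with a toy sketch (hypothetical code, not produced by any generator): in a recursive-descent parser the call stack mirrors the grammar, so each rule knows exactly which construct invoked it and can say so in its error message.

```python
# Toy illustration (hypothetical code): in recursive descent the call
# stack mirrors the grammar, so parse_expr knows which construct it is
# parsing and can name it in the error message.
def parse_expr(tokens, pos, context):
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError(f"expected an expression in {context}")
    return int(tokens[pos]), pos + 1

def parse_if(tokens, pos):
    assert tokens[pos] == "if"
    # The caller supplies the context an LR parser would have merged away.
    return parse_expr(tokens, pos + 1, context="an if-condition")

try:
    parse_if(["if", "+"], 0)
except SyntaxError as err:
    print(err)  # expected an expression in an if-condition
```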

GLR is O(n^3) worst case; packrat/PEG is O(n) worst case. ANTLR's parsers are O(n^2) due to the cyclic lookahead DFA, but O(n) in practice. It doesn't really matter: GLR is fast enough.

ANTLR is ANother Tool for Language Recognition, not Anti-LR, but I like that one too ;)

Frankly, like a lot of young coders in the 80s, I didn't understand LALR and didn't like black boxes (now I dig the beauty of the GLR engine, but still prefer LL). I built a commercial LL(k)-based compiler and decided to build a tool to generate what I had built by hand. ANTLR isn't for everyone, and edge cases like C++ might be better handled with GLR, but a lot of people find ANTLR fits into their comfort zone. Since Jan 2008 there have been 134,000 downloads of ANTLR's binary jar, ANTLRWorks, and source zips in total (according to Google Analytics). See our paper on LL(*) with lots of empirical data.

Stocktaking answered 4/11, 2010 at 21:0 Comment(10)
I can certainly appreciate the ability to debug line-by-line! I can also appreciate the idea of "context" (that is, knowing that the expression being parsed is contained within an if statement). Thanks for the answer, it certainly illustrates some strengths of LL parsers! – Greybeard
@Terence: what's the bit about a "black box"? All parser generators and the support machinery are pretty much mysteries to their users, and rightly so; most users don't want to know the theory or the gears. I buy your "absent proof" of ANTLR doing beyond CFGs; but that's true of any grammar-driven engine that also allows semantic predicates. As a practical matter, they all do (at least all the ones I've met or built :) including DMS's. Single-step is a matter of tooling, not the parser generator technology. Quality of error recovery has the same property. – Cirilo
All: I agree that ANTLR, and most parser generators in fact, are good for building parsers for "modest" grammars. The distinction starts to occur when the grammars get "tough". If you are in control of the grammar and can change it to make tough cases go away, then any parser generator will do. If you are not, then the strength of the parser generator really matters. In either case, the engineering in support of the tool really helps. In spite of my bias towards GLR (they are O(n) in practice, too!), I'll be the first to admit that Terence has done a pretty decent job making ANTLR effective. – Cirilo
@Ira: I completely agree with your points. I just want to say that the reason why I accepted Terence's answer was because I thought it best answered the original question ("What advantages do LL parsers have over LR parsers?"). I also agree with you (and Terence) that GLR looks fantastic! I also agree that he has done an excellent job with ANTLR. :) – Greybeard
@Ira: I just meant that GLR is a black box because it works but is opaque. The other extreme is a recursive descent parser (as generated by ANTLR) that is not a black box, because it is what a programmer would build by hand and can single-step debug. Actually, that is not totally true: LL(*) allows cyclic DFA to scan ahead, and I generate state machines (black boxes) for those decisions, but that is just for predicting the alternative production, not for the parsing. – Stocktaking
Re: error handling. Imagine an inconsistent LR state where you must speculate down multiple possible paths that all lead to invalid parses. You have an error, but you don't know from which path to derive the error message. You also have ambiguous context. Are you in an expression for an array index, say, or an expression in an assignment statement? It's hard to know how to recover. I'm not sure I see how error handling can be just a matter of the tool; LR-based tools seem a bit handicapped compared to LL-based tools in this regard. – Stocktaking
The usual trick is to try to build a tree from the tokens following the point of error, and then look for one token insertion/deletion that will allow the parser to bridge the gap. (We just use the next token because we haven't been highly active in pushing the error recovery.) This works pretty well even for an inconsistent state. Most states aren't inconsistent, so there's only one sensible continuation. Why doesn't an LL parser have the same problem, if there are multiple possible next parses? The local language ambiguity doesn't go away just because you switched parsing technologies. – Cirilo
Well, LL does single-token insertion and deletion pretty easily; I'm sure the concept is similar in LR, at least mid-production. The primary issue is when you need to recover based upon which rule invoked the current rule (e.g., what's in the FOLLOW set?). In LL, you know precisely which one that is (at the cost of a weaker parsing strategy, of course). LR could have multiple contexts. You don't know whether to resynchronize by consuming until ] or ;, for example, when trying to clean up a malformed expression. If you were called from the array index, you want to consume until ], and so on. – Stocktaking
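The single-context recovery described in this exchange can be sketched as follows (hypothetical code, not from any particular tool): the invoking rule passes exactly one resynchronization token, so the expression rule knows whether to skip to `]` or to `;`.

```python
# Hypothetical sketch of LL panic-mode recovery: the invoking rule passes
# the one resynchronization token it expects, so a malformed expression
# inside an array index skips to ']' while one in a statement skips to ';'.
def parse_expr(tokens, pos, sync):
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    # Error: consume tokens until the caller-specific sync token.
    while pos < len(tokens) and tokens[pos] != sync:
        pos += 1
    return None, pos

# Recovering inside a bad array index consumes up to ']', not ';':
_, pos = parse_expr(["@", "!", "]", ";"], 0, sync="]")
print(pos)  # 2 -- stopped at ']'
```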
With GLR it's relatively easy. For all live parses at the point where you hit a syntax error, you propose the insertion of any token whose shift on any parse leads to a state in which the following token is legal, and you propose deleting the current token. It proposes 10-20 cures, most of which die in the usual GLR way. Our parser seems to recover from missing/extra tokens nearly perfectly. When it's lost, it is horribly lost, but it usually recovers after a while. I used to build metacompilers (synthesizing recursive descent parsers); pretty much when they got lost, they never recovered. – Cirilo
An update: an upcoming OOPSLA '14 paper on Adaptive LL(*), ALL(*), takes any grammar with one exception: no indirect left-recursion. It handles rules like e : e '*' e | INT ; (see antlr.org/papers/allstar-techreport.pdf). ALL(*) is my final word on parsing; 25 years and I finally cracked it. It took the C11 spec out of the box, complete with left-recursion, albeit at a slow linear pace for parsing without tweaking the grammar. The Java grammar parses 12,920 files, 3.6M lines, 123M of the Java library, in under 6 seconds. See the tool shootout in the paper; we needed either a log scale or a separate graph for the GLR tools. – Stocktaking

If you have to hand-code one, recursive descent (LL) is something you can do realistically; hand-building L(AL)R parsers is not practical.

Given that modern parser generators will handle all the parser construction for you, and that space is not much of an issue, I prefer LR parsers because you don't have to fight with the grammars as much to make them valid for your particular parser generator (no "remove all the left recursion" silliness).

In fact, I prefer GLR parsers, which will pretty much parse anything with a context free grammar. No left-recursion worries. No shift/reduce conflict worries. No lookahead limits.
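To see why "no left-recursion worries" matters, here is a deliberately naive sketch (hypothetical, illustration only) of what happens when a left-recursive rule is transcribed directly into an LL-style recursive-descent parser:

```python
# Rule:  expr : expr '-' NUM | NUM
# Transcribed literally into recursive descent, the first alternative
# calls parse_expr again before consuming a single token, so the parser
# recurses forever (in Python it hits the recursion limit instead).
def parse_expr(tokens, pos=0):
    left, pos = parse_expr(tokens, pos)  # left-recursive call: no progress
    if pos < len(tokens) and tokens[pos] == "-":
        return left - int(tokens[pos + 1]), pos + 2
    return left, pos

try:
    parse_expr(["7", "-", "2"])
except RecursionError:
    print("left recursion never terminates in a top-down parser")
```

An LR or GLR parser accepts this rule as written, which is exactly the convenience being described.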

If you want to see the range of languages that one GLR parsing engine can handle (including the famously hard-to-parse-using-LL/LALR language, C++), you can look here.

Cirilo answered 3/11, 2010 at 22:47 Comment(16)
You don't have to fight with the grammars only because you can define precedences directly via specific directives in the parser definition, not because the LR approach places fewer requirements on the grammar itself. – Preheat
@Ira: Very interesting post! I hadn't even heard of GLR parsers before. – Greybeard
@Jack: I agree with your point about how precedence directives are not an LR-specific advantage. However, would you agree with Ira's point regarding the elimination of left recursion? – Greybeard
Yes, I agree, since LL parsers don't allow it. But left recursion can be resolved in a very straightforward way, while some reduce-reduce or shift-reduce conflicts require a lot of thinking and refactoring of the grammar itself (which becomes hard as the grammar grows). – Preheat
@Jack: Fair enough. Correct me if I'm wrong, but don't the shift/reduce and reduce/reduce conflicts effectively disappear if the LR parser allows for arbitrary numbers of lookahead symbols (that is, an LR(k) parser)? Granted, most LR parser generators limit themselves to 1 lookahead symbol, so I don't have much experience along those lines. – Greybeard
Actually, LALR exists precisely because handling lookahead in LR parsers is really expensive. Consider that every additional lookahead symbol increases complexity by an order of magnitude, IIRC. That's why in practical situations you won't use an LR(3) parser. – Preheat
LALR exists because you get smaller tables than LR for many practical grammars. You can certainly define the concept of LALR(3) grammars, although nobody does in practice. The reason for GLR is that you can stop thinking about k. – Cirilo
@Jack: In general, that's true. However, I'm surprised that more LR parser generators don't support LR(k), considering solutions to the exponential space requirements appear to have been available since 1991. – Greybeard
@Adam: We all learned something today. Our GLR parser uses essentially an LALR(1) parser generator as a foundation, and the 1-unit lookahead sets actually help to avoid GLR parse-splits in practice, keeping its overhead down. (They are a bit slower than LALR because there's extra bookkeeping [much of which you can eliminate if you try hard].) The LALR(k) paper might let us peek ahead occasionally a bit further and avoid splits; into my think-about-this bin. Thanks for the pointer. – Cirilo
@Ira: Very good point. Is there freely available documentation on GLR parsers that I could read? – Greybeard
@Adam: The references at the Wikipedia sites are must-reads, but they are hard to find for "free"; I have hardcopies acquired many years ago. Adrian Johnstone seems to be doing a lot of work on advanced GLR parsing, for example: portal.acm.org/citation.cfm?id=1149674. This is free if you are an ACM member. His website cs.rhul.ac.uk/research/languages/publications/… likely has freely accessible documents. – Cirilo
@Ira: Thanks for the tip! I'm also glad that the link may be of use to you! I would have guessed that it would have been better known in the compiler community after almost 20 years (I only noticed it because the Eclipse Foundation uses it in their Java parser). – Greybeard
@Ira: You may wish to post your sources on the GLR parsing algorithm resources question. – Greybeard
I really hope this doesn't come off as hostile -- it's an honest question -- when is left recursion useful/necessary in practice? I don't mean to imply that it's not useful/necessary, just that I have never seen any uses of left recursion that couldn't be replaced by a * or + quantifier. I'm sure there are valid uses, but I'm not familiar with any of them. – Jester
* or + are not standard BNF operators; they are standard in many extended BNF notations. If your parser generator doesn't have them, then you need left or right recursion. If you do have them, they correspond to left or right recursion. If you generate a parser from the grammar, EBNF is often essentially reduced to BNF (certainly for LR-class parsers), and you want left recursion because it makes the parser more efficient; with right recursion, such parsers need a stack as deep as the list is long. ... – Cirilo
Parsers that are implemented top-down can convert Kleene operators from logically right recursion into looping operations. But people can still write rules that involve left recursion in complicated ways; I don't have an example on hand, but I'm a lot happier knowing I just don't have to worry about how I write my grammar rules. If you look at the complaints people have about top-down parsers, even those with Kleene star notation, you'll almost always find them trying to find a way to get rid of some inconvenient left recursion. Why have the worry at all? – Cirilo

From my personal experience (I have used both in various situations), the most practical difference is that with an LL(k) you can define the grammar in an easier way (since it's top-down) without worrying about the many possible reduce-reduce or shift-reduce conflicts which often occur with LR parsers. The only thing you have to take care of is left recursion, which must be transformed into right recursion (or iteration).
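That transformation can be sketched with a minimal hypothetical example: the left-recursive rule becomes EBNF repetition, i.e. a loop, which an LL parser handles naturally and which preserves the usual left associativity.

```python
# Left-recursive rule:  expr : expr '-' NUM | NUM
# LL-friendly rewrite:  expr : NUM ('-' NUM)*   (repetition, i.e. a loop)
def parse_expr(tokens):
    pos = 0
    value = int(tokens[pos])
    pos += 1
    # The loop plays the role the left-recursive call played, so the
    # result still groups to the left: 7 - 2 - 1 == (7 - 2) - 1 == 4.
    while pos < len(tokens) and tokens[pos] == "-":
        value -= int(tokens[pos + 1])
        pos += 2
    return value

print(parse_expr(["7", "-", "2", "-", "1"]))  # 4
```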

Another thing is that the top-down approach usually implies higher complexity (in space or time), because the parser has to store the tree built so far while parsing, and it can grow a lot until ambiguities are resolved.

Preheat answered 3/11, 2010 at 22:46 Comment(1)
Just so I understand... you're saying the most practical difference is that LL parsers avoid the conflicts typically found in LR parsers. Would this only apply to LR parsers that limit their lookahead to a small number (for example, LR(1) or LALR(1))? I thought that LR(k) parsers didn't have nearly as many conflicts... – Greybeard

The only advantage I've ever been familiar with is that you can easily code LL parsers by hand. LR parsers are MUCH harder to code by hand (you usually use a parser generator).

Makebelieve answered 4/11, 2010 at 21:5 Comment(1)
And when you have such a generator, producing them is really easy. – Cirilo

LL parsing's worst-case complexity is O(n^4), whereas LR parsing's worst-case complexity is better: O(n^3).

(But no one would ever write an O(n^4) grammar.)

https://en.wikipedia.org/wiki/Top-down_parsing

Fool answered 14/2, 2018 at 23:21 Comment(0)

According to Laurence Tratt, LL parsers have a small but important niche, which is when you need:

the highest possible performance and/or the best possible error messages.

To get those, you hand-code a recursive descent parser.

For a realistic programming language grammar, it will typically take many person months of effort to beat an automatically generated parser.

However:

LL parsers are largely unappealing, because the lack of left recursion makes expressing many standard programming language constructs awkward.

and thus his conclusion is that LR parsing, which handles left recursion just fine, is the way to go.

For a more thorough review of this I recommend Laurence Tratt's excellent essay Which Parsing Approach?

Sharp answered 18/1, 2022 at 9:22 Comment(0)

One reason that comes to mind is, it's much easier to do a language that needs arbitrary backtracking (cough C++) in an LL paradigm.

Entrails answered 3/11, 2010 at 22:44 Comment(6)
Cough: C++ does NOT need arbitrary backtracking. A GLR parser does this just fine. See my answer. – Cirilo
I mean, C++ isn't LL(k) or LR(k) for any finite k, because of the declaration/expression ambiguity. – Entrails
Well, parsing C++ in a pure sense (not the GCC resolves-types-during-parsing hack) requires that the parser also pick up ambiguous parses. LL/LR parsers simply don't do that. GLR parsers do. You resolve the ambiguity later with symbol table information, allowing you to separate parsing from name and type resolution, which gives you much cleaner parsers. – Cirilo
@Zack and @Ira: What an interesting conversation! – Greybeard
I am not much on the theory, but I've worked on both the GCC and the EDG front ends for C++, and I was under the impression that it was not practical to avoid what you describe as the "resolves types during parsing" hack, period. Of course, both of those are hand-coded recursive descent parsers with backtracking, so maybe it's just a consequence of the implementation technique. Has anyone coded a full GLR parser for C++? – Entrails
@Zack: Read my answer. Check out the link. YES, we avoid that hack completely. We use it to carry out modifications of large-scale C++ systems. – Cirilo