Isn't an LR(0) parser using lookaheads as well?
Asked Answered
S

1

6

An LL(1)-parser needs a lookahead-symbol for being able to decide which production to use. This is the reason why I always thought the term "lookahead" is used, when a parser looks at the next input token without "consuming" it (i.e. it can still be read from the input by the next action). LR(0) parsers, however, made me doubt that this is correct:

Every example of LR(0)-parsers that I've seen also uses the next input token for deciding whether to shift or to reduce. In case of reduction the input token is not consumed.

I used the freeware tool "ParsingEmu" for generating an LR-table and performing an LR evalutation below for the word "aab". As you can see the column head contain tokens. From the evaluation you can see that the parser is deciding which column to use by looking at the next input token. But when the parser reduces in steps 4 - 6 the input doesn't change (although the parser needs to know the next input token "$" when performing a transition to the next state).

Grammar:

S -> A
A -> aA
A -> b

Table: LR table

Evaluation: LR evaluation

Now I made following assumptions for the reason of my confusion:

  1. My assumption for the definition of "lookahead" (lookahead = input token not being consumed) is wrong. Lookahead just means two different things for either LL-parsers or LR-parsers. If so, how can "lookahead" be defined then?

  2. LR-parsers have (from the theoretical point of view when you would use push-down automaton) additional internal states where they consume the input token by putting it on the stack and therefore are able to make the shift- reduce- decision by just looking on the stack.

  3. The evaluation shown above is LR(1). If true, what would an LR(0) evaluation look like?

Now what is correct, 1, 2 or 3 or something completely different?

Sender answered 14/3, 2015 at 13:35 Comment(3)
possible duplicate of How can an LR(0) parser ever leave state 0? (my own question)Chickasaw
could you share the link for Parsing EMU?Simpleton
I'm sorry, it seems that ParsingEmu completely disappeared from the web. I can only find a paper where it is mentioned.Sender
A
7

It's important to be precise:

An LR(k) parser uses the curent parser state and k lookahead symbols to decide whether to reduce, and if so, by which production.

It also uses a shift transition table to decide which parsing state it should move to after shifting the next input token. The shift transition table is keyed by the current state and the (single) token being shifted, regardless of the value of k.

If in a given parser state, it would be possible to produce both a shift and a reduce action, then the parser has a shift/reduce conflict, and it is invalid. Consequently, the above two determinations could in theory be done nondeterministically.

If in a given parser state, no reduce is possible and the next input symbol cannot be shifted (that is, there is no transition for that state with that input symbol), then the parse has failed and the algorithm terminates.

If, on the other hand, the shift transition leads to the designated Accept state, then the parse succeeds and the algorithm terminates.

What all that means is that the lookahead is used to predict which, if any, reduction should be applied. In an LR(0) parser, the decision to shift (more accurately, to attempt to shift) must be made before reading the next input token, but the computation of the state to transition to do is made after reading the token, at which point it will signal an error if no shift is possible.


LL(k) parsers must predict which production replace a non-terminal with as soon as they see the non-terminal. The basic LL algorithm starts with a stack containing [S, $] (top to bottom) and does whichever of the following is applicable until done:

  • If the top of the stack is a non-terminal, replace the top of the stack with one of the productions for that non-terminal, using the next k input symbols to decide which one (without moving the input cursor), and continue.

  • If the top of the stack is a terminal, read the next input token. If it is the same terminal, pop the stack and continue. Otherwise, the parse has failed and the algorithm finishes.

  • If the stack is empty, the parse has succeeded and the algorithm finishes. (We assume that there is a unique EOF-marker $ at the end of the input.)


In both cases, lookahead has the same meaning: it consists of looking at input tokens without moving the input cursor.

If k is 0, then:

  • An LR(k) parser must decide whether or not to reduce without examining input, which means that no state can have either two different reduce actions or a reduce and a shift action.

  • An LL(k) parser must decide which production of a given non-terminal is appicable without examining input. In practice, this means that a each non-terminal can have only one production, which means the language must be finite.

Auspicious answered 14/3, 2015 at 18:52 Comment(10)
Is the parsing shown in the images of my question a LR(1) parser then?Sender
But the grammar would be LR(0)-parsable as in one state it has to either shift (states 0 and 1) or to reduce(states 3 and 4)? It would not be LR(0)-parsable if one state-row contained both shift and reduce actions depending on the column (= token)?Sender
@fishbone: Right. It's LR(0). No state JJ as both shift and reduce actíons, or two reduce actions for different productions.Auspicious
"the decision to shift (or error) must be made before reading the next input token" Why? I am still confused... If we haven't read the next token, how to make sure shift is successful? For example: In a given parser state and no reduce is possible, if the next input token is 't' then it can be shifted, otherwise it can't and report error. So it still need read next token to decide whether shift or not.Stet
@chansey: the parser decides to "shift or error" before it sees the following token. If the shift subsequently fails, an error will be produced. Perhaps I could have phrased that better. The point is that an LR(0) grammar must make the decision to reduce or shift using 0 lookahead symbols. But that doesn't stop it from failing once it does read the next symbol. (Or indeed even before it reads the next symbol; a reduction action can place the parser into a state with no applicable actions, although the algorithm for creating LR(0) machines will never generate such a case.)Auspicious
I can understand "LR(0) grammar must make the decision to reduce using 0 lookahead symbols". For example: We can not use Follow Set to resolve reduce/reduce confliction in LR(0). But my major confusion is about shift. For example: in some particular state of viable prefix DFA, we have only one item: T -> .int, but the next token is ')' instead of 'int', it cause parse error. So we read the next token ')' to make sure parse error. (But LR(0) shouldn't read next token, because it is 0 lookahead)Stet
@chansey: shift consumes a token, so it has to read it :) The LR(0) parser does not need to know which state it will shift into before it reads the token; it only needs to know that it will shift. Once it makes that decision, it's irrevocable; it cannot decide that a reduction made more sense. I suppose this seems unsymmetrical, and it is: the process of parsing is entirely about reduction actions, since the goal is to discover a derivation. Shifting is just a way of saying, "no reduction here, try the next input point." That's also true of the parser state, which is...Auspicious
...an implementation detail, if you like. The observable actions of the parser are "emit a derivation step" and "consume an input token", and the LR(0) parser must de the first before the second. More generally, an LR(k) parser can do at most k consumes before it emits a derivation ending at a consumed token.Auspicious
This answer is the best it helped me clear confusion regarding lookahead. I have just one doubt in LL(0) parser when is the input symbol consumed?Tropaeolin
@kapil: an LL parser has a prediction stack, which is a representation of what it expects to see next. If the top of the prediction stack is a non-terminal, the parser needs to decide which of the non-terminal's productions to predict. It then pops the non-terminal and pushes the production's right-hand side. If the top of the prediction stack is a terminal, the parser consumes a token which must be the same as the predicted token at the top of the stack. If it's the right token, it pops the prediction stack; otherwise it declares an error.Auspicious

© 2022 - 2024 — McMap. All rights reserved.