LR(1) Item DFA - Computing Lookaheads

Asked 31/12, 2012 at 15:12 Answered 29/10, 2015 at 7:33

Solved parsing context-free-grammar regex-lookarounds dfa lr-grammar

I have trouble understanding how to compute the lookaheads for the LR(1)-items.

Let's say that I have this grammar:

S -> AB
A -> aAb | a
B -> d

An LR(1) item is an LR(0) item together with a lookahead. So we get the following items for state 0:

S -> .AB, {lookahead}
A -> .aAb, {lookahead}
A -> .a, {lookahead}

State: 1

A -> a.Ab, {lookahead}
A -> a., {lookahead}
A -> .aAb, {lookahead}
A -> .a, {lookahead}

Can somebody explain how to compute the lookaheads? What is the general approach?

Thank you in advance

Airdrome answered 31/12, 2012 at 15:12 Comment(0)

The lookaheads used in an LR(1) parser are computed as follows. First, the start state has an item of the form

S -> .w  ($)

for every production S -> w, where S is the start symbol. Here, the $ marker denotes the end of the input.

Next, for any state that contains an item of the form A -> x.By (t), where x and y are arbitrary strings of terminals and nonterminals and B is a nonterminal, you add an item of the form B -> .w (s) for every production B -> w and for every terminal s in the set FIRST(yt). (Here, FIRST refers to FIRST sets, which are usually introduced when talking about LL parsers. If you haven't seen them before, I would take a few minutes to look over those lecture notes.)
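In case code is easier to follow, here is a minimal Python sketch of that closure rule applied to the grammar from the question. The item representation (lhs, rhs, dot, lookahead) and the names first_sets, first_of_seq and closure are my own choices for illustration, not taken from any particular parser generator, and the empty string stands for epsilon.

```python
# Grammar from the question; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a", "A", "b"), ("a",)],
    "B": [("d",)],
}

def first_of_seq(symbols, first, grammar):
    """FIRST of a sequence of symbols ("" stands for epsilon)."""
    result = set()
    for sym in symbols:
        if sym not in grammar:            # terminal, including "$"
            result.add(sym)
            return result
        result |= first[sym] - {""}
        if "" not in first[sym]:
            return result
    result.add("")                        # every symbol was nullable
    return result

def first_sets(grammar):
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                before = len(first[nt])
                first[nt] |= first_of_seq(rhs, first, grammar)
                changed |= len(first[nt]) != before
    return first

def closure(items, grammar, first):
    """For each item A -> x.By (t), add B -> .w (s) for every s in FIRST(yt)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, t = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            b = rhs[dot]
            y = rhs[dot + 1:]
            for s in first_of_seq(y + (t,), first, grammar):
                for w in grammar[b]:
                    item = (b, w, 0, s)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

first = first_sets(GRAMMAR)
for item in sorted(closure({("S", ("A", "B"), 0, "$")}, GRAMMAR, first)):
    print(item)
```

Since the lookahead t is always a terminal, FIRST(yt) never contains epsilon, so every s added here is a real terminal or $.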

Let's try this out on your grammar. We start off by creating an item set containing

S -> .AB ($)

Next, using our second rule, for every production of A, we add in a new item corresponding to that production and with lookaheads of every terminal in FIRST(B$). Since B always produces the string d, FIRST(B$) = {d}, so all of the items we introduce will have lookahead d. This gives

S -> .AB ($)
A -> .aAb (d)
A -> .a (d)

Now, let's build the state corresponding to seeing an 'a' in this initial state. We start by moving the dot over one step for each production that starts with a:

A -> a.Ab (d)
A -> a. (d)

Now, since the first item has a dot before a nonterminal, we use our rule to add one item for each production of A, giving those items lookahead FIRST(bd) = {b}. This gives

A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)

Continuing this process will ultimately construct all the LR(1) states for this LR(1) parser. This is shown here:

[0]
S -> .AB  ($)
A -> .aAb (d)
A -> .a   (d)

[1]
A -> a.Ab (d)
A -> a.   (d)
A -> .aAb (b)
A -> .a   (b)

[2]
A -> a.Ab (b)
A -> a.   (b)
A -> .aAb (b)
A -> .a   (b)

[3]
A -> aA.b (d)

[4]
A -> aAb. (d)

[5]
S -> A.B  ($)
B -> .d   ($)

[6]
B -> d.   ($)

[7]
S -> AB.  ($)

[8]
A -> aA.b (b)

[9]
A -> aAb. (b)
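If you want to check a state list like this mechanically, the following is a rough Python sketch of the whole construction (closure plus moving the dot over a symbol). It assumes the grammar has no epsilon productions, which is true here, so FIRST of a sequence is just FIRST of its first symbol. The function names are hypothetical, and the state numbers it assigns will not match the numbering above, only the set of states.

```python
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a", "A", "b"), ("a",)],
    "B": [("d",)],
}

def first_sets(grammar):
    """FIRST per nonterminal; assumes no epsilon productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def closure(items, grammar, first):
    """Items are (lhs, rhs, dot, lookahead) tuples."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in grammar:
            continue
        # First symbol of "y followed by the lookahead t".
        head = (rhs[dot + 1:] + (la,))[0]
        lookaheads = first[head] if head in grammar else {head}
        for s in lookaheads:
            for w in grammar[rhs[dot]]:
                item = (rhs[dot], w, 0, s)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def goto(state, symbol, grammar, first):
    """Move the dot over `symbol`, then take the closure."""
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved, grammar, first)

def canonical_collection(grammar, start):
    first = first_sets(grammar)
    initial = closure({(start, rhs, 0, "$") for rhs in grammar[start]},
                      grammar, first)
    states, index, transitions = [initial], {initial: 0}, {}
    i = 0
    while i < len(states):                # the list grows as we discover states
        symbols = sorted({r[d] for (_, r, d, _) in states[i] if d < len(r)})
        for sym in symbols:
            target = goto(states[i], sym, grammar, first)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(i, sym)] = index[target]
        i += 1
    return states, transitions

states, transitions = canonical_collection(GRAMMAR, "S")
print(len(states))  # 10 for this (non-augmented) grammar
```

Note that the grammar is not augmented with S' -> S here, which is why this gives 10 states rather than 11.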

In case it helps, I taught a compilers course last summer and have all the lecture slides available online. The slides on bottom-up parsing should cover all of the details of LR parsing and parse table construction, and I hope that you find them useful!

Hope this helps!

Frech answered 2/1, 2013 at 18:35 Comment(14)

Thank you. Can you explain why $ is included in the lookahead sets for the following items, even though it's not in FIRST($A)? S → •A {$}, A → •AA {$, b}, A → •bc {$, b} – Airdrome 2/1, 2013 at 23:47

@mrjasmin- I need to see more of the grammar to know what the FIRST and lookahead sets should be; can you post more? Also, note that you shouldn't be computing FIRST($A) anywhere. If you have A -> .AA ($), the lookaheads for the resulting items would be the terminals in FIRST(A$), not FIRST($A). Does that help at all? – Frech 2/1, 2013 at 23:57

Hi! This question is from an exam, and all that is given is: S -> .A {$}, A -> .AA { }, A -> .bc { }. The student is supposed to fill in the lookahead sets, and the answer is the one in my comment above. I don't understand how $ is a lookahead. – Airdrome 3/1, 2013 at 0:1

@mrjasmin- The initial $ probably comes from the fact that S is the start symbol, so its production is always marked with a $ after the fact. The item A -> .AA would therefore initially have $ as a lookahead, as would A -> .bc. Next, since A -> .AA ($) is an item, you'd add in new items for each production of A, with lookaheads FIRST(A$). Since A -> bc is the only production of A that can start a derived string, the only element of FIRST(A$) is b. Thus you'd add A -> .AA (b) and A -> .bc (b) to the item set. Merging these with A -> .AA ($) and A -> .bc ($) gives A -> .AA ($, b) and A -> .bc ($, b). Does that make sense? – Frech 3/1, 2013 at 0:10

Why would the production A -> .AA initially have $ as a lookahead ? Shouldn't A -> aAb | a also have $ initially then ? Thanks – Airdrome 3/1, 2013 at 0:19

@mrjasmin- No, those two shouldn't be followed by $. The production S -> AB means that the items for the A productions start with FIRST(B$), which is just d. The reason for the $ lookaheads in this second case is that the production S -> A has nothing after the A, so the lookahead is FIRST($), which is just $. – Frech 3/1, 2013 at 3:30
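To make the exam example from these comments concrete, here is a small Python sketch that computes the closure of S -> .A ($) for the grammar S -> A, A -> AA | bc and then merges items that share the same core. The FIRST table is hard-coded as an assumption (FIRST(A) = FIRST(S) = {b}, no nullable symbols), and merge_lookaheads is just a hypothetical helper name.

```python
from collections import defaultdict

# Second grammar from the comments: S -> A, A -> AA | bc.
GRAMMAR = {"S": [("A",)], "A": [("A", "A"), ("b", "c")]}
# Assumed precomputed FIRST sets; nothing in this grammar is nullable.
FIRST = {"S": {"b"}, "A": {"b"}}

def first_of(seq):
    """FIRST of a non-empty sequence when no symbol is nullable."""
    head = seq[0]
    return FIRST[head] if head in GRAMMAR else {head}

def closure(items):
    """Items are (lhs, rhs, dot, lookahead) tuples."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, t = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for s in first_of(rhs[dot + 1:] + (t,)):
                for w in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], w, 0, s)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result

def merge_lookaheads(items):
    """Group items with the same core (lhs, rhs, dot) into one lookahead set."""
    merged = defaultdict(set)
    for lhs, rhs, dot, la in items:
        merged[(lhs, rhs, dot)].add(la)
    return dict(merged)

state = merge_lookaheads(closure({("S", ("A",), 0, "$")}))
for (lhs, rhs, dot), las in sorted(state.items()):
    print(lhs, "->", " ".join(rhs[:dot]) + "." + " ".join(rhs[dot:]), sorted(las))
```

The $ enters through S -> .A ($), where nothing follows the A, and the b enters through A -> .AA, where FIRST(A$) = {b}.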

3 more states are needed, as noted below. – Gibby 29/10, 2015 at 7:34

@Gibby I missed two of those states the first time around - thanks for spotting that! As for the third - it looks like you've augmented the grammar by adding in a production S' -> S while I haven't done that, so I think that third state depends on whether you do that or not. – Frech 29/10, 2015 at 18:2

Yes it is augmented, so yes, would be one less state. :-) – Gibby 30/10, 2015 at 2:38

Your link, and every other LALR parsing site on the web, appears to mention the use of FOLLOW sets, not FIRST sets, when creating the item sets. Are you mistaken in your usage of FIRST sets? web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/… and web.cs.dal.ca/~sjackson/lalr1.html – Smile 3/10, 2017 at 17:59

@snb Generating a LALR(1) parser typically does use FOLLOW sets. However, this question is about LR(1) parsing, and typically LR(1) parsers are created using FIRST rather than FOLLOW sets. – Frech 3/10, 2017 at 18:7

@Frech Ah ok, I went backwards in learning about LR parsing. – Smile 3/10, 2017 at 21:6

@snb No worries! LR parsing is often taught in the order of LR(0), then SLR(1), then LR(1), then LALR(1), though you sometimes see LALR(1) and LR(1) interchanged with one another. It can take some practice to get the hang of them! – Frech 3/10, 2017 at 21:29

@Frech For parasol.tamu.edu/~rwerger/Courses/434/lec10.pdf, can you tell why the item is E -> .T+E, $? – Blackleg 10/3, 2018 at 16:55

Here is the LR(1) automaton for the grammar, following the construction done above. I think drawing the automaton helps with understanding, since following its flow makes the idea of the lookaheads clearer.

(Figure: the LR(1) automaton for the grammar)

Sada answered 14/3, 2015 at 6:55 Comment(0)

The LR(1) item sets you constructed should include two more states:

I8: A -> aA.b, b (reached from I2)

I9: A -> aAb., b (reached from I8)

Blueberry answered 3/4, 2013 at 6:39 Comment(0)

I also get 11 states, not 8:

State 0
        S: .A B ["$"]
        A: .a A b ["d"]
        A: .a ["d"]
    Transitions
        S -> 1
        A -> 2
        a -> 5
    Reductions
        none
State 1
        S_Prime: S .$ ["$"]
    Transitions
        none
    Reductions
        none
State 2
        S: A .B ["$"]
        B: .d ["$"]
    Transitions
        B -> 3
        d -> 4
    Reductions
        none
State 3
        S: A B .["$"]
    Transitions
        none
    Reductions
        $ => S: A B .
State 4
        B: d .["$"]
    Transitions
        none
    Reductions
        $ => B: d .
State 5
        A: a .A b ["d"]
        A: .a A b ["b"]
        A: .a ["b"]
        A: a .["d"]
    Transitions
        A -> 6
        a -> 8
    Reductions
        d => A: a .
State 6
        A: a A .b ["d"]
    Transitions
        b -> 7
    Reductions
        none
State 7
        A: a A b .["d"]
    Transitions
        none
    Reductions
        d => A: a A b .
State 8
        A: a .A b ["b"]
        A: .a A b ["b"]
        A: .a ["b"]
        A: a .["b"]
    Transitions
        A -> 9
        a -> 8
    Reductions
        b => A: a .
State 9
        A: a A .b ["b"]
    Transitions
        b -> 10
    Reductions
        none
State 10
        A: a A b .["b"]
    Transitions
        none
    Reductions
        b => A: a A b .

Gibby answered 29/10, 2015 at 7:33 Comment(0)
