How to find FIRST and FOLLOW sets of a recursive grammar?
Asked Answered
G

1

12

Suppose I have the following CFG.

A -> B | Cx | EPSILON
B -> C | yA
C -> B | w | z

Now if I try to find

FIRST(C) = FIRST(B) U FIRST(w) U FIRST(z)
         = FIRST(C) U FIRST(yA) U {w, z}

That is, I'm going in a loop. Thus I assume I have to convert it into a form which has immediate left recursion, which I can do as follows.

A -> B | Cx | EPSILON
B -> C | yA
C -> C | yA | w | z

Now if I try to calculate FIRST sets, I think I can get it done as follows.

FIRST(C) = FIRST(C) U FIRST(yA) U FIRST(w) U FIRST(z)
         = { y, w, z } // I ignore FIRST(C)
FIRST(B) = FIRST(C) U FIRST(yA)
         = { y, w, z }
FIRST(A) = FIRST(B) U FIRST(Cx) U FIRST(EPSILON)
         = { y, w, z, EPSILON }

Am I correct there?

But even if I'm right there, I still run into a problem when I try to calculate FOLLOW sets from this grammar.

FOLLOW(A) = { $ } U FOLLOW(B) U FOLLOW(C)

I get FOLLOW(B) from 2nd rule and FOLLOW(C) from 3rd rule. But now to calculate FOLLOW(B), I need FOLLOW(A) (from 1st grammar rule) so again I'm stuck in a loop.

Any help? Thanks in advance!

Gastineau answered 22/3, 2015 at 17:9 Comment(0)
A
15

Since FIRST and FOLLOW are (normally) recursive, it's useful to think of them as systems of equations to be solved; the solution can be achieved using a simple incremental algorithm consisting of repeatedly applying all the right hand sides until no set has changed during a cycle.

So let's take the FOLLOW relation for the given grammar:

A → B | Cx | ε
B → C | yA
C → B | w | z

We can directly derive the equations:

FOLLOW(A) = FOLLOW(B) ∪ {$}
FOLLOW(B) = FOLLOW(A) ∪ FOLLOW(C)
FOLLOW(C) = FOLLOW(B) ∪ {x}

So we initially set all the follow sets to {}, and proceed.

First round:

FOLLOW(A) = {} ∪ {$} = {$}
FOLLOW(B) = {$} ∪ {} = {$}
FOLLOW(C) = {$} U {x} = {$,x}

Second round:

FOLLOW(A) = {$} ∪ {$} = {$}
FOLLOW(B) = {$} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} U {x} = {$,x}

Third round:

FOLLOW(A) = {$,x} ∪ {$} = {$,x}
FOLLOW(B) = {$} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} U {x} = {$,x}

Fourth round:

FOLLOW(A) = {$,x} ∪ {$} = {$,x}
FOLLOW(B) = {$,x} ∪ {$,x} = {$,x}
FOLLOW(C) = {$,x} U {x} = {$,x}

Here we stop because no changes were made in the last round.

This algorithm must terminate because there are a finite number of symbols, and each round can only add symbols to steps. It is not the most efficient technique, although it is generally good enough in practice.

Anamorphosis answered 22/3, 2015 at 22:56 Comment(4)
Thanks bro, this helps me a lot!Bohrer
Thanks a lot . Really helped me !Crupper
What IS the "most efficient technique" for calculating FIRST? I think it is to start at the goal symbol and recursively follow each RHS, propagating the FiRST sets up the tree. Loop detection gets tricky, though.Frumpy
@david: The most efficient technique is to compute the image of a transitive closure using some variant of Tarjan's algorithm (see Esko Nuutila, 1994 for some techniques to optimise this algorithm, although I'm sure that there is more recent research as well.) (The image of a transitive closure takes a relation R and a function F and computes R*F; as Nuutila points out, the TC algorithm can be quicker if you find the union of image sets rather than finding the union of node sets and then compute the image.)Anamorphosis

© 2022 - 2024 — McMap. All rights reserved.