If we know a CFG generates only a regular language, can we get the corresponding regular expression?

In the most general sense, there is no solution. The problem of determining whether a CFG generates a regular language is undecidable (Greibach's theorem, last 3 pages of http://www.cis.upenn.edu/~jean/gbooks/PCPh04.pdf ). If we had an algorithm that converted CFGs to regular expressions and reported failure whenever no regular expression exists, we could run it on any grammar and use its success or failure to decide whether the language is regular.

So instead, when a CFG is known to produce a regular language, either its language is already known (and therefore directly convertible to a RegEx), or there's some property of the grammar to exploit. Each property has its own algorithm for converting to a RegEx.

For example, if the grammar is right-linear, every production is of the form A->bC or A->a. This can be converted to an NFA where:

1) There is a state for every non-terminal, plus an accept state.

2) The start symbol S is the start state.

3) A->bC is a transition from A to C on input b.

4) A->a is a transition from A to the accept state on input a.
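
The four rules above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: every symbol is a single character, non-terminals are uppercase letters, and the names `grammar_to_nfa`, `accepts` and `ACCEPT` as well as the sample grammar are made up for the example.

```python
from collections import defaultdict

ACCEPT = "ACCEPT"  # hypothetical name for the extra accept state (rule 1)

def grammar_to_nfa(productions):
    """Build NFA transitions from right-linear productions.

    productions is a list of (head, body) pairs, where body is either
    "a" (one terminal) or "bC" (one terminal followed by one non-terminal).
    Returns a dict mapping (state, symbol) to a set of target states.
    """
    delta = defaultdict(set)
    for head, body in productions:
        terminal = body[0]
        # Rule 3: A->bC goes to C. Rule 4: A->a goes to the accept state.
        target = body[1] if len(body) == 2 else ACCEPT
        delta[(head, terminal)].add(target)
    return delta

def accepts(delta, start, word):
    """Simulate the NFA from the start symbol (rule 2) on word."""
    current = {start}
    for symbol in word:
        current = {t for state in current
                   for t in delta.get((state, symbol), ())}
    return ACCEPT in current

# Hypothetical grammar for illustration: S->aS | bA | b, A->bA | b
productions = [("S", "aS"), ("S", "bA"), ("S", "b"), ("A", "bA"), ("A", "b")]
nfa = grammar_to_nfa(productions)
print(accepts(nfa, "S", "aabb"))  # True
print(accepts(nfa, "S", "aba"))   # False
```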

This NFA can then be converted to a regular expression via state elimination (pages 5-8 of http://www.math.uaa.alaska.edu/~afkjm/cs351/handouts/regular-expressions.pdf ). An analogous process for left-linear grammars (productions of the form A->Cb or A->a) would have start and accept states exchanged and the direction of every transition reversed.
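
A rough sketch of state elimination in Python, for illustration only. It assumes single-character labels, emits Python `re` syntax, and adds two fresh states named `q_s` and `q_f` (so no NFA state may use those names). The function name `eliminate_states` and the sample NFA are hypothetical, and the resulting expression is correct but not simplified.

```python
import re

def eliminate_states(states, start, accepting, edges):
    """edges maps (source, target) to a set of single-character labels.
    Returns a Python regex string for the NFA's language, or None if
    the language is empty."""
    # R[(p, q)] is a regex for everything that takes p directly to q.
    R = {}
    for (p, q), labels in edges.items():
        R[(p, q)] = "(?:" + "|".join(sorted(re.escape(c) for c in labels)) + ")"
    # Fresh start and final states, linked by epsilon (the empty regex).
    R[("q_s", start)] = ""
    for f in accepting:
        R[(f, "q_f")] = ""
    remaining = list(states)
    while remaining:
        k = remaining.pop()
        loop = R.pop((k, k), None)
        star = f"(?:{loop})*" if loop else ""
        incoming = {p: r for (p, q), r in R.items() if q == k}
        outgoing = {q: r for (p, q), r in R.items() if p == k}
        # Drop every edge that touches k, then reroute paths around it.
        R = {(p, q): r for (p, q), r in R.items() if k not in (p, q)}
        for p, r_in in incoming.items():
            for q, r_out in outgoing.items():
                path = r_in + star + r_out
                R[(p, q)] = f"(?:{R[(p, q)]}|{path})" if (p, q) in R else path
    return R.get(("q_s", "q_f"))

# Hypothetical NFA for S->aS | bA | b, A->bA | b, with accept state "F"
edges = {("S", "S"): {"a"}, ("S", "A"): {"b"}, ("S", "F"): {"b"},
         ("A", "A"): {"b"}, ("A", "F"): {"b"}}
regex = eliminate_states(["S", "A", "F"], "S", {"F"}, edges)
print(regex)
```

The result can be checked against sample strings with `re.fullmatch(regex, word)`.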

Beyond that, one could exploit closure properties of regular languages. For example, the grammar in the question is not right-linear, but it can be written as S->S'b, S'->aA. Now S' is right-linear, and S is the concatenation of two disjoint linear grammars. Convert each part separately, then concatenate the two expressions to get the final expression. The same logic applies to union.
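
Once each part has its own expression, combining them is mechanical. A minimal sketch in Python `re` syntax, where the helper names `concat` and `union` and the expression used for S' are purely hypothetical placeholders (the real one depends on the productions for A):

```python
import re

def concat(*parts):
    """Concatenate regexes, grouping each so precedence is preserved."""
    return "".join(f"(?:{p})" for p in parts)

def union(*parts):
    """Alternate between regexes, grouping each alternative."""
    return "(?:" + "|".join(f"(?:{p})" for p in parts) + ")"

# Assumption for illustration: suppose S' was already converted to a(?:a|b)*
s_prime = "a(?:a|b)*"
s = concat(s_prime, "b")  # S->S'b
print(bool(re.fullmatch(s, "aabb")))
```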
