This is a Common Lisp data representation question.
What is a good way to represent grammars? By "good" I mean a representation that is simple, easy to understand, and I can operate on the representation without a lot of fuss. The representation doesn't have to be particularly efficient; the other properties (simple, understandable, process-able) are more important to me.
Here is a sample grammar:
Session → Facts Question
Session → ( Session ) Session
Facts → Fact Facts
Facts → ε
Fact → ! STRING
Question → ? STRING
The representation should allow the code that operates on the representation to readily distinguish between terminal symbols and non-terminal symbols.
Non-terminal symbols: Session, Facts, Fact, Question
Terminal symbols: (, ), ε, !, ?
This particular grammar uses parentheses symbols, which conflicts with Common Lisp's use of parentheses symbols. What's a good way to handle that?
I want my code to be able to be able to recognize the symbol for the empty string, ε. What's a good way to represent the symbol for the empty string, ε?
I want my code to be able to distinguish between the left-hand side and the right-hand side of a grammar rule.
Below are some common operations that I want to perform on the representation.
Consider this rule:
A → u1u2...un
Operations: I want to get the first symbol of a grammar rule's right-hand side. Then I want to know: is it a terminal symbol? Is it the ε-symbol? If it's a non-terminal symbol, then I want to get its grammar rule.
E^
just stands forE'
. In retrospect, I should have usedE*
since the asterisk seems to be the usual modifier character in Lisp-land. I can't give you much evaluation on these three parsers since I've never used them in anger, this was just for self-educational purposes while learning about various parser algorithms in university. – Tumbler