Common Lisp: A good way to represent grammar rules?
Asked Answered
P

1

7

This is a Common Lisp data representation question.

What is a good way to represent grammars? By "good" I mean a representation that is simple, easy to understand, and I can operate on the representation without a lot of fuss. The representation doesn't have to be particularly efficient; the other properties (simple, understandable, process-able) are more important to me.

Here is a sample grammar:

Session → Facts Question
Session → ( Session ) Session
Facts → Fact Facts
Facts → ε
Fact → ! STRING
Question → ? STRING

The representation should allow the code that operates on the representation to readily distinguish between terminal symbols and non-terminal symbols.

Non-terminal symbols: Session, Facts, Fact, Question

Terminal symbols: (, ), ε, !, ?

This particular grammar uses parentheses symbols, which conflicts with Common Lisp's use of parentheses symbols. What's a good way to handle that?

I want my code to be able to be able to recognize the symbol for the empty string, ε. What's a good way to represent the symbol for the empty string, ε?

I want my code to be able to distinguish between the left-hand side and the right-hand side of a grammar rule.

Below are some common operations that I want to perform on the representation.

Consider this rule:

A → u1u2...un

Operations: I want to get the first symbol of a grammar rule's right-hand side. Then I want to know: is it a terminal symbol? Is it the ε-symbol? If it's a non-terminal symbol, then I want to get its grammar rule.

Putrescine answered 23/3, 2016 at 22:1 Comment(3)
I've done something similar ages ago for uni, you might find these interesting: backtracking parser, recursive descent LL(1), LALR(1)Tumbler
Thank you! So you represented a grammar as an association list. The a-list's key represents the LHS of a grammar rule, the key's value represents the RHS. Terminal symbols are represented by strings, non-terminal symbols by atoms. Is that correct? In hindsight, were you happy with your representation? Would you do things differently if you were to do it over? How did you represent the empty symbol? What does the caret symbol denote, e.g., E^Putrescine
The caret in E^ just stands for E'. In retrospect, I should have used E* since the asterisk seems to be the usual modifier character in Lisp-land. I can't give you much evaluation on these three parsers since I've never used them in anger, this was just for self-educational purposes while learning about various parser algorithms in university.Tumbler
K
2

GRAIL (GRAmmar In Lisp)

I'm including the BNF of GRAIL from the second link in case it expires:

<grail-list>  ::= "'(" {<grail-rule>} ")"
<grail-rule>  ::= <assignment> | <alternation>
<assignment>  ::= "(" <type> " ::= " <s-exp> ")"
<alternation> ::= "(" <type> " ::= " <type> {<type>} ")"
<s-exp>       ::= <symbol> | <nonterminal> | "(" {<s-exp>} ")"
<type>        ::= "#(" <type-name> ")"
<nonterminal> ::= "#(" {<arg-name> " "} <type-name> ")"
<type-name>   ::= <symbol>
<arg-name>    ::= <symbol>

DCG Format (Definite Clause Grammar)

There is an implementation of a definite clause grammar in Paradigms of Artificial Intelligence Programming. Technically it's Prolog, but it's all implemented as Lisp in the book.

Hope this helps!

Kami answered 1/2, 2017 at 16:10 Comment(1)
New link to the PAIP code is github.com/norvig/paip-lisp/blob/master/lisp/grammar.lispSlacker

© 2022 - 2024 — McMap. All rights reserved.