Haskell - How to best to represent a programming language's grammar?
Asked Answered
M

5

21

I've been looking at Haskell and I'd quite like to write a compiler in it (as a learning exercise), since a lot of its innate features can be readily applied to a compiler (particularly a recursive descent compiler).

What I can't quite get my head around is how to represent a language's grammar in a Haskell-ian way. My first thought was to use recursive data type definitions, but I can't see how I use them to match against keywords in the language ("if") for example.

Thoughts and suggestions greatly appreciated,

Pete

Malorie answered 25/3, 2009 at 0:40 Comment(1)
With what little I know, I'd recommend looking at en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours (it uses Parsec etc.) but I'm not leaving it as an answer because it's quite possibly a terrible one. Hopefully someone who knows better will answer (and then I'll delete this comment).Gummous
W
17

A recursive data type is fine for this. For example, given the language:

expr ::= var
      |  "true"
      |  "false"
      |  "if" expr "then" expr "else" expr
      |  "(" expr ")"

an example expression in this language would be:

if true then x else (if false then y else true)

Your Haskell data type would look something like this:

data Expr = Var String
          | Lit Bool
          | If Expr Expr Expr

Your parser then takes care to translate, e.g., x into Var "x", and true into Lit True, etc. I.e.:

parse "if x then false else true" 
  ==  If (Var "x") (Lit False) (Lit True)

For writing parsers you can roll your own using the techniques mentioned in Norman's answer, or using Parsec or use parser generators like Happy.

Walking answered 25/3, 2009 at 2:14 Comment(0)
F
20

You represent programs using mutually recursive algebraic data types, and to parse programs you use parsing combinators. There are a million flavors; you will find three helpful tutorial papers on the schedule for my class for Monday, March 23, 2009. They are

The Hutton and Meijer paper is the shortest and simplest, but it uses monads, which are not obvious to the amateur. However they have a very nice grammar of and parser for expressions. If you don't grok monads yet, Fokker's tutorial is the one.

Featureless answered 25/3, 2009 at 1:25 Comment(4)
Awesome..links. You are the Prog language guru around here. ThanksMackey
Ironically, I think learning about parser combinators is the best way to understand monads.Whitening
The problem with parser combinators is they do not usually support left recursive grammars...Eldwin
@CallumRogers the same is true of handwritten recursive-descent parsers. Niklaus Wirth showed the way years ago: instead of left recursion, use sequences.Featureless
W
17

A recursive data type is fine for this. For example, given the language:

expr ::= var
      |  "true"
      |  "false"
      |  "if" expr "then" expr "else" expr
      |  "(" expr ")"

an example expression in this language would be:

if true then x else (if false then y else true)

Your Haskell data type would look something like this:

data Expr = Var String
          | Lit Bool
          | If Expr Expr Expr

Your parser then takes care to translate, e.g., x into Var "x", and true into Lit True, etc. I.e.:

parse "if x then false else true" 
  ==  If (Var "x") (Lit False) (Lit True)

For writing parsers you can roll your own using the techniques mentioned in Norman's answer, or using Parsec or use parser generators like Happy.

Walking answered 25/3, 2009 at 2:14 Comment(0)
F
3

Maybe you can look at some real-world projects to see how they do it?

Less than a week ago, the Language-Python project was announced on the Haskell-Cafe mailinglist. It's a Python parser implemented in Haskell, using the Happy parser generator and Alex lexer generator.

And of course, there is Pugs, an implementation of Perl 6 in Haskell (the first implementation of Perl 6 that conforms to a significant subset of the Perl 6 specification).

Fick answered 25/3, 2009 at 14:31 Comment(0)
G
0

I can't tell from the tone of your question whether this is the first time you're attempting to write a compiler, or if you've written compilers before and are looking for advice specific to Haskell. If you're already a compiler guru, what little advice I have to offer isn't going to help. :)

Programming language grammars are commonly represented in BNF form, which can be used by tools like Yacc or Bison to parse source code. I don't know if this counts as a Haskell-ian way to do it, but it's the only way that I've heard of. With some digging around you can probably dig up a tool to generate Haskell code from a BNF grammar; I found this tool which claims to be able to do that.

A quick Google search turned up this BNF grammar for Haskell, and there are probably others out there, in case you want to write a compiler for Haskell (maybe you'd like to write a Haskell compiler in Haskell?) BNF grammars for C and Java seem to be popular.

Finally, if you're looking for a book about compiler design, the classic text is "The Dragon Book".

Grishilda answered 25/3, 2009 at 1:0 Comment(3)
The Dragon Book is indeed a classic, but I've heard that current thinking about compiler implementation has passed it by (e.g., JIT runtime optimizations, etc). I'm no authority myself, just a shameless passer of innuendo. 8)Donn
Thanks for your response, I'm looking for advice more specific to Haskell, and I'd rather not use a tool (I'm trying to learn not have a tool do it all for me :) ).Malorie
Writing compilers in Haskell is quite a different (and more pleasant) exercise compared to writing compilers in imperative languages. [And @duffymo: 'innuendo' probably doesn't mean what you think it means :)]Gummous
D
0

Unfortunately there's no Haskell grammar for ANTLR, but perhaps you can use the link cited above to create one. ANTLR is a great Java-based parser generator.

Steve Yegge has a nice blog about writing compilers, if you need more motivation. It's entertaining.

Donn answered 25/3, 2009 at 1:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.