How does a Haskell compiler work?
Asked Answered
B

6

21

Where can I get some paper/doc/whatever which describes how a Haskell compiler actually works? I read quite a few of the docs of GHC, but stopped after getting a headache. So, something which doesn't require a PhD to understand it and isn't written in the You're-supposed-to-be-already-familiar-with-it style would be preferable. It's not a problem if it's really long and takes some time to understand it though.

PS: Most interesting would be something about GHC, but anything is ok.

Baptista answered 8/12, 2010 at 7:34 Comment(9)
Excellent question. I'd like to know particularly whether CPS transform a la scheme is used or not. I strongly believe it is "the" way to implement functional languages, but I may be overlooking the difficulties.Imidazole
If you want a grasp on what CPS is, the excellent "lambda the ultimate goto" paper from the scheme author is a great read.Imidazole
Not exactly what you're asking for, but I'd recommend looking at compilers other than GHC to start. JHC's source is extremely readable, and the UHC code has a lot of good theoretical documentation; either of these would be easier going than GHC.Vittoria
@John: Good point, I already tried but found out, that you're totally lost as soon as you don't really understand what the programer tries to do.Baptista
@Alexandre C. Where can I get it?Baptista
@FUZxxi: you didn't even try google, did you ?Imidazole
@Alexandre C: FOr the question it self, I did. But not for your comment. I should probably do.Baptista
The question was answered, but not some comments. To set the record straight: no, @AlexandreC, a CPS transformation is not involved, instead, lambda lifting is used to transform the program to a set of supercombinators to make compiled graph reduction possible. It is possible to argue that there is an equivalence between CPS and the STG. And Scheme, today, counts as an imperative language. No prior PhD is necessary, but the full answer to the question is more work than a PhD (not original, though).Mould
One of the best papers on that subject that I have read is: The Glasgow Haskell Compiler by Simon Marlow and Simon Peyton-Jones. The Architecture of Open Source Applications. aosabook.org/en/ghc.htmlMarras
E
32

You can get an answer from the horse's mouth! Simon Peyton Jones (GHC wizard) wrote a book explaining how to implement functional programming languages. It's available for free online since it's now out of print: http://research.microsoft.com/en-us/um/people/simonpj/papers/pj-lester-book/

Of course, GHC has moved on since the book was written, but it's still very relevant.

Egomania answered 8/12, 2010 at 8:30 Comment(2)
And don't forget the stg paper -- "Implementing Lazy Functional Languages on Stock Hardware": research.microsoft.com/apps/pubs/default.aspx?id=67083Evanish
And the videos linked on the GHC developers' wiki.Paramatta
S
17

Are you looking for details especially about compiling lazy-evaluation? There is Simon Peyton-Jones's book mentioned by Max Bolingbroke, also the book detailing Clean's implementation is online:

http://wiki.clean.cs.ru.nl/Functional_Programming_and_Parallel_Graph_Rewriting

If you have a university affiliation and want something smaller you could try to get these books (Henderson & Diller are certainly out of print):

Antoni Diller "Compiling Function Languages" ISBN 0 471 92027 4

Peter Henderson "Functional Programming Application and Implementation" ISBN 0-13-331579-7

AJT Davie "An Introduction to Functional Programming Systems using Haskell" ISBN 0 521 27724 8

Diller has a full compiler for a lazy language (implemented in Pascal) via combinator reduction. This was the implementation technique invented by David Turner for SASL. Henderson has many parts of a compiler for LISPkit a miniature, lazy variant of Lisp. Davie details quite a bit of the machinery for compiling a lazy language, for instance there's a description of the STG thats much shorter than Simon Peyton-Jones's book (the STG is the abstract machine SPJ used for Haskell).

The Clean developers have quite a bit of info on implementing SAPL (a Simple Applicative Language) if you look through their publications list:

https://clean.cs.ru.nl/Publications

Finally there are quite a number of papers documenting aspects of the Utrecht Haskell Compiler UHC (and EHC). I think most of the information is how the compiler is organized (with attribute grammars and "Shuffle") and how the type systems (there are various levels of type system in EHC) are implemented, rather than how the back-end 'compilation' works.

Septuor answered 8/12, 2010 at 9:40 Comment(0)
H
5

Compilers are a huge subject and it would be impossible to explain them in entirety here. But here's an overview for a general compiler. Hopefully this will give you some understanding that may make reading things specifically about GHC a little easier to understand.

Compilers generally work by a series of transformations in 2 parts, the front-end and back-end.

The first transformation is turning plain text into something a little easier to traverse. This itself is usually split into 2 parts:

Lexical Analysis or Tokenization - The act of transforming plain text into small chunks (typically operators, identifiers, literals etc).

Syntactic Analysis or Parsing - Turning these small chunks into a tree structure. (typically an AST, an Abstract Syntax Tree)

The next stage is semantic analysis. In this stage a compiler will usually add information to the AST (like type information) and build a symbol table. That concludes the front-end.

The next transformation transforms the AST into an IR, an Intermediate Representation. This is generally, nowadays an SSA form, a Single Static Assignment.

This is then optimized, via Constant Propagation, Dead code analysis, Vectorisation etc.

The last transformation is code generation. Transforming the IR into machine code. This can be very complicated. It is also sometimes referred to as lowering.

For more information I recommend this wikipedia page.

Highflown answered 8/12, 2010 at 8:5 Comment(1)
Thank you for your comment. Please notice that I asked specially for Haskell, where these steps are a bit more sophisticated, eg. Haskell allows new operators with arbitrary precedence which turn parsing into a guesswork, type-inference and classes, which IMHO are making the analysis of what going on quite a bit challenging and a big bunch of optimizations just to get rid of laziness. But thanks for your clear answer!Baptista
L
4

Unfortunately, I suspect that what you're looking for doesn't exist. Compiler theory and Formal Language theory are reasonably complex topics in Computer Science, and Haskell is by no means a starting point.

First, you should probably get a good grounding in:

I would suspect anything explaining anything about the internals of Haskell would require a substantially better understanding of the above topics than say, C would.

I've taken a single course on the subject so far, so I have no formal literature to recommend, but I'm sure there exist many good sources.

Latticed answered 8/12, 2010 at 7:41 Comment(5)
I guess you're right. It's just that I want to understand what's going on there. IMHO it's easier to understand what your program is doing, if you actually know how it gets compiled.Baptista
You answer isn't very helpful. Haskell is so different from mainstream languages.Dysgraphia
@FUZxxl, I sympathise with your point of view; with procedural languages this is very much the case, and especially something like C. However with Haskell the distance from language to machine is much greater, so you spend much more time thinking in terms of the language model. The only time this isn't true is when thinking about performance. However understanding the whole compiler isn't a lot of use. Learn about thunks, strictness analysis and the intermediate code emitted by GHC.Heidi
@Paul Johnson Isn't thunks, strictness analysis and itermediate code one of the majot parts of the compiler? Probably I should have to ask for this instead of about the whole beast itself.Baptista
What I mean is, you need to learn the concepts rather than the algorithms. Its like, the way to understand square roots is to understand that sqrt(x)^2=x, not by reading about Newton-RaphsonHeidi
S
2

Wikipedia has a good - readable - overview of the internals of the GHC (similar to @dan_waterworth's explanation, but specific to Haskell and the GHC):

http://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler#Architecture

Salerno answered 18/5, 2011 at 17:51 Comment(0)
M
0

One of the best papers on that subject that I have read is:

Marras answered 13/6, 2019 at 10:38 Comment(3)
Link-only answers are discouraged on this site. If you just want to point out a resource, please make a comment instead.Baptista
Okay, sorry! Just wanted to help with a good article on that question. I have just added comment with content of this answer.Marras
Cool! Thank you for your cooperation.Baptista

© 2022 - 2024 — McMap. All rights reserved.