The Haskell Report includes a somewhat notorious clause in the layout rules called "parse-error(t)". The purpose of this rule is to avoid forcing the programmer to write braces in single-line let expressions and similar situations. The relevant sentence is:
The side condition parse-error(t) is to be interpreted as follows: if the tokens generated so far by L together with the next token t represent an invalid prefix of the Haskell grammar, and the tokens generated so far by L followed by the token “}” represent a valid prefix of the Haskell grammar, then parse-error(t) is true.
This creates an unusual dependency where the lexer necessarily both produces tokens for the parser and responds to errors produced in the parser by inserting additional tokens for the parser to consume. This is unlike pretty much anything you'll find in any other language definition, and severely complicates the implementation if it is interpreted 100% literally.
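To make the dependency concrete, here is a toy sketch of the rule as I read it. Everything in it is an assumption for illustration: tokens are plain strings, the "grammar" is a single hard-coded sentence (toySentences), validPrefix stands in for a real parser that can report whether a token sequence is still a viable prefix, and layout only handles implicit blocks opened by let. It is not how GHC or any other compiler does it.

```haskell
module Main where

import Data.List (isPrefixOf)

-- Toy token type: just the token text.
type Token = String

-- Hypothetical stand-in for the Haskell grammar: one accepted sentence.
toySentences :: [[Token]]
toySentences = [words "let { x = 42 } in x"]

-- Oracle a real implementation would have to get from the parser:
-- is this token sequence a valid prefix of some sentence?
validPrefix :: [Token] -> Bool
validPrefix ts = any (ts `isPrefixOf`) toySentences

-- The side condition from the Report: adding t is invalid,
-- but adding "}" instead would be valid.
parseError :: [Token] -> Token -> Bool
parseError soFar t =
  not (validPrefix (soFar ++ [t])) && validPrefix (soFar ++ ["}"])

-- Simplified layout pass. The Int counts implicit blocks still open;
-- the Report's algorithm tracks a stack of indentation contexts instead.
layout :: [Token] -> [Token]
layout = go [] 0
  where
    go :: [Token] -> Int -> [Token] -> [Token]
    go out open [] = out ++ replicate open "}"
    go out open (t : ts)
      -- parse-error(t): close the implicit block, then retry the same token.
      | open > 0 && parseError out t = go (out ++ ["}"]) (open - 1) (t : ts)
      -- "let" without an explicit brace opens an implicit block.
      | t == "let" && take 1 ts /= ["{"] = go (out ++ ["let", "{"]) (open + 1) ts
      | otherwise = go (out ++ [t]) open ts

main :: IO ()
main = putStrLn (unwords (layout (words "let x = 42 in x")))
-- prints: let { x = 42 } in x
```

The point of the sketch is that layout cannot decide whether to emit "}" without calling validPrefix on everything emitted so far, which is exactly the lexer-asks-the-parser loop that makes the rule awkward.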
Unsurprisingly, no Haskell compiler that I'm aware of implements the entire rule as written. For example, GHC fails to parse the following expression, which is legal according to the report:
let x = 42 in x == 42 == True
There is a wide variety of similar strange cases. This post has a list of some especially difficult examples. GHC handles some of these correctly, but (as of 7.10.1) it also fails on this one:
e = case 1 of 1 -> 1 :: Int + 1
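For comparison, both examples go through when the grouping is written out by hand. The sketch below is only my guess at the grouping a literal reading of the rule would have to produce; the parentheses and braces are my additions, not something the Report spells out for these cases.

```haskell
module Main where

-- The let example with the intended grouping made explicit.
letExample :: Bool
letExample = (let x = 42 in x == 42) == True

-- The case example with the block closed by an explicit brace,
-- so the "+ 1" applies to the whole case expression.
e :: Int
e = case 1 of { 1 -> 1 :: Int } + 1

main :: IO ()
main = do
  print letExample
  print e
```

So the problem is not that these programs are meaningless, only that the implicit closing brace has to be discovered from a parse error.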
Also, it seems GHC has an undocumented language extension called AlternativeLayoutRule that replaces the parse-error(t) clause with a stack of token contexts in the lexer that gives similar results in most cases; however, this is not the default behavior.
What do real-world Haskell compilers (including GHC in particular) do to approximate the parse-error(t) rule during lexing? I'm curious because I'm trying to implement a simple Haskell compiler and this rule is really tripping me up. (See also this related question.)
+ legal inside a type... – Twig
case 1 of 1 -> 1 :: Int :: Int parses perfectly. – Twig