Are there any LL Parser Generators for Functional Languages such as Haskell or Scala?
I've noticed a distinct lack of LL parser generators that target functional languages. The ideal find in what has so far been an unsuccessful search would be something that generates a Haskell parser from an ANTLR-style LL(*) grammar (modulo minor reformatting of the grammar); I was surprised that every last parser generator with a functional-language target I found was some kind of LR parser.

I want to transition the parser of the language I'm working on, which has functional features, from ANTLR to self-hosting in the language itself. It would help a lot if I could port something almost surely correct from another functional language (preferably one I'm familiar with: Haskell or Scala) instead of rewriting it entirely from scratch, though in the end I might do that anyway, since the core language is small.

At this point, even more than a solution, I'm curious why there are no such LL(*) or even LL(k) parser generators while there are many LR generators, since LL seems inherently easier.

Adorne answered 31/3, 2011 at 23:37 Comment(11)
Not just a "Parser-Combinator" library? (Sounds like a tooling approach similar to "lex/yacc" is wanted?)Fuge
@pst, yacc is an LR parser, though.Profligate
@Profligate Right, but I mean running a tool on the grammar to create the parser/code vs. coding the grammar directly in the host language, as in External DSLs made easy with Scala Parser Combinators or The Magic behind Parser Combinators. I believe LL(*) can be expressed.Fuge
@Profligate One Parser Combinator library for Haskell would be parsec. Also capable of LL(*) I believe.Fuge
@chrysanhy @Profligate @pst, parsec is LL(1) by default, and LL(*) when used with try. So that should fit the bill.Blotchy
(Formally, parsec recognizes any recursively enumerable language, so it's way more powerful than LL(*), but LL is where its "groove" is)Blotchy
+1 It seems like someone would have written a nice LL grammar text file parser to go on top of parsec.Wiry
Apologies for being a bit slow to upvote or accept any of the replies so far. Lexing, Parsing, Grammars and Parser Generators are new to me, so it's taking me a while to understand everything that's being said here.Adorne
No worries. The TL;DR version of my response below is that we don't bother to build LL(*) parsers with external tools, because an external tool for that style of grammar doesn't provide us with any extra power over directly implementing the grammar using combinators in the language.Exsert
luqui, can you give a reference for your claim in parens? It seems as if you claimed parsec could solve the halting problem.Chukchi
I can write a function that sums a list of any length. That doesn't mean it will halt on or reject an infinite input :-)Apartment
20

The major reason for this is that most LL(k) parsers written in functional languages are just implemented using parser combinators, because the easiest implementation strategy for a parser combinator library is recursive descent.

Haskell's parsec, attoparsec, and polyparse and Scala's stock parser combinators all produce what are effectively LL(*) parsers.

Both parsec and attoparsec require you to use an explicit try combinator to get backtracking, but this is done only for efficiency; the Scala parser combinators can also do packrat parsing.
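As a minimal sketch of what try buys you (assuming the parsec package; the names here are illustrative): plain <|> does not back up once the first branch has consumed input, so overlapping alternatives need an explicit try.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Both alternatives start with 'l', so the first must be wrapped in
-- `try`: without it, `string "let"` consumes the 'l' before failing,
-- and plain <|> will not back up to attempt `string "lambda"`.
keyword :: Parser String
keyword = try (string "let") <|> string "lambda"

main :: IO ()
main = do
  print (parse keyword "" "let")
  print (parse keyword "" "lambda")
```

With the try removed, the second parse fails instead of backtracking — the LL(1)-by-default behaviour described above.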

Consider the following fragment from the announcement of Brent Yorgey's recent unbound package:

parseAtom = parens parseTerm
    <|> var <$> ident
    <|> lam <$> brackets ident <*> parseTerm

It is pretty easy to see the original grammar in this fragment.

LR parsers require much more complicated preprocessing to generate the tables that let them execute efficiently, since hand-encoding one directly, using something like recursive ascent, is pretty awful.

By implementing your parser combinators as an EDSL rather than an external tool you enable greater use of advanced features of your programming language. You can make portions of the grammar higher order, build the lexer hack directly into the parser, etc. Typical LR parser generators can't do these things, or can only offer them in ad hoc ways in limited contexts because of the need to be able to emit the tables in the end.
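For instance, here is a sketch of a "higher-order" portion of a grammar — a production parameterized by another production — which a table-emitting generator has no way to express (the names below are illustrative, not from any particular library):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A generic production: any item parser, wrapped in braces and
-- separated by semicolons.  The "grammar rule" is an ordinary
-- host-language function, so it can be reused with any item.
block :: Parser a -> Parser [a]
block item = char '{' *> (item `sepBy` char ';') <* char '}'

int :: Parser Int
int = read <$> many1 digit

main :: IO ()
main = print (parse (block int) "" "{1;2;3}")
```

The same block combinator could just as well take a statement parser or an identifier parser; a table-driven generator would need a separate production for each.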

Exsert answered 1/4, 2011 at 14:43 Comment(5)
Polyparse should be mentioned as well, since unlike parsec it gives you backtracking by default, and allows you to use commit to cut off choice. So that provides LL(*) "out of the box".Apartment
Excellent answer! It's worth noting that the only reason scannerless parsing works at all with parser combinators is the ordered choice operator, which provides for disambiguation (albeit a very blunt form). It's not entirely clear how to get this same effect with bottom up parsing. Going to the far end of the spectrum though, a generalized algorithm (like GLL: github.com/djspiewak/gll-combinators) works very nicely for scannerless parsing and doesn't impose the weird LL(*) greedy backtracking restrictions.Wynellwynn
Hi Daniel. GLL does work rather well -- though to be pedantic your GLL combinators aren't quite GLL since they can't compute proper follow sets ;) (since there is the possibility of using the monad or infinite grammars, and so you can't find all the occurrences of a terminal). The approximation is good enough for practical purposes. This is the issue that the Swierstra and Duponcheel style of Arrow/Applicative parsers has. I too have a set of GLL-ish combinators that I need to get around to releasing, once I figure out how to polish them up and integrate them coherently into trifecta.Exsert
Actually, GLL doesn't require follow sets at all, that's just a constant-time optimization. It's perfectly acceptable to just "try all possibilities"; the result will still be O(n^3). This is, in fact, what my GLL combinators fall back on in the case where they can't compute a definitive follow set. In general though, follow sets are only necessary if you have an nullable production, which is a surprisingly rare case.Wynellwynn
One advantage that a grammar-driven parser generator for LL could give you is automatic error recovery and repair. That is a reason a generator tool may be desirable over just writing a recursive descent parser by hand using a combinator library. For example, 'ell': cocolab.com/products/cocktail/doc.pdf/ell.pdfWadi
5

You've inspired me to post an old hobby project to https://github.com/nrnrnr/ebnf. It supports LL(1) parser generation for Standard ML. Adapting to Haskell would not be hard, provided you can find someone who understands Icon.

Edward's comment that people tend to prefer combinators for parsing is quite correct, but there are advantages to a tool that will find FIRST and FOLLOW sets and will complain about ambiguity.

The primary advantage of EBNF is its notation: sequences, Maybe types, and reserved words are all supported natively, with no extra fuss.

Mishear answered 25/6, 2014 at 18:1 Comment(0)
3

SML has had ml-antlr for a couple of years now:

http://www.classes.cs.uchicago.edu/archive/2007/winter/22610-1/docs/lpt-manual.pdf

There is also sllgen for Scheme.

As to why there are many more LR parser generators than LL ones - it's difficult to write LR parsers by hand, so you really need a generator. With LL parsers, a hand-coded implementation still matches the grammar, so there is much less need for a generator.
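To illustrate (a hand-rolled sketch, not taken from any library): the LL grammar S → '(' S ')' S | ε transcribes almost line-for-line into a recursive descent function, which is why a generator adds so little here.

```haskell
-- Hand-coded recursive descent for  S -> '(' S ')' S | ε.
-- The parser maps the remaining input to the input left over after a
-- successful match (Nothing on failure).
parseS :: String -> Maybe String
parseS ('(' : rest) =                    -- first production: '(' S ')' S
  case parseS rest of
    Just (')' : rest') -> parseS rest'
    _                  -> Nothing
parseS s = Just s                        -- ε-production: consume nothing

-- The input is balanced iff S consumes all of it.
balanced :: String -> Bool
balanced s = parseS s == Just ""

main :: IO ()
main = print (map balanced ["(())()", "(()"])
```

Each alternative of the production is one equation, and the FIRST-set check ('(' vs. anything else) is just pattern matching — the code is the grammar.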

Upside answered 1/4, 2011 at 7:49 Comment(0)
2

With Scala you could use all the existing Java tools without much overhead. JavaCC is an LL(k) parser generator. You can use it to create a concrete syntax tree automatically and do everything else in Scala. I actually did that for a small project, simply because the JavaCC grammar already existed.

Vasily answered 1/4, 2011 at 6:6 Comment(0)
1

Before I describe an LL(k) solution under development, I will explain why I didn't use the other available options that I am aware of.

  1. Parser combinator libraries, i.e. recursive descent parsers, do not:

    • guarantee a context-free grammar (CFG), because they do not calculate the first-sets and follow-sets.
    • do efficient table-driven look-ahead of k tokens; instead they do inefficient backtracking.
    • use the more efficient stack-based parsing algorithm.

    A lack of context-freedom is manifested as ambiguities in the syntax, e.g. whether an operator at the start of a source code line (following a line that did not end with a semicolon) is a prefix unary operator on the expression to its right, or an infix binary operator on the expression at the end of the prior source code line.

  2. JavaCC has the following disadvantages:

    • conflates the lexer and the parser generation.
    • overly verbose BNF grammar syntax.
    • outputs Java and I want Scala (eventually Copute).
    • afaics doesn't do a tight coupling of names in grammar and in the AST.
    • afaics monolithic source code, e.g. one can't (easily) extract modules to generate first- and follow-sets (so that I could plug in my own parser generation).

I am attempting to create an LL(k) parser generator that outputs Scala, and then hopefully bootstrap it to output the language I am developing (Copute, which will compile to Scala).

My current attempt is using a subset of the SLK grammar syntax, so the SLK tool can be used to verify the grammar is context-free.

Kegan answered 2/12, 2011 at 4:32 Comment(0)
-2

You can use ANTLR, which generates LL(*) parsers in Java (among other targets), hence .class or .jar files.

Chukchi answered 4/4, 2011 at 19:39 Comment(2)
The poster was clearly aware of ANTLR, since it was cited in the question. And furthermore, Java is not a functional language.Apartment
Sorry, I only skimmed the question. The OP tags Scala (which is not functional in any remotely pure sense), and ANTLR gives a parser usable from Scala. Your point stands, though.Chukchi

© 2022 - 2024 — McMap. All rights reserved.