Generalized Bottom up Parser Combinators in Haskell

Asked 5/6, 2014 at 2:57 Answered 23/7, 2018 at 18:17

parsing haskell parsec parser-combinators attoparsec

I am wondered why there is no generalized parser combinators for Bottom-up parsing in Haskell like a Parsec combinators for top down parsing.
( I could find some research work went during 2004 but nothing after
https://haskell-functional-parsing.googlecode.com/files/Ljunglof-2002a.pdf http://www.di.ubi.pt/~jpf/Site/Publications_files/technicalReport.pdf )

Is there any specific reason for not achieving it?

Cell answered 5/6, 2014 at 2:57 Comment(0)

This is because of referential transparency. Just as no function can tell the difference between

let x = 1:x
let x = 1:1:1:x
let x = 1:1:1:1:1:1:1:1:1:...  -- if this were writeable

no function can tell the difference between a grammar which is a finite graph and a grammar which is an infinite tree. Bottom-up parsing algorithms need to be able to see the grammar as a graph, in order to enumerate all the possible parsing states.

The fact that top-down parsers see their input as infinite trees allows them to be more powerful, since the tree could be computationally more complex than any graph could be; for example,

numSequence n = string (show n) *> option () (numSequence (n+1))

accepts any finite ascending sequence of numbers starting at n. This has infinitely many different parsing states. (It might be possible to represent this in a context-free way, but it would be tricky and require more understanding of the code than a parsing library is capable of, I think)

A bottom up combinator library could be written, though it is a bit ugly, by requiring all parsers to be "labelled" in such a way that

the same label always refers to the same parser, and
there is only a finite set of labels

at which point it begins to look a lot more like a traditional specification of a grammar than a combinatory specification. However, it could still be nice; you would only have to label recursive productions, which would rule out any infinitely-large rules such as numSequence.

Embosom answered 5/6, 2014 at 3:17 Comment(1)

There are other approaches than manually labeling, by the way. C.f. data-reify which transforms a recursive DSL into a graph representation for you. See also this paper. – Humanity 22/2, 2021 at 18:2

As luqui's answer indicates a bottom-up parser combinator library is not a realistic. On the chance that someone gets to this page just looking for haskell's bottom up parser generator, what you are looking for is called the Happy parser generator. It is like yacc for haskell.

Mittel answered 5/6, 2014 at 5:1 Comment(0)

As luqui said above: Haskell's treatment of recursive parser definitions does not permit the definition of bottom-up parsing libraries. Bottom-up parsing libraries are possible though if you represent recursive grammars differently. With apologies for the self-promotion, one (research) parser library that uses such an approach is grammar-combinators. It implements a grammar transformation called the uniform Paull transformation that can be combined with the top-down parser algorithm to obtain a bottom-up parser for the original grammar.

Hbeam answered 5/6, 2014 at 20:9 Comment(0)

@luqui essentially says, that there are cases in which sharing is unobservable. However, it's not the case in general: many approaches to observable sharing exist. E.g. http://www.ittc.ku.edu/~andygill/papers/reifyGraph.pdf mentions a few different methods to achieve observable sharing and proposes its own new method:

This looping structure can be used for interpretation, but not for further analysis, pretty printing, or general processing. The challenge here, and the subject of this paper, is how to allow trees extracted from Haskell hosted deep DSLs to have observable back-edges, or more generally, observable sharing. This a well-understood problem, with a number of standard solutions.

Note that the "ugly" solution of @liqui is mentioned by the paper under the name of explicit labels. The solution proposed by the paper is still "ugly" as it uses so called "stable names", but other solutions such as http://www.cs.utexas.edu/~wcook/Drafts/2012/graphs.pdf (which relies on PHOAS) may work.

Executor answered 23/7, 2018 at 18:17 Comment(0)

Recommended topics

Hot tags