As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language which looks similar to standard imperative languages (like Python, JavaScript, and so). Its syntax features the following construct:
- integer numbers
- identifiers (
[a-zA-Z]+
) - arithmetic expressions with
+
and*
and parenthesis - structure access with
.
(egfoo.bar.buz
) - tuples (eg
(1, foo, bar.buz)
) (to remove ambiguity one-tuples are written as(x,)
) - function application (eg
foo(1, bar, buz())
) - functions are first class so they can also be returned from other functions and directly be applied (eg
foo()()
is legal becausefoo()
might return a function)
So a fairly complex program in this language is
(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)
the associativity is supposed to be
( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )
I'm currently using the very nice uu-parsinglib
an applicative parser combinator library.
The first problem was obviously that the intuitive expression grammar (expr -> identifier | number | expr * expr | expr + expr | (expr)
is left-recursive. But I could solve that problem using the the pChainl
combinator (see parseExpr
in the example below).
The remaining problem (hence this question) is function application with functions returned from other functions (f()()
). Again, the grammar is left recursive expr -> fun-call | ...; fun-call -> expr ( parameter-list )
. Any ideas how I can solve this problem elegantly using uu-parsinglib
? (the problem should directly apply to parsec
, attoparsec
and other parser combinators as well I guess).
See below my current version of the program. It works well but function application is only working on identifiers to remove the left-recursion:
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE RankNTypes #-}
module TestExprGrammar
(
) where
import Data.Foldable (asum)
import Data.List (intercalate)
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils
import Text.ParserCombinators.UU.BasicInstances
data Node =
NumberLiteral Integer
| Identifier String
| Tuple [Node]
| MemberAccess Node Node
| FunctionCall Node [Node]
| BinaryOperation String Node Node
parseFunctionCall :: Parser Node
parseFunctionCall =
FunctionCall <$>
parseIdentifier {- `parseExpr' would be correct but left-recursive -}
<*> parseParenthesisedNodeList 0
operators :: [[(Char, Node -> Node -> Node)]]
operators = [ [('+', BinaryOperation "+")]
, [('*' , BinaryOperation "*")]
, [('.', MemberAccess)]
]
samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]
parseExpr :: Parser Node
parseExpr =
foldr pChainl
(parseIdentifier
<|> parseNumber
<|> parseTuple
<|> parseFunctionCall
<|> pParens parseExpr
)
(map samePrio operators)
parseNodeList :: Int -> Parser [Node]
parseNodeList n =
case n of
_ | n < 0 -> parseNodeList 0
0 -> pListSep (pSymbol ",") parseExpr
n -> (:) <$>
parseExpr
<* pSymbol ","
<*> parseNodeList (n-1)
parseParenthesisedNodeList :: Int -> Parser [Node]
parseParenthesisedNodeList n = pParens (parseNodeList n)
parseIdentifier :: Parser Node
parseIdentifier = Identifier <$> pSome pLetter <* pSpaces
parseNumber :: Parser Node
parseNumber = NumberLiteral <$> pNatural
parseTuple :: Parser Node
parseTuple =
Tuple <$> parseParenthesisedNodeList 1
<|> Tuple [] <$ pSymbol "()"
instance Show Node where
show n =
let showNodeList ns = intercalate ", " (map show ns)
showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
in case n of
Identifier i -> i
Tuple ns -> showParenthesisedNodeList ns
NumberLiteral n -> show n
FunctionCall f args -> show f ++ showParenthesisedNodeList args
MemberAccess f g -> show f ++ "." ++ show g
BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"