Using Alex in Haskell to make a lexer that parses Dice Rolls

Asked 13/7, 2020 at 4:48 Answered 14/7, 2020 at 1:34

I'm making a parser for a DSL in Haskell using Alex + Happy. My DSL uses dice rolls as part of the possible expressions.

Sometimes I have an expression that I want to parse that looks like:

[some code...]  3D6  [... rest of the code]

Which should translate roughly to:

TokenInt {... value = 3}, TokenD, TokenInt {... value = 6}

My DSL also uses variables (basically, Strings), so I have a special token that handles variable names. So, with these rules:

"D"                                 { \pos str -> TokenD pos }
$alpha [$alpha $digit \_ \']*       { \pos str -> TokenName pos str}
$digit+                             { \pos str -> TokenInt pos (read str) }

The result I'm getting from my lexer now is:

TokenInt {... value = 3}, TokenName { ... , name = "D6"}

Which means that my lexer "reads" an Integer and a Variable named "D6".

I have tried many things. For example, I changed the rule for D to:

$digit "D" $digit                   { \pos str -> TokenD pos }

But that just consumes the digits :(

Can I parse the dice roll with the numbers?
Or at least parse TokenInt-TokenD-TokenInt?

PS: I'm using PosN as a wrapper, not sure if relevant.

Paquin answered 13/7, 2020 at 4:48 Comment(2)

Good question, though IMHO probably the better way to do this in this case would be to just roll the lexer by hand. For such small languages, tools can make things harder than just doing things manually. This can easily come up in a larger language as well though, so it's a good question to ask regardless. – Cornelison 13/7, 2020 at 10:20

You could actually keep lexing these separately, and parse Dn as a postfix operator that iterates an n-sided die roll by its operand; or even parse D as an infix operator with a die count on the left and a die value on the right. Then the roll is not limited to fixed dice, and you could allow dynamic expressions such as x D y (or styled as e.g. x D(y)) ≅ replicateM x (rollD y) – Gametophyte 13/7, 2020 at 18:54
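
For what it's worth, the infix idea from the comment above could be sketched roughly like this. The Expr type, eval, maxRoll and maxTotal are all hypothetical names (the question doesn't show its AST), and summing the rolls is an assumption about what a dice expression should evaluate to. The roller is passed in as a parameter so the sketch needs nothing outside base.

```haskell
import Control.Monad (replicateM)
import Data.Functor.Identity (Identity (..))

-- Hypothetical expression type; the question does not show its AST.
data Expr
  = Lit Int        -- an integer literal such as 3
  | Roll Expr Expr -- x D y: roll a y-sided die x times
  deriving (Show)

-- Evaluate an expression, given an action that rolls one n-sided die.
-- Summing the individual rolls is an assumption made for this sketch.
eval :: Monad m => (Int -> m Int) -> Expr -> m Int
eval _ (Lit n) = pure n
eval rollD (Roll count sides) = do
  x <- eval rollD count
  y <- eval rollD sides
  sum <$> replicateM x (rollD y)

-- Deterministic stand-in roller: every die shows its highest face.
maxRoll :: Int -> Identity Int
maxRoll = Identity

-- Total of an expression when every die shows its highest face.
maxTotal :: Expr -> Int
maxTotal = runIdentity . eval maxRoll
```

With this, maxTotal (Roll (Lit 3) (Lit 6)) gives 18. Since both operands are expressions, something like x D y works without changing the lexer at all; a real roller would live in IO or a random monad instead of Identity.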

The way I'd go about it would be to extend the TokenD constructor to TokenD Int Int, so, using the basic wrapper for convenience, I would do:

$digit+ D $digit+ { dice }
...
dice :: String -> Token
dice s = TokenD (read $ head ls) (read $ last ls)
  where ls = split 'D' s

split can be found here.
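
If you'd rather not depend on an external split, here is a minimal self-contained sketch of the same action using break from the Prelude. The Token type below is a hypothetical cut-down version with only the one constructor, and the error branch is my own addition; it should be unreachable as long as the action is only attached to the $digit+ D $digit+ rule.

```haskell
-- Hypothetical token type with only the constructor this sketch needs.
data Token = TokenD Int Int -- number of dice, sides per die
  deriving (Show, Eq)

-- Turn a lexeme matched by  $digit+ D $digit+  into a token.
-- break splits at the first 'D'; the pattern then drops the 'D' itself.
dice :: String -> Token
dice s = case break (== 'D') s of
  (count, 'D' : sides) -> TokenD (read count) (read sides)
  _ -> error ("dice: unexpected lexeme " ++ show s)
```

Here dice "3D6" evaluates to TokenD 3 6. With the posn wrapper the action receives the position as its first argument, so the same idea applies with an extra parameter and an extra field on the token.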

This is an extra step that would usually be done during syntactic analysis, but it doesn't hurt much here.

Also, I couldn't get Alex to lex the D as TokenD instead of TokenName. If we had Di instead of D that'd be no problem. From Alex's docs:

When the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equal number of characters, then the rule which appears earliest in the file wins.

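By that rule, though, the name rule wins on D6: it matches two characters, while "D" matches only one, and rule order only breaks ties between matches of equal length. So the output you're seeing is the documented behaviour rather than an issue with Alex.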
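
To see the quoted rule in action, here is a small plain-Haskell simulation of longest-match selection over the three rules from the question. It is only a sketch, not how Alex is implemented: diceRules, winner and tokenize are hypothetical names, $alpha is assumed to mean "any letter" (the question doesn't show its definition), and unmatched characters such as spaces are simply skipped.

```haskell
import Data.Char (isAlpha, isDigit)

-- A rule is a token name plus a function giving the length of the prefix
-- it matches (0 means no match). These stand in for the three Alex rules.
type Rule = (String, String -> Int)

diceRules :: [Rule]
diceRules =
  [ ("TokenD", matchD)
  , ("TokenName", matchName)
  , ("TokenInt", length . takeWhile isDigit)
  ]
  where
    matchD ('D' : _) = 1
    matchD _ = 0
    matchName (c : rest) | isAlpha c = 1 + length (takeWhile isNameChar rest)
    matchName _ = 0
    isNameChar c = isAlpha c || isDigit c || c == '_' || c == '\''

-- Pick the rule with the longest match; on ties the earlier rule is kept.
winner :: [Rule] -> String -> Maybe (String, String)
winner rs input =
  case foldl pick Nothing rs of
    Just (name, n) | n > 0 -> Just (name, take n input)
    _ -> Nothing
  where
    pick best (name, match) =
      let n = match input
       in case best of
            Just (_, m) | m >= n -> best
            _ -> Just (name, n)

-- Repeatedly take the winning lexeme; skip characters no rule matches.
tokenize :: [Rule] -> String -> [(String, String)]
tokenize _ [] = []
tokenize rs input = case winner rs input of
  Just tok@(_, lexeme) -> tok : tokenize rs (drop (length lexeme) input)
  Nothing -> tokenize rs (drop 1 input)
```

Running tokenize diceRules "3D6" gives [("TokenInt","3"),("TokenName","D6")], the same shape as the result in the question, whereas winner diceRules "D" gives Just ("TokenD","D") because that is a tie and the earlier rule is kept.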

Mailbox answered 13/7, 2020 at 21:11 Comment(2)

I want to keep the PosN wrapper, since I can use it to mark the position of the parse error. However I like the idea, I think I can use that. Thanks! – Paquin 14/7, 2020 at 1:26

Yeah you can still use posn. The idea stays the same with any wrapper. – Mailbox 14/7, 2020 at 8:40

I decided that I could survive with variables starting with lowercase letters (like Haskell variables), so I changed my lexer to parse variables only if they start with a lowercase letter. That also solved some possible problems with some other reserved words.
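
Roughly, the change might look like the fragment below. This is a sketch rather than my exact file: $lower is a macro name introduced here for illustration, and $alpha and $digit are assumed to be defined as before.

```
-- character set macro, placed with the other macro definitions
$lower = [a-z]

"D"                                 { \pos str -> TokenD pos }
$lower [$alpha $digit \_ \']*       { \pos str -> TokenName pos str }
$digit+                             { \pos str -> TokenInt pos (read str) }
```

With the name rule restricted this way, the D in 3D6 can only match the "D" rule, so the input lexes as TokenInt, TokenD, TokenInt, while a name such as d6 is still a variable.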

I'm still curious to know if there were other solutions, but the problem in itself was solved.

Thank you all!

Paquin answered 14/7, 2020 at 1:34 Comment(0)
