Is there a Haskell EDSL for writing lexers?

Mixing the lexing and parsing phases into a single phase sometimes not only makes Parsec parsers less readable but also slows them down. One solution is to use Alex as a tokenizer and then Parsec as a parser of the token stream.

This is fine, but it would be even better if I could get rid of Alex, because it adds a preprocessing phase to the compilation pipeline, doesn't integrate well with Haskell "IDEs", etc. I was wondering whether there is such a thing as a Haskell EDSL for describing tokenizers, very much in the style of Alex, but as a library.

Jeffryjeffy answered 13/10, 2011 at 8:55 Comment(2)
This is a question that I have been looking into lately, but I haven't really seen anything. I'm imagining maybe a RegEx EDSL from which we make an untagged tokenizer (:: [RegEx] -> String -> [String]). - Cytogenetics
I could come up with a quick solution using any regexp library by trying to match the current string against each regexp, but I would lose a lot of Alex's optimizations, which rely on its knowledge of the set of all regexps. - Jeffryjeffy
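
For illustration only, here is a minimal sketch of the naive approach the two comments above describe: a small regex EDSL plus a longest-match tokenizer that tries every rule at the current position. It uses Brzozowski derivatives instead of an external regexp library, and every name in it (`RegEx`, `tokenize`, `digits`, `ident`, `spaces`, `op`, ...) is made up for this example. The result type is `Either String [String]` rather than the plain `[String]` suggested above, so that unmatched input is reported. It does not build a combined DFA, so it has none of Alex's optimizations.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

-- A tiny regular-expression EDSL (hypothetical, not from any library).
data RegEx
  = Empty               -- matches nothing
  | Eps                 -- matches only the empty string
  | Sym (Char -> Bool)  -- one character satisfying a predicate
  | Alt RegEx RegEx     -- choice
  | Seq RegEx RegEx     -- sequencing
  | Star RegEx          -- zero or more repetitions

-- Does the expression accept the empty string?
nullable :: RegEx -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Sym _)   = False
nullable (Alt a b) = nullable a || nullable b
nullable (Seq a b) = nullable a && nullable b
nullable (Star _)  = True

-- Conservative check: True only if the expression can never match.
dead :: RegEx -> Bool
dead Empty     = True
dead (Alt a b) = dead a && dead b
dead (Seq a b) = dead a || dead b
dead _         = False

-- Brzozowski derivative: what remains to be matched after reading c.
deriv :: Char -> RegEx -> RegEx
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Sym p)   = if p c then Eps else Empty
deriv c (Alt a b) = Alt (deriv c a) (deriv c b)
deriv c (Seq a b)
  | nullable a    = Alt (Seq (deriv c a) b) (deriv c b)
  | otherwise     = Seq (deriv c a) b
deriv c (Star a)  = Seq (deriv c a) (Star a)

-- Length of the longest prefix of the input accepted by the expression.
longestMatch :: RegEx -> String -> Maybe Int
longestMatch = go 0 Nothing
  where
    go n best r s
      | dead r    = best
      | otherwise =
          let best' = if nullable r then Just n else best
          in case s of
               []     -> best'
               c : cs -> go (n + 1) best' (deriv c r) cs

-- Untagged tokenizer: at each position, take the longest match of any rule.
tokenize :: [RegEx] -> String -> Either String [String]
tokenize _  [] = Right []
tokenize rs s  =
  case maximum (Nothing : map (`longestMatch` s) rs) of
    Just n | n > 0 -> (take n s :) <$> tokenize rs (drop n s)
    _              -> Left ("no token matches at: " ++ take 10 s)

-- One or more repetitions.
plus :: RegEx -> RegEx
plus r = Seq r (Star r)

-- Example rules for a toy language.
digits, ident, spaces, op :: RegEx
digits = plus (Sym isDigit)
ident  = Seq (Sym isAlpha) (Star (Sym isAlphaNum))
spaces = plus (Sym isSpace)
op     = Sym (`elem` "+-*/()")
```

In GHCi, `tokenize [digits, ident, spaces, op] "x1 + 42"` evaluates to `Right ["x1"," ","+"," ","42"]`. Every rule is re-run at every position and the derivative terms are never simplified, which is exactly the cost that a generator with knowledge of the whole rule set avoids.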

Yes - http://www.cse.unsw.edu.au/~chak/papers/Cha99.html

Before Hackage, Manuel used to release the code in a package called CTK (compiler toolkit). I'm not sure what the status of the project is these days.

I think Thomas Hallgren's lexer from the paper "Lexing Haskell in Haskell" was dynamic rather than a code generator. Whilst the release is tailored to lexing Haskell, the machinery in the library is more general. Iavor Diatchki has put the code on Hackage.

http://hackage.haskell.org/package/haskell-lexer

Capricecapricious answered 13/10, 2011 at 17:13 Comment(0)

You can use Parsec as the lexer too. First you parse the string into tokens, then you parse the tokens into the target data type.
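
A minimal sketch of that two-pass setup, assuming the `parsec` package and a toy language of integer sums. The `Tok` and `Expr` types and all function names below are invented for this example:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical token and AST types for a toy language of sums.
data Tok = TNum Integer | TPlus
  deriving (Eq, Show)

data Expr = Lit Integer | Add Expr Expr
  deriving (Eq, Show)

-- Pass 1: characters to tokens, each tagged with its source position.
lexer :: Parser [(SourcePos, Tok)]
lexer = spaces *> many (positioned tok <* spaces) <* eof
  where
    positioned p = (,) <$> getPosition <*> p
    tok = TNum . read <$> many1 digit
      <|> TPlus <$ char '+'

-- Pass 2: a Parsec parser whose input stream is the token list.
type TokParser = Parsec [(SourcePos, Tok)] ()

-- Accept one token if the given function maps it to a result.
satisfyTok :: (Tok -> Maybe a) -> TokParser a
satisfyTok f = tokenPrim (show . snd) nextPos (f . snd)
  where
    -- Report errors at the position of the next token, if there is one.
    nextPos _ _ ((p, _) : _) = p
    nextPos p _ []           = p

number :: TokParser Expr
number = satisfyTok $ \t -> case t of
  TNum n -> Just (Lit n)
  _      -> Nothing

plusTok :: TokParser ()
plusTok = satisfyTok $ \t -> if t == TPlus then Just () else Nothing

-- Left-associative sums over the token stream.
expr :: TokParser Expr
expr = chainl1 number (Add <$ plusTok) <* eof

-- Run both passes in sequence.
parseExpr :: String -> Either ParseError Expr
parseExpr src = parse lexer "<input>" src >>= parse expr "<tokens>"
```

With this, `parseExpr "1 + 2 + 3"` gives `Right (Add (Add (Lit 1) (Lit 2)) (Lit 3))`. Keeping the `SourcePos` next to each token lets the second pass report errors at the original source location.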

Excited answered 13/10, 2011 at 10:28 Comment(1)
True, but then again you lose the speed of the minimal DFAs that a tool like Alex gives you, and Alex gives you that speed without any loss of expressiveness. (I prefer Parsec over, say, Yacc because it offers better modularity and expressiveness, but I'm not convinced this is very useful for lexers.) At least it solves the problem of mixing the two phases. Thanks. - Jeffryjeffy
