Are there any off-the-shelf solutions for lexical analysis in Haskell that allow for a run-time dynamic lexicon?

I'm working on a small Haskell project that needs to lex a very small subset of strictly formed English into tokens for semantic parsing. It's a very naïve natural-language interface to a system with many different end effectors that can be issued commands. I'm currently using Alex for this, but Alex requires its lexicon to be compiled statically. The nature of the system is such that the number and even the type of end effectors in the world can increase as well as decrease after compilation, so I need to be able to add or remove viable tokens from the lexicon at runtime.

I've tried looking around for dynamic lexing solutions, and the closest I could find was this Dynamic Lexer Engine, which doesn't appear to have been updated since 2000.

I've been considering some alternatives, such as a lower-level approach (Attoparsec, perhaps), or even wiring up a recompilation hook for Alex and separating the lexer from the rest of the application.
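
For concreteness, here is a rough sketch of what I imagine the Attoparsec route might look like (the Token type and the effector handling are placeholders, not the real system):

    {-# LANGUAGE OverloadedStrings #-}

    import           Control.Applicative  ((<|>))
    import           Data.Attoparsec.Text
    import qualified Data.Text            as T

    -- Placeholder token type; the real one would carry the semantic
    -- categories the parser needs.
    data Token = EffectorName T.Text | Word T.Text
      deriving (Show, Eq)

    -- Build a lexer from whatever effector names are known right now.
    -- Because the parser is an ordinary value, it can be rebuilt whenever
    -- effectors are added to or removed from the world.
    mkLexer :: [T.Text] -> Parser [Token]
    mkLexer effectors = many1 (skipSpace *> token) <* skipSpace <* endOfInput
      where
        token    = effector <|> word
        effector = EffectorName <$> choice (map string effectors)
        word     = Word . T.pack <$> many1 letter

    -- Lex one command against the current effector set, e.g.
    --   lexCommand ["gripper", "left arm"] "raise the gripper"
    lexCommand :: [T.Text] -> T.Text -> Either String [Token]
    lexCommand effectors = parseOnly (mkLexer effectors)

Rebuilding the parser from the current effector list whenever it changes seems cheap at this scale, but I don't know whether that is a sane way to structure it.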

Are there any well-known solutions for this sort of lexical analysis? I intend to work through Natural Language Processing for the Working Programmer eventually so that I can take a less simplified approach, but for now a basic lexer is all I need.

Strength answered 7/2, 2013 at 21:1 Comment(7)
How much of an issue is performance? I can imagine some very straightforward solutions, depending on whether you need a great deal of efficiency or not.Antipole
@Antipole Performance needs to be near real-time (in the responsiveness sense, not the determinism sense).Strength
That Dynamic Lexer Engine isn't automatically bad just because it hasn't been updated in a long time. Maybe it's already perfect, and doesn't need to be updated. :)Ingeminate
@RobertHarvey Haha, while that's technically true, I just don't want to hitch my wagon to a horse that may stop working in the future as GHC evolves.Strength
I guess I misunderstood. Your question gives the impression that the lexer is a temporary solution, not a permanent one...Ingeminate
@RobertHarvey it's possible that I may abandon a lexer-parser architecture, but not guaranteed. If it turns out that a robust enough dynamic lexer can be tied in to a smart enough parser, it may be good enough for my needs.Strength
How bad would it be to just use a decent map (i.e. maybe not Data.Map, but a hashmap or a string trie)? If your universe of inputs is finite, I think you'd be surprised by the efficiency of this approach.Antipole
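
A minimal sketch of that map-based idea, just to make it concrete (the Category type and helper names are hypothetical):

    import qualified Data.Map.Strict as Map

    -- Hypothetical token categories for the command language.
    data Category = Verb | Effector | Unknown
      deriving (Show, Eq)

    type Lexicon = Map.Map String Category

    -- The lexicon is an ordinary value, so entries can be added or removed
    -- at runtime as end effectors appear and disappear from the world.
    addEffector :: String -> Lexicon -> Lexicon
    addEffector name = Map.insert name Effector

    removeEffector :: String -> Lexicon -> Lexicon
    removeEffector = Map.delete

    -- Classify whitespace-separated words against the current lexicon, e.g.
    --   lexWith (addEffector "gripper" Map.empty) "raise the gripper"
    lexWith :: Lexicon -> String -> [(String, Category)]
    lexWith lexicon input =
      [ (w, Map.findWithDefault Unknown w lexicon) | w <- words input ]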

CTK is the lexing equivalent of Parsec. It supports adding new combinators dynamically.

Clubbable answered 8/2, 2013 at 9:44 Comment(0)
