I'm working on a small Haskell project that needs to lex a very small subset of strictly formed English into tokens for semantic parsing. It's a very naïve natural language interface to a system with many different end effectors that can be issued commands. I'm currently using Alex for this, but Alex requires its lexicon to be statically compiled. The nature of the system is such that the number and even the type of end effectors in the world can increase as well as decrease after compilation, so I need to be able to add or remove viable tokens from the lexicon at runtime.
I've tried looking around for dynamic lexing solutions, and the closest I could find was this Dynamic Lexer Engine, which doesn't appear to have been updated since 2000.
I've been considering some other techniques, such as dropping down to a lower-level approach (Attoparsec, perhaps), or even wiring up a recompilation hook for Alex and separating the lexer from the rest of the application. A rough sketch of the Attoparsec idea is below.
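Roughly what I have in mind for the Attoparsec route: the lexicon is just an ordinary value passed in at lex time, so it can change between calls. `Token`, `Lexicon`, and the constructor names here are placeholders of my own, not anything Attoparsec provides:

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Sketch only: a lexer whose keyword set is a plain runtime value,
-- so effectors can be added or removed without recompiling anything.
import           Data.Attoparsec.Text
import qualified Data.Char            as Char
import           Data.Text            (Text)
import qualified Data.Text            as T

-- Placeholder token type for illustration.
data Token = TCommand Text | TEffector Text | TWord Text
  deriving (Show, Eq)

-- keyword -> how to tag it
type Lexicon = [(Text, Text -> Token)]

-- Build a token parser from whatever lexicon we have *right now*.
token :: Lexicon -> Parser Token
token lexicon = do
  skipSpace
  w <- takeWhile1 (not . Char.isSpace)
  pure $ case lookup (T.toLower w) lexicon of
    Just mk -> mk w
    Nothing -> TWord w

lexLine :: Lexicon -> Text -> Either String [Token]
lexLine lexicon = parseOnly (many1 (token lexicon) <* skipSpace <* endOfInput)

-- e.g. lexLine [("lift", TCommand), ("arm2", TEffector)] "lift arm2 slowly"
--   ==> Right [TCommand "lift", TEffector "arm2", TWord "slowly"]
```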
Are there any well-known solutions for this sort of lexical analysis? I intend to work through Natural Language Processing for the Working Programmer eventually so I can take a less simplified approach, but right now a basic lexer is all I need.
Could you simply look each word up in a lexicon held in an ordinary data structure (Data.Map, but maybe a hashmap, or a string trie)? If your universe of inputs is finite, I think you'd be surprised by the efficiency of this approach. – Antipole
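A minimal sketch of that suggestion, assuming a hypothetical Token type and keeping the map in an IORef so entries can be inserted or deleted while the program runs:

```haskell
-- Sketch of the lookup-table idea: split on whitespace and look each word up
-- in a Data.Map that lives in an IORef, so the lexicon can change at runtime.
import           Data.IORef
import qualified Data.Map.Strict as Map
import           Data.Text       (Text)
import qualified Data.Text       as T

-- Placeholder token type for illustration.
data Token = TCommand Text | TEffector Text | TWord Text
  deriving (Show, Eq)

type Lexicon = Map.Map Text (Text -> Token)

newLexicon :: [(Text, Text -> Token)] -> IO (IORef Lexicon)
newLexicon = newIORef . Map.fromList

-- Register a freshly discovered end effector at runtime.
addEffector :: IORef Lexicon -> Text -> IO ()
addEffector ref name = modifyIORef' ref (Map.insert (T.toLower name) TEffector)

-- Forget an effector that has left the world.
removeEffector :: IORef Lexicon -> Text -> IO ()
removeEffector ref name = modifyIORef' ref (Map.delete (T.toLower name))

-- Tokenise against whatever the lexicon contains at this moment.
tokenize :: IORef Lexicon -> Text -> IO [Token]
tokenize ref input = do
  lexicon <- readIORef ref
  let tag w = maybe (TWord w) ($ w) (Map.lookup (T.toLower w) lexicon)
  pure (map tag (T.words input))
```

Lookups are O(log n) per word with Data.Map; a HashMap or a trie would trade that for different constants, but for a small command vocabulary any of them is effectively free.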