I'm shopping for an open-source framework for writing natural language grammar rules for pattern matching over annotations. You could think of it as regular expressions that match at the token level rather than the character level. Such a framework should let the match criteria reference other attributes attached to the input tokens or spans, and let an action modify those attributes.
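To make the requirement concrete, here is a minimal sketch in plain Python of the kind of matching I mean. It is not the API of any of the frameworks below; the token representation (a dict of attributes) and the names `find_matches`, `apply_rule`, and `label_person` are all hypothetical, made up for illustration. A rule is a sequence of predicates over token attributes plus an action that can write new attributes onto the matched span.

```python
from typing import Callable, Dict, List

# Hypothetical token representation: attribute name -> value.
Token = Dict[str, str]
Predicate = Callable[[Token], bool]


def find_matches(tokens: List[Token], pattern: List[Predicate]) -> List[range]:
    """Return non-overlapping index spans where every token satisfies
    the predicate at the corresponding position in the pattern."""
    if not pattern:
        return []
    spans = []
    i = 0
    while i + len(pattern) <= len(tokens):
        window = tokens[i:i + len(pattern)]
        if all(pred(tok) for pred, tok in zip(pattern, window)):
            spans.append(range(i, i + len(pattern)))
            i += len(pattern)  # skip past the match so spans do not overlap
        else:
            i += 1
    return spans


def apply_rule(tokens: List[Token], pattern: List[Predicate],
               action: Callable[[List[Token]], None]) -> None:
    """Run the action on the tokens of every matched span."""
    for span in find_matches(tokens, pattern):
        action([tokens[j] for j in span])


tokens = [
    {"text": "Meet", "pos": "VB"},
    {"text": "Dr.", "pos": "NNP"},
    {"text": "Smith", "pos": "NNP"},
    {"text": "today", "pos": "NN"},
]

# Rule: a title token followed by a proper noun.
pattern = [
    lambda t: t["text"] in {"Dr.", "Mr.", "Ms."},
    lambda t: t["pos"] == "NNP",
]


def label_person(span: List[Token]) -> None:
    # Action: attach a new attribute to every token in the matched span.
    for tok in span:
        tok["ner"] = "PERSON"


apply_rule(tokens, pattern, label_person)
```

After running this, the tokens for "Dr." and "Smith" carry an extra `ner` attribute while the others are unchanged. A real framework would add quantifiers, alternation, and a declarative rule syntax on top of this idea, which is exactly what I would rather not build myself.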
There are four options I know of which fit this description:
- GATE Java Expressions over Annotations (JAPE)
- Stanford CoreNLP's TokensRegex
- UIMA Ruta (Tutorial)
- Graph Expression (GExp)*
Are there any other options like these available at this time?
Related Tools
- While I know that general parser generators like ANTLR can also serve this purpose, I'm looking for something that is more specifically tailored to natural language processing or information extraction.
- UIMA includes a Regex Annotator plugin for declaring rules in XML, but it appears to operate at the character level rather than on higher-level objects.
- I know that this kind of task is often performed with statistical models, but for narrow, structured domains there's benefit in hand-crafting rules.
* With GExp, 'rules' are actually implemented in code, but since there are so few options I chose to include it.