Open-source rule-based pattern matching / information extraction frameworks? [closed]

Asked 26/7, 2013 at 22:20 Answered 4/2, 2014 at 10:59

text open-source nlp named information-extraction

I'm shopping for an open-source framework for writing natural language grammar rules for pattern matching over annotations. You could think of it like regexps but matching at the token rather than character level. Such a framework should enable the match criteria to reference other attributes attached to the input tokens or spans, as well as modify such attributes in an action.
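To make the requirement concrete, here is a minimal sketch of what "regexps at the token level" could look like in plain Python. It is not taken from any of the frameworks discussed here; `Token`, `match_at`, `apply_rule`, and the example rule are all hypothetical names made up for illustration. It only handles fixed-length patterns with no quantifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A token plus arbitrary attributes (hypothetical structure for illustration)."""
    text: str
    attrs: dict = field(default_factory=dict)


def match_at(tokens, start, pattern):
    """Return the end index if each predicate matches consecutive tokens from `start`, else None."""
    end = start + len(pattern)
    if end > len(tokens):
        return None
    for tok, pred in zip(tokens[start:end], pattern):
        if not pred(tok):
            return None
    return end


def apply_rule(tokens, pattern, action):
    """Scan left to right and run `action` on each non-overlapping matched span."""
    spans = []
    i = 0
    while i < len(tokens):
        end = match_at(tokens, i, pattern)
        if end is None:
            i += 1
        else:
            action(tokens[i:end])
            spans.append((i, end))
            i = end
    return spans


# Match criteria reference token attributes as well as the token text.
def is_title(tok):
    return tok.attrs.get("kind") == "title"


def is_capitalized(tok):
    return tok.text[:1].isupper()


# The action modifies attributes on the matched tokens.
def mark_person(span):
    for tok in span:
        tok.attrs["entity"] = "PERSON"


tokens = [Token(w) for w in "Yesterday Dr. Smith met Mr. Jones".split()]
for tok in tokens:
    if tok.text in {"Dr.", "Mr."}:
        tok.attrs["kind"] = "title"

spans = apply_rule(tokens, [is_title, is_capitalized], mark_person)
print(spans)
```

A real framework would add quantifiers, alternation, and overlapping annotation spans on top of this; the sketch only shows predicates over attributes and an action that writes attributes back.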

There are three options I know of which fit this description:

Are there any other options like these available at this time?

Related Tools

While I know that general parser generators like Antlr can also serve this purpose, I'm looking for something that is more specifically tailored to natural language processing or information extraction.
UIMA includes a Regex Annotator plugin for declaring rules in XML, but it appears to operate at the character level rather than on higher-level objects.
I know that this kind of task is often performed with statistical models, but for narrow, structured domains there's benefit in hand-crafting rules.

* With GExp, 'rules' are actually implemented in code, but since there are so few options I chose to include it.

Serial answered 26/7, 2013 at 22:20 Comment(4)

TextMarker seems to be the JAPE equivalent for UIMA. But I haven't used it myself. – Advocacy 28/7, 2013 at 16:34

Thank you, that's a good addition to the list. – Serial 23/8, 2013 at 16:17

Ruta (formerly TextMarker) has a nice tutorial, give it a try – Uzial 22/11, 2013 at 20:41

GATE (General Architecture for Text Engineering) is a full-lifecycle open-source solution for text processing – Isomerous 30/9, 2018 at 17:3

You may also want to check out HTQL. It supports regular expression search over tokens. An example that searches for the state and ZIP code in a US address:

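import htql

a = htql.RegEx()
a.setNameSet('states', states)  # states: the state names to match, defined elsewhere
# address: the address string, defined elsewhere; it is split into tokens before searching
a.reSearchList(address.split(), r"&[ws:states]<,>?<\d{5}>", case=False)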
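For readers without HTQL installed, the same idea (a state token, an optional comma token, then a five-digit token) can be sketched with the standard library only. This is not how HTQL implements it and it does not reproduce HTQL's return format; `STATES`, `find_state_zip`, and the sample addresses are hypothetical and only for illustration.

```python
import re

# Illustrative subset of state abbreviations (assumption, not a complete list).
STATES = {"NY", "CA", "TX"}


def find_state_zip(tokens):
    """Return token spans of the form: state, optional ',', five-digit ZIP."""
    results = []
    for i, tok in enumerate(tokens):
        # Case-insensitive state lookup.
        if tok.upper() not in STATES:
            continue
        j = i + 1
        # Skip a single optional comma token.
        if j < len(tokens) and tokens[j] == ",":
            j += 1
        # The next token must be exactly five digits.
        if j < len(tokens) and re.fullmatch(r"\d{5}", tokens[j]):
            results.append(tokens[i:j + 1])
    return results


address = "88 Euclid Street , Athens , NY 12345"
matches = find_state_zip(address.split())
print(matches)
```

Note that splitting on whitespace only treats the comma as its own token when it is surrounded by spaces; a real tokenizer would separate punctuation first.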

Sogdian answered 25/8, 2013 at 0:26 Comment(0)

Unitex, academic software from University Paris East in France, also matches your description (http://www-igm.univ-mlv.fr/~unitex/).

It's C++ based and comes with many optional preprocessing rules and lexicons for 20+ languages.

The GUI is graph-based: you design automata, i.e. 'grammars'.

Infighting answered 4/2, 2014 at 10:59 Comment(0)
