How apache UIMA is different from Apache Opennlp

As I understand the question, you are asking for the differences between the feature sets of Apache UIMA and Apache OpenNLP. Their feature sets barely have anything in common as these two projects have very different aims.

Apache UIMA is an open source implementation of the UIMA specification. The latter defines a conceptual framework for augmenting unstructured information (such as natural language produced by humans) with structured metadata so that computers can work with it.

As an example for an application working with unstructured information, let us take a an application that takes natural language text as input and marks all named entities in the given text, e.g.

Input text = "Bob's cat Charlie is chasing a mouse."
Result = "<NE>Bob</NE>'s cat <NE>Charlie</NE> is chasig a mouse."

To identify the named entities in this example (i.e. Bob and Charlie), several steps of natural language processing have to be performed. Without going into detail about what each of the steps does, a hypothetical system for named entity recognition might involve the following steps:

Data preparation
Sentence splitting
Tokenization
Token lemmatization
Part-of-speech tagging
Phrase detection
Classifying phrases as named entities or not

As you can see, such applications can be very intuitively modelled as sequences of components, and this is exactly what UIMA does. It models applications dealing with unstructed information as pipelines of components (called analytics in UIMA parlance). As you can imagine, many of the pipeline components listed above can be used for other tasks and so the architecture design of UIMA emphasizes reusability of components.

To avoid confusion, the UIMA standard itself doesn't provide any specific components, but defines an infrastructure for UIM (Unstructured Information Management) applications, e.g. workflows, data types, inter-component communication, and so on.

Apache OpenNLP on the other hand does exactly that, namely provide concrete implementations of NLP algorithms dealing with very specific tasks (sentence splitting, POS-tagging, etc.). The source of your confusion might be that it is possible to write Apache UIMA components that wrap OpenNLP tools. The OpenNLP project actually provides such components.

Whether you want to use the UIMA framework for your UIM applications depends on the size of the project. If it is small, I would go without UIMA and just use OpenNLP directly, as UIMA is rather heavy-weight and thus only adds complex yet (for small applications) unnecessary overhead. Also, due to its complexity, it takes a good amount of time to learn how to use it.

Summing up, Apache UIMA and Apache OpenNLP solve different problems, but since both deal with unstructured information, they can be combined profitably.

Recommended topics

Hot tags