I'm using some UIMA annotators in a pipeline. It runs tasks like:
- tokenizer
- sentence splitter
- gazetteer
- My Annotator
The problem is that I don't want to write ALL the annotations (Token, Sentence, SubToken, Time, myAnnotations, etc.) to disk, because the files get very large quickly.
I want to remove all the annotations and keep only those created by My Annotator.
I'm working with the following libraries:
- uimaFIT 2.0.0
- ClearTK 1.4.1
- Maven
And I'm using an org.apache.uima.fit.pipeline.SimplePipeline with:
    SimplePipeline.runPipeline(
        // directory with text files
        UriCollectionReader.getCollectionReaderFromDirectory(filesDirectory),
        UriToDocumentTextAnnotator.getDescription(),
        // Stanford: tokenize, ssplit, pos, lemma, ner, parse, dcoref
        StanfordCoreNLPAnnotator.getDescription(),
        AnalysisEngineFactory.createEngineDescription(
            XWriter.class,
            XWriter.PARAM_OUTPUT_DIRECTORY_NAME, outputDirectory,
            XWriter.PARAM_FILE_NAMER_CLASS_NAME, ViewURIFileNamer.class.getName())
    );
What I'm trying to do is use the Stanford NLP annotator (from ClearTK) and then remove the annotations I don't need.
How do I do this?
From what I know, you can call the removeFromIndexes() method on an Annotation instance.
Do I need to create a UIMA processor and add it to my pipeline?
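
To make the idea concrete, here is a minimal, untested sketch of what such a component might look like with uimaFIT 2.x. The class name AnnotationRemover and the parameter PARAM_KEEP_TYPE_NAME are hypothetical names made up for illustration; they are not part of uimaFIT or ClearTK. It assumes that everything to discard is a subtype of uima.tcas.Annotation and that the DocumentAnnotation should stay in the CAS.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.TypeSystem;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;
import org.apache.uima.jcas.tcas.DocumentAnnotation;

/** Hypothetical helper: drops every annotation except one type (and its subtypes). */
public class AnnotationRemover extends JCasAnnotator_ImplBase {

  // Hypothetical parameter: fully qualified UIMA type name to keep.
  public static final String PARAM_KEEP_TYPE_NAME = "keepTypeName";

  @ConfigurationParameter(name = PARAM_KEEP_TYPE_NAME, mandatory = true)
  private String keepTypeName;

  @Override
  public void process(JCas jCas) throws AnalysisEngineProcessException {
    TypeSystem typeSystem = jCas.getTypeSystem();
    Type keepType = typeSystem.getType(keepTypeName);
    if (keepType == null) {
      throw new AnalysisEngineProcessException(
          new IllegalArgumentException("Unknown type: " + keepTypeName));
    }

    // Collect first, remove afterwards, so the index is not modified
    // while it is being iterated.
    List<Annotation> toRemove = new ArrayList<Annotation>();
    for (Annotation annotation : JCasUtil.select(jCas, Annotation.class)) {
      boolean keep = annotation instanceof DocumentAnnotation
          || typeSystem.subsumes(keepType, annotation.getType());
      if (!keep) {
        toRemove.add(annotation);
      }
    }
    for (Annotation annotation : toRemove) {
      annotation.removeFromIndexes();
    }
  }
}
```

Under these assumptions it would be added to the runPipeline call just before the XWriter, for example as AnalysisEngineFactory.createEngineDescription(AnnotationRemover.class, AnnotationRemover.PARAM_KEEP_TYPE_NAME, "my.package.MyAnnotation"), where the type name is a placeholder. Feature structures that are not annotations, or that are still referenced by a kept annotation, would likely remain in the written output.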