Can anybody think of a way to speed up my CoreNLP Sentiment Analysis (below)?
I initialize the CoreNLP pipeline once on server startup:
// Initialize the CoreNLP text processing pipeline once at server startup
public static Properties props = new Properties();
public static StanfordCoreNLP pipeline;

static {
    // Set the text processing pipeline's annotators
    props.setProperty("annotators", "tokenize, ssplit, pos, parse, sentiment");
    // Use Shift-Reduce Constituency Parsing (O(n),
    // http://nlp.stanford.edu/software/srparser.shtml) instead of CoreNLP's default
    // Probabilistic Context-Free Grammar Parsing (O(n^3))
    props.setProperty("parse.model", "edu/stanford/nlp/models/srparser/englishSR.ser.gz");
    pipeline = new StanfordCoreNLP(props);
}
Then I call the pipeline from my Controller:
String text = "A sample string.";
Annotation annotation = pipeline.process(text);
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
Tree tree = sentence.get(SentimentCoreAnnotations.SentimentAnnotatedTree.class);
int sentiment = RNNCoreAnnotations.getPredictedClass(tree);
...
}
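(For reference, getPredictedClass returns an index on the five-point Stanford Sentiment Treebank scale, from 0 = very negative to 4 = very positive. The helper below is just a sketch of how I turn that index into a readable label; the toLabel name is my own.)

// Sketch: map the predicted class index to a human-readable sentiment label
private static String toLabel(int sentiment) {
    switch (sentiment) {
        case 0: return "Very negative";
        case 1: return "Negative";
        case 2: return "Neutral";
        case 3: return "Positive";
        case 4: return "Very positive";
        default: return "Unknown";
    }
}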
I've profiled the code: the line Annotation annotation = pipeline.process(text), which is CoreNLP's main processing call, is the bottleneck. A request with 100 calls to my controller takes an average of 1.07 seconds, with the annotation itself taking ~7ms per call. I need to get that down to ~2ms.
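For what it's worth, this is roughly how I measured the per-call cost (a minimal timing sketch; the warm-up count, run count, and sample text are arbitrary):

// Warm up the JIT and model caches before measuring
String sample = "A sample string.";
for (int i = 0; i < 10; i++) {
    pipeline.process(sample);
}

// Time 100 calls and report the average per-call cost
int runs = 100;
long start = System.nanoTime();
for (int i = 0; i < runs; i++) {
    pipeline.process(sample);
}
double avgMs = (System.nanoTime() - start) / 1_000_000.0 / runs;
System.out.println("Average per call: " + avgMs + " ms");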
I can't remove any of the annotators because sentiment relies on all of them. I'm already using the Shift-Reduce Constituency Parser because it is much faster than the default Context-Free Grammar Parser.
Are there any other parameters I can tune to significantly speed this up?