I want to split the following string into sentences:
library(NLP) # NLP_0.1-7
string <- as.String("Mr. Brown comes. He says hello. i give him coffee.")
I want to show two different approaches. The first uses the openNLP package:
library(openNLP) # openNLP_0.2-5
sentence_token_annotator <- Maxent_Sent_Token_Annotator(language = "en")
boundaries_sentences <- annotate(string, sentence_token_annotator)
string[boundaries_sentences]
[1] "Mr. Brown comes." "He says hello." "i give him coffee."
The second uses the stringi package:
library(stringi) # stringi_0.5-5
stri_split_boundaries(string, opts_brkiter = stri_opts_brkiter('sentence'))
[[1]]
[1] "Mr. " "Brown comes. "
[3] "He says hello. i give him coffee."
After this second approach, I still have to clean up the result: remove the extra spaces, or split the resulting strings into sentences again. Can I adjust the stringi call to improve the quality of the result?
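For reference, the whitespace cleanup I have in mind is roughly the following. This is only a minimal sketch: the variable names `text`, `pieces` and `trimmed` are made up for illustration, and it only strips the surrounding spaces with `stri_trim_both`. It does not fix the wrong boundaries themselves.

```r
library(stringi)

# Plain character vector; the names below are illustrative only
text <- "Mr. Brown comes. He says hello. i give him coffee."

# Split on ICU sentence boundaries; each piece keeps its trailing space
pieces <- stri_split_boundaries(
  text,
  opts_brkiter = stri_opts_brkiter(type = "sentence")
)[[1]]

# Strip leading and trailing whitespace from every piece
trimmed <- stri_trim_both(pieces)
print(trimmed)
```

The pieces are still split after "Mr." and not before the lowercase "i", so this only addresses the extra spaces.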
On large data sets, openNLP is much slower than stringi.
Is there a way to combine the speed of stringi with the quality of openNLP?