How to remove stopwords using Stanford NLP

I want to parse a document using Stanford NLP and remove the stop words from it. How can I remove stop words with Stanford NLP? Is there an API for this? I found a StopWords class, but I don't know how to use it. Please suggest how to do this.

Thanks

Tina answered 25/7, 2013 at 3:56 Comment(3)
Please show some effort. What have you tried to do with the StopWords class that isn't working? – Seafood
I have just parsed some text using Stanford NLP. I haven't tried any stopwords class; I am just asking for suggestions on how to remove stop words. – Tina
The first step for anyone performing this task is deciding which stop word list to use. The answer likely varies with the task and corpus you are working with. – Anglocatholic

I think you can use this annotator to remove stop words: https://github.com/jconwell/coreNlp
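
If you would rather not add an extra annotator, the filtering step itself can be done on the token strings once the text has been tokenized. Below is a minimal sketch in plain Java, not the linked annotator's API: the class name `StopwordFilter`, the tiny `DEFAULT_STOPWORDS` list and the `removeStopwords` helper are all hypothetical names made up for illustration, and the token list is assumed to come from whatever tokenizer you already use (for example the CoreNLP one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordFilter {

    // Tiny illustrative list (an assumption); a real project would load a
    // list suited to its task and corpus.
    public static final Set<String> DEFAULT_STOPWORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "in", "is", "of", "the", "to"));

    /** Returns the tokens that are not in the stop word set, preserving order. */
    public static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Compare case-insensitively; the set is assumed to hold lowercase entries.
            if (!stopwords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical input: in practice these would be the tokenizer's output.
        List<String> tokens = Arrays.asList("The", "parser", "is", "part", "of", "the", "toolkit");
        System.out.println(removeStopwords(tokens, DEFAULT_STOPWORDS)); // prints [parser, part, toolkit]
    }
}
```

Using a `HashSet` keeps each lookup constant time, and passing the stop word set as a parameter makes it easy to swap in a different list per task.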

Pearse answered 14/8, 2013 at 18:47 Comment(3)
+1. No need to think about it; this is the recommended approach on Stanford's CoreNLP page. – Anglocatholic
Does NOT explain Stanford NLP. – Last
@Anglocatholic That link just directs to their About page, which doesn't mention stop words. Do you have a link to a specific article they have on removing stop words with CoreNLP? I've searched all the questions their help account has answered looking for stop word questions, and there are few. – Recusant

If I'm correct, the annotator mentioned by @Raju Penumatsa above is available on Maven here: https://mvnrepository.com/artifact/com.zensols/stopword-annotator. It is maintained in a separate Git repo here: https://github.com/plandes/stopword-annotator

With the Maven repository, you can add the annotator to your project as a dependency through a build tool such as Maven or Gradle, so you don't have to copy the library into your classpath manually. That makes it easier to set up and to maintain. The Git repo I linked moved the stopword plugin of the jconwell/coreNlp project into a separate repo and added some metadata so that it could be published on Maven Central.

Hopeh answered 2/7, 2020 at 19:12 Comment(2)
Links as an answer are acceptable, but please consider adding some information about what it does and why it solves the problem. Please also check How to Answer. – Directive
I thought it was somewhat self-explanatory. With the Maven repository, you can add the annotator to your project as a dependency through a build tool such as Maven or Gradle, so you don't have to copy the library into your classpath manually. That makes it easier to set up and to maintain. The Git repo I linked moved the stopword plugin of the jconwell/coreNlp project into a separate repo and added some metadata so that it could be published on Maven Central. :) – Hopeh
