Is it necessary to do stopword removal and stemming/lemmatization for text classification when using spaCy or BERT?

Are stopword removal, stemming, and lemmatization necessary for text classification when using spaCy, BERT, or other advanced NLP models to get vector embeddings of the text?

text="The food served in the wedding was very delicious"

1. Since spaCy and BERT were trained on huge raw datasets, are there any benefits to applying stopword removal, stemming, and lemmatization to the text before generating embeddings with BERT/spaCy for a text classification task?

2. I can understand that stopword removal, stemming, and lemmatization are useful when we use CountVectorizer or TfidfVectorizer to get sentence vectors.
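Point 2 can be made concrete with a count-based representation. The sketch below assumes a small hand-picked stopword list and a deliberately crude suffix-stripping stemmer (both hypothetical, standing in for a real stopword list and a real stemmer or lemmatizer) and compares two paraphrases as raw word counts and after preprocessing. In sparse count space the raw texts share only "food" and "delicious", while the preprocessed ones share every term.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical, hand-picked stopword list (not NLTK's, spaCy's or scikit-learn's).
STOPWORDS = {"the", "in", "was", "very", "they", "at", "a", "an", "is", "of", "and", "to"}
# Toy suffix list; a real stemmer (e.g. Porter) uses far more careful rules.
SUFFIXES = ("ings", "ing", "ed", "es", "s", "e")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text: str, remove_stopwords: bool, stem: bool) -> Counter:
    """Lowercase, tokenize on letters, optionally drop stopwords and stem, then count."""
    tokens: List[str] = re.findall(r"[a-z]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


A = "The food served in the wedding was very delicious"
B = "They serve delicious food at weddings"

raw = cosine(bag_of_words(A, False, False), bag_of_words(B, False, False))
processed = cosine(bag_of_words(A, True, True), bag_of_words(B, True, True))
print(round(raw, 3))        # 0.246
print(round(processed, 3))  # 1.0
```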

Bilbao asked 28/8, 2020 at 12:10

You can test whether stemming, lemmatization, and stopword removal help; they don't always. I usually apply them if I'm going to graph the results, because stopwords clutter them up.
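To see the clutter described above, count term frequencies before and after stopword removal. This sketch assumes a tiny made-up corpus and a hand-picked stopword list; in the raw counts "the" dominates whatever you would plot, while after removal the content words rise to the top.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical stopword list chosen for this toy corpus.
STOPWORDS = {"the", "in", "was", "very", "at", "too"}

# Tiny made-up corpus for illustration.
CORPUS = [
    "The food served in the wedding was very delicious",
    "The music at the wedding was too loud",
    "The guests loved the food",
]


def top_terms(docs: List[str], n: int, remove_stopwords: bool) -> List[Tuple[str, int]]:
    """Return the n most frequent lowercase tokens across docs (ties keep first-seen order)."""
    counts: Counter = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        if remove_stopwords:
            tokens = [t for t in tokens if t not in STOPWORDS]
        counts.update(tokens)
    return counts.most_common(n)


print(top_terms(CORPUS, 3, remove_stopwords=False))
# [('the', 6), ('food', 2), ('wedding', 2)]
print(top_terms(CORPUS, 3, remove_stopwords=True))
# [('food', 2), ('wedding', 2), ('served', 1)]
```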

A case for not removing stopwords: stopwords provide context about the user's intent, which matters when you use a contextual model like BERT. Such models keep all stopwords so that they have enough context, including negation words (not, nor, never) that are usually treated as stopwords.
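A quick way to see why negations matter: with a stopword list that contains "not" (many general-purpose lists do; the one below is a hypothetical stand-in), a positive and a negative sentence become identical before they ever reach the model, so not even a contextual model like BERT could tell them apart.

```python
import re
from typing import List

# Hypothetical general-purpose stopword list that, like many real lists, includes negations.
STOPWORDS = {"the", "was", "is", "very", "not", "no", "nor", "never"}


def remove_stopwords(text: str) -> List[str]:
    """Lowercase, tokenize on letters and drop every token in STOPWORDS."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


positive = "The food was delicious"
negative = "The food was not delicious"

print(remove_stopwords(positive))  # ['food', 'delicious']
print(remove_stopwords(negative))  # ['food', 'delicious']
# The two opposite sentences collapse to the same input.
print(remove_stopwords(positive) == remove_stopwords(negative))  # True
```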

According to https://arxiv.org/pdf/1904.07531.pdf:

"Surprisingly, the stopwords received as much attention as non-stop words, but removing them has no effect in MRR performances."

Soissons answered 28/8, 2020 at 14:20

With BERT you don't preprocess the texts; otherwise you lose context (stemming, lemmatization) or change the texts outright (stopword removal).

Simpler models (rule-based or bag-of-words) can benefit from some preprocessing, but you must be very careful with stopword removal: many words that change the meaning of an entire sentence are stopwords (not, no, never, unless).
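For bag-of-words models, one way to follow this advice is to subtract the meaning-changing words from whatever stopword list you start with. A minimal sketch, assuming a hypothetical base list; `SAFE_STOPWORDS` and `preprocess` are illustrative names, not part of any library.

```python
import re
from typing import List, Set

# Hypothetical base stopword list; assume it came from a general-purpose source.
BASE_STOPWORDS = {"the", "a", "was", "is", "very", "in", "not", "no", "never", "unless", "nor"}
# Meaning-changing words named in the answer, plus "nor"; these are never removed.
NEGATIONS = {"not", "no", "never", "unless", "nor"}
SAFE_STOPWORDS = BASE_STOPWORDS - NEGATIONS


def preprocess(text: str, stopwords: Set[str] = SAFE_STOPWORDS) -> List[str]:
    """Tokenize on letters and drop stopwords (by default keeping negation/condition words)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]


print(preprocess("The food was not delicious"))                  # ['food', 'not', 'delicious']
print(preprocess("The food was delicious"))                      # ['food', 'delicious']
print(preprocess("The food was not delicious", BASE_STOPWORDS))  # ['food', 'delicious']
```

With the safe list the two sentences stay distinguishable; with the unfiltered base list they collapse to the same tokens.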

Repairer answered 28/11, 2020 at 9:5

  • Do not remove stopwords (SW) when they add information (context awareness) to the sentence, e.g. in text summarization, machine/language translation, language modeling, and question answering.

  • Remove SW if you only need the general idea of the sentence, e.g. in sentiment analysis, language/text classification, spam filtering, caption generation, auto-tag generation, topic/document

Holeandcorner answered 27/6, 2022 at 13:56

It's not mandatory. Removing stopwords can sometimes help and sometimes not. You should try both.
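One way to "try both" is to keep everything fixed except the preprocessing step and compare a validation score. A minimal sketch, assuming a toy multinomial Naive Bayes classifier (standing in for whatever model you actually use), a hypothetical stopword list and a tiny made-up dataset scored with leave-one-out. Six examples say nothing about which variant wins in general; with real data you would use a proper held-out split.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stopword list and made-up labeled data, for illustration only.
STOPWORDS = {"the", "a", "was", "is", "very", "in", "at", "of", "and"}
DATA: List[Tuple[str, str]] = [
    ("The food was delicious", "pos"),
    ("The cake was very tasty", "pos"),
    ("A delicious and tasty meal", "pos"),
    ("The food was cold", "neg"),
    ("The soup was very bland", "neg"),
    ("A cold and bland meal", "neg"),
]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


# The two preprocessing variants to compare; nothing else changes between runs.
VARIANTS: Dict[str, Callable[[str], List[str]]] = {
    "keep stopwords": tokenize,
    "remove stopwords": lambda text: [t for t in tokenize(text) if t not in STOPWORDS],
}


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.labels = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def predict(self, doc: List[str]) -> str:
        def log_score(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            # Tokens never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in doc if t in self.vocab
            )

        return max(self.labels, key=log_score)


def loo_accuracy(data: List[Tuple[str, str]], preprocess: Callable[[str], List[str]]) -> float:
    """Leave-one-out accuracy: train on all but one example, test on the held-out one."""
    correct = 0
    for i, (text, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        model = NaiveBayes().fit([preprocess(t) for t, _ in train], [y for _, y in train])
        correct += model.predict(preprocess(text)) == label
    return correct / len(data)


results = {name: loo_accuracy(DATA, fn) for name, fn in VARIANTS.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

The same pattern applies if the preprocessing function feeds an embedding model instead of a count model: only the preprocessing should differ between the runs you compare.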

Slowworm answered 28/8, 2020 at 14:9
