I have tried different approaches to sentence similarity, namely:

- **spaCy models**: `en_core_web_md` and `en_core_web_lg`.
- **Transformers**: using the packages `sentence-similarity` and `sentence-transformers`, I've tried models such as `distilbert-base-uncased`, `bert-base-uncased` or `sentence-transformers/all-mpnet-base-v2`.
- **Universal Sentence Encoder**: using the package `spacy-universal-sentence-encoder`, with the models `en_use_md` and `en_use_cmlm_lg`.

(A minimal example of how I compute the similarity scores is shown after this list.)
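For reference, this is roughly how I score a sentence pair with `sentence-transformers` (a minimal sketch; the model name and sentences are just examples, and the other models listed above behave similarly):

```python
from sentence_transformers import SentenceTransformer, util

# Example model; results are comparable with the other checkpoints listed above.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

sentences = [
    "I like rainy days because they make me feel relaxed.",
    "I don't like rainy days because they don't make me feel relaxed.",
]

# Encode both sentences and compare the embeddings with cosine similarity.
embeddings = model.encode(sentences, convert_to_tensor=True)
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"cosine similarity: {score:.3f}")
```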
However, while these models generally detect similarity correctly for equivalent sentences, they all fail when one of the sentences is negated. E.g., these opposite sentences:
- "I like rainy days because they make me feel relaxed."
- "I don't like rainy days because they don't make me feel relaxed."
return a similarity of 0.931 with the model `en_use_md`.
However, sentences that could be considered very similar:
- "I like rainy days because they make me feel relaxed."
- "I enjoy rainy days because they make me feel calm."
return a smaller similarity: 0.914.
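This is a sketch of how I obtain these two numbers with `en_use_md` (assuming the `spacy-universal-sentence-encoder` package exposes `load_model` as described in its README; exact scores may vary slightly by version):

```python
import spacy_universal_sentence_encoder  # assumed entry point, per the package README

nlp = spacy_universal_sentence_encoder.load_model("en_use_md")

anchor = nlp("I like rainy days because they make me feel relaxed.")
negated = nlp("I don't like rainy days because they don't make me feel relaxed.")
paraphrase = nlp("I enjoy rainy days because they make me feel calm.")

# The negated pair scores *higher* than the genuine paraphrase.
print(f"negated:    {anchor.similarity(negated):.3f}")     # ~0.931
print(f"paraphrase: {anchor.similarity(paraphrase):.3f}")  # ~0.914
```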
My question is: Is there any way around this? Are there any other models/approaches that take into account the affirmative/negative nature of sentences when calculating similarity?