Sentence similarity models not capturing opposite sentences
I have tried different approaches to sentence similarity, namely:

However, while these models generally correctly detect similarity for equivalent sentences, they all fail when inputting negated sentences. E.g., these opposite sentences:

  • "I like rainy days because they make me feel relaxed."
  • "I don't like rainy days because they don't make me feel relaxed."

return a similarity of 0.931 with the model en_use_md.

However, sentences that could be considered very similar:

  • "I like rainy days because they make me feel relaxed."
  • "I enjoy rainy days because they make me feel calm."

return a smaller similarity: 0.914.

My question is: Is there any way around this? Are there any other models/approaches that take into account the affirmative/negative nature of sentences when calculating similarity?

Pericarditis answered 29/9, 2021 at 10:3 Comment(1)
Regarding the transformers: distilbert-base-uncased and bert-base-uncased are not trained to detect similarity. Also, sentences with opposite meanings can still be similar. Maybe you can try a paraphrasing model, or look for a dataset you can use to fine-tune a transformer on sentence meaning.Quarry

Your question is pertinent, and I believe this thought has crossed everybody's mind at some point.

If you want to evaluate the logical connection between two sentences, using cosine similarity or euclidean distance on top of some pre-determined embeddings will not suffice.

The actual logical connection between two sentences can be determined via an RTE (Recognizing Textual Entailment) task.

The Multi-Genre Natural Language Inference (MultiNLI) corpus (https://cims.nyu.edu/~sbowman/multinli/) is a dataset built specifically for this task of textual entailment, in the context of natural language inference. In essence, there are 3 labels (contradiction, neutral, and entailment). For example:

  • Premise: "At the other end of Pennsylvania Avenue, people began to line up for a White House tour."
  • Hypothesis: "People formed a line at the end of Pennsylvania Avenue."

In this case, there is an entailment between the two sentences: the first entails the second.

Hugging Face also hosts pre-built models fine-tuned on MNLI, such as distilbert-base-uncased-mnli and roberta-large-mnli, which are trained specifically for this task; consider them as starting points for your own.
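As a minimal sketch (assuming the roberta-large-mnli checkpoint from the Hugging Face Hub, whose labels are CONTRADICTION/NEUTRAL/ENTAILMENT), querying such a model with the question's sentence pair looks like this:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "I like rainy days because they make me feel relaxed."
hypothesis = "I don't like rainy days because they don't make me feel relaxed."

# NLI models take the (premise, hypothesis) pair as a single input
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

probs = logits.softmax(dim=-1).squeeze()
label = model.config.id2label[int(probs.argmax())]
print(label, probs.tolist())
```

For a pair like the one above, the expected prediction is a contradiction rather than an entailment, which is exactly the signal that plain cosine similarity misses.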

Scheer answered 30/9, 2021 at 8:25 Comment(0)

Handling negation is one of the hard problems in NLP.

A lot of similarity methods work by averaging the vectors of the words in a sentence, in which case one sentence's vector is just the other's plus the vector for the word "not", which is not going to be very different. Opposites are also frequently discussed together, so they are "similar" in the distributional sense, which is how "similar" is usually meant in NLP.
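A toy demonstration of the averaging problem (using random stand-in vectors, not a real embedding model) shows how little one extra word shifts a sentence average:

```python
import numpy as np

# Random 50-d "word vectors" stand in for pretrained embeddings.
rng = np.random.default_rng(0)
words = "i like rainy days because they make me feel relaxed".split()
vocab = {w: rng.normal(size=50) for w in words + ["not"]}

def sentence_vector(tokens):
    # Sentence embedding = plain average of the word vectors.
    return np.mean([vocab[t] for t in tokens], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

affirmative = sentence_vector(words)
negated = sentence_vector(words + ["not"])  # same sentence plus "not"

sim = cosine(affirmative, negated)
print(f"{sim:.3f}")
```

One word averaged into a ten-word sentence barely moves the mean, so the cosine similarity stays high even though the meaning flipped.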

There are ways to work around this, often employed in sentiment analysis, but they usually don't "just work". If you can narrow down what kinds of negation you expect to see you might have more success. negspaCy is an unofficial spaCy component that can help detect negation of named entities, which is often useful in medical text ("does not have cancer"), for example. But you have to figure out what to do with that information, and it doesn't help with similarity scores.

You might have some luck using models trained to classify entailment - which classify whether some statement implies, contradicts, or has no bearing on another statement.

Justicz answered 29/9, 2021 at 10:52 Comment(1)
Thank you for your response. As you say, simply detecting negation might not be enough, since e.g. "I don't like tennis" and "I dislike tennis" mean essentially the same thing, but only the first is negated. However, I'll take a look at entailment; it seems like a plausible approach :)Pericarditis

Follow-up on my question:

We recently published the paper This is not correct! Negation-aware Evaluation of Language Generation Systems, which addresses this problem.

The following artifacts were released as a result of our work:

  • The CANNOT (Compilation of ANnotated, Negation-Oriented Text-pairs) dataset, which focuses on negated textual pairs. It is available on GitHub and Hugging Face. Useful for fine-tuning models to improve their sensitivity to negation.
  • A rule-based sentence negator for Python: Negate. Useful for generating negation training data.
  • Fine-tuned, negation-aware Sentence Transformer models. These models report much lower scores for negated sentence pairs than their base models (see the example below).
  • A negation-aware evaluation metric.

Coming back to the examples in the question, the model dmlls/all-mpnet-base-v2-negation reports the following scores:

I like rainy days because they make me feel relaxed.
I don't like rainy days because they don't make me feel relaxed.

Cosine similarity: 0.386

I like rainy days because they make me feel relaxed.
I enjoy rainy days because they make me feel calm.

Cosine similarity: 0.948

While this work admittedly does not completely solve the negation problem in modern NLP models, we believe it is a step in the right direction, and hopefully useful to the NLP community!

Pericarditis answered 8/3 at 14:6 Comment(1)
How cool it is that 2 years later the OP follows up with original work to answer his own question. That’s the sign of a great researcher! I’ll give this a shot now.Ervinervine

I used the model dmlls/all-mpnet-base-v2-negation to compare the two sentences "I like rainy days because they make me feel relaxed." and "I don't like rainy days because they don't make me feel relaxed.", and I am getting a cosine similarity of 0.74, which is quite high. How did you get the score of 0.38? Sharing the complete code below:

import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.metrics.pairwise import cosine_similarity

# from_pretrained loads the trained weights; from_config would
# initialize the model with random weights.
model = AutoModel.from_pretrained('dmlls/all-mpnet-base-v2-negation')
tokenizer = AutoTokenizer.from_pretrained('dmlls/all-mpnet-base-v2-negation')

a = 'I like rainy days because they make me feel relaxed.'
b = "I don't like rainy days because they don't make me feel relaxed."

inputs_a = tokenizer(a, return_tensors='pt', padding=True, truncation=True)
inputs_b = tokenizer(b, return_tensors='pt', padding=True, truncation=True)

with torch.no_grad():
    outputs_a = model(**inputs_a)
    outputs_b = model(**inputs_b)

# [CLS] token embedding (note: not the mean pooling this model
# was trained with)
embeddings_a = outputs_a.last_hidden_state[:, 0, :]
embeddings_b = outputs_b.last_hidden_state[:, 0, :]

similarity_prob = cosine_similarity(embeddings_a, embeddings_b)
print(similarity_prob)
Carlton answered 11/3 at 13:7 Comment(2)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Lenhard
This model is a Sentence Transformer, meant to be used with the sentence-transformers module. You can find an example at sbert.net/docs/usage/semantic_textual_similarity.html.Pericarditis
