I have downloaded the en_core_web_lg model and am trying to find the similarity between two sentences:
import spacy

nlp = spacy.load('en_core_web_lg')
search_doc = nlp("This was very strange argument between american and british person")
main_doc = nlp("He was from Japan, but a true English gentleman in my eyes, and another one of the reasons as to why I liked going to school.")
print(main_doc.similarity(search_doc))
This returns a very strange value:
0.9066019751888448
These two sentences should not be 90% similar; they have very different meanings.
Why is this happening? Do I need to add some kind of additional vocabulary to make the similarity result more reasonable?
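For what it's worth, my understanding is that Doc.similarity is just the cosine similarity of the averaged token vectors (Doc.vector), so shared and common words ("was", "and", "british"/"English", "person"/"gentleman") pull the two averages close together. Below is a minimal sketch of that computation as I understand it; the averaging and cosine step are my assumption about what spaCy does internally, not something I have verified in its source:

import numpy as np
import spacy

nlp = spacy.load('en_core_web_lg')

search_doc = nlp("This was very strange argument between american and british person")
main_doc = nlp("He was from Japan, but a true English gentleman in my eyes, and another one of the reasons as to why I liked going to school.")

# Doc.vector is (as far as I know) the average of the per-token word vectors
search_vec = search_doc.vector
main_vec = main_doc.vector

# Cosine similarity of the two averaged vectors
cosine = np.dot(search_vec, main_vec) / (np.linalg.norm(search_vec) * np.linalg.norm(main_vec))
print(cosine)  # prints roughly the same value as main_doc.similarity(search_doc)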