What I'm looking for is not a plain similarity score between two texts, but a similarity score for a substring inside a larger string. For example:
text1 = 'cat is sleeping on the mat'
text2 = 'The cat is sleeping on the red mat in the living room'
In the above example, all the words of text1 are present in text2, hence the similarity should be 100%.
If some words of text1 are missing from text2, the score should be lower.
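To make the desired behaviour concrete, here is a rough sketch of the kind of score I mean. It assumes simple word tokenization, case-insensitive matching, and that word order does not matter; the name `containment_score` is just a placeholder of mine, not a function from any library.

```python
import re
from collections import Counter

def containment_score(small: str, large: str) -> float:
    """Fraction of the words in `small` that also occur in `large`."""
    # Lowercase and split into word tokens (assumption: punctuation is ignored).
    small_counts = Counter(re.findall(r"\w+", small.lower()))
    large_counts = Counter(re.findall(r"\w+", large.lower()))

    total = sum(small_counts.values())
    if total == 0:
        return 0.0

    # Counter intersection keeps the minimum count per word, so a word
    # repeated in `small` must be repeated in `large` to count fully.
    matched = sum((small_counts & large_counts).values())
    return matched / total

text1 = 'cat is sleeping on the mat'
text2 = 'The cat is sleeping on the red mat in the living room'
print(containment_score(text1, text2))  # 1.0
```

With this definition, 'dog is sleeping' scored against text2 would give 2/3, since only 'is' and 'sleeping' are found. It is only a bag-of-words measure, so it would not notice if the words appear in a different order.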
I'm working with a large dataset of paragraphs of varying size, so finding a smaller paragraph inside a bigger one with such a similarity score is crucial.
I have only found whole-string similarity measures, such as cosine similarity and difflib similarity, which compare two strings against each other. I have not found anything that scores how much of one string is contained in another.