I have a text column in df1 and a text column in df2. The two DataFrames have different lengths. I want to calculate the cosine similarity of every entry in df1[text] against every entry in df2[text] and give a score for every pair.
Sample input:
df1
mahesh
suresh
df2
surendra
mahesh
shrivatsa
suresh
maheshwari
Sample output (df1 text, df2 text, score):
mahesh surendra 30
mahesh mahesh 100
mahesh shrivatsa 20
mahesh suresh 60
mahesh maheshwari 80
suresh surendra 70
suresh mahesh 60
suresh shrivatsa 40
suresh suresh 100
suresh maheshwari 30
I was getting key errors when I tried to match these two columns for similarity using a tf-idf approach, because the columns are of different lengths. Is there any other way to solve this problem? Any help would be greatly appreciated. I have searched a lot and found that in almost all cases people compare the first document against the rest of the documents in the same corpus. Here I need to compare every document of corpus 1 with every document of corpus 2.
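To make the every-against-every comparison concrete, here is a minimal sketch of what I am trying to compute. It uses only the standard library and assumes each name is represented by raw character bigram counts rather than tf-idf weights. The function names (`char_ngrams`, `cosine`, `pairwise_scores`) are made up for illustration, and the two lists stand in for df1[text] and df2[text]. The scores it produces will not match the illustrative numbers in the sample output above.

```python
from collections import Counter
from itertools import product
from math import sqrt


def char_ngrams(text, n=2):
    """Count the overlapping character n-grams of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # A Counter returns 0 for missing keys, so n-grams absent from b add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_scores(left, right, n=2):
    """Score every item of `left` against every item of `right` on a 0-100 scale.

    The two lists may have different lengths; the result has
    len(left) * len(right) rows of (left_text, right_text, score).
    """
    left_vecs = [(t, char_ngrams(t, n)) for t in left]
    right_vecs = [(t, char_ngrams(t, n)) for t in right]
    return [
        (lt, rt, round(100 * cosine(lv, rv)))
        for (lt, lv), (rt, rv) in product(left_vecs, right_vecs)
    ]


# Stand-ins for df1[text] and df2[text] from the sample input.
names1 = ["mahesh", "suresh"]
names2 = ["surendra", "mahesh", "shrivatsa", "suresh", "maheshwari"]

scores = pairwise_scores(names1, names2)
for row in scores:
    print(*row)
```

Because both sides are vectorized over the same n-gram space, the different column lengths do not matter here. If tf-idf weighting is needed, I assume the equivalent with scikit-learn would be to fit a single `TfidfVectorizer(analyzer="char")` on the texts of both columns together, transform each column separately, and pass the two matrices to `cosine_similarity`, which returns a len(df1) by len(df2) matrix.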