I need to compute the cosine similarity between strings in a list. For example, I have a list of over 10 million strings, and each string has to be compared against every other string in the list. What is the best algorithm for doing this efficiently? Is a divide-and-conquer approach applicable?
EDIT
I want to determine which strings are most similar to a given string, and to have a score associated with each similarity. I think what I want to do falls in line with clustering where the number of clusters is not known in advance.
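To make the goal concrete, here is a minimal brute-force sketch of the scoring I have in mind. It assumes each string is represented as a vector of character bigram counts; that representation is only an assumption for illustration (word counts or TF-IDF weights would plug in the same way), and the function names `bigram_counts`, `cosine`, and `most_similar` are my own.

```python
import math
from collections import Counter


def bigram_counts(s, n=2):
    """Represent a string as a bag of character n-grams (assumed representation)."""
    s = s.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    # Only n-grams present in both vectors contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a string too short to yield any n-gram
    return dot / (norm_a * norm_b)


def most_similar(query, strings, top_k=3):
    """Return the top_k (score, string) pairs, highest score first."""
    q = bigram_counts(query)
    scored = [(cosine(q, bigram_counts(s)), s) for s in strings]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    candidates = ["apple pie", "banana", "applet"]
    for score, s in most_similar("apple", candidates):
        print(f"{score:.3f}  {s}")
```

A single query like this scans the whole list once, so it is linear in the number of strings. Repeating it for every string gives the all-pairs comparison, which is quadratic, and that is the part I do not see how to do for 10 million strings.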