I have a large data set that I would like to cluster. My trial run set size is 2,500 objects; when I run it on the 'real deal' I will need to handle at least 20k objects.
These objects have a cosine similarity between them. This cosine similarity does not satisfy the requirements of being a mathematical distance metric; it doesn't satisfy the triangle inequality.
I would like to cluster them in some "natural" way that puts similar objects together without needing to specify beforehand the number of clusters I expect.
Does anyone know of an algorithm that would do that? Really, I'm just looking for any algorithm that doesn't require a) a distance metric and b) a pre-specified number of clusters.
Many thanks!
This question has been asked before here: Clustering from the cosine similarity values (but this solution only offers K-means clustering), and here: Effective clustering of a similarity matrix (but this solution was rather vague)