For example, I have billions of short phrases, and I want to find clusters of them that are similar.
strings.to.cluster <- c(
  "Best Toyota dealer in bay area. Drive out with a new car today",
  "Largest Selection of Furniture. Stock updated everyday",
  "Unique selection of Handcrafted Jewelry",
  "Free Shipping for orders above $60. Offer Expires soon",
  "XXXX is where smart men buy anniversary gifts",
  "2012 Camrys on Sale. 0% APR for select customers",
  "Closing Sale on office desks. All Items must go"
)
Assume that this vector has hundreds of thousands of elements. Is there a package in R to cluster these phrases by meaning? Or could someone suggest a way to rank phrases by how "similar" in meaning they are to a given phrase?