I have a semi-structured dataset in which each row pertains to a single user:
id, skills
0,"java, python, sql"
1,"java, python, spark, html"
2,"business management, communication"
I call it semi-structured because the skills can only be selected from a fixed list of 580 unique values.
My goal is to cluster users, or to find similar users, based on their skillsets. I have tried a Word2Vec model, which gives me very good results for identifying similar skills. For example:
model.most_similar(["Data Science"])
gives me:
[('Data Mining', 0.9249375462532043),
('Data Visualization', 0.9111810922622681),
('Big Data', 0.8253220319747925),...
So the model works very well for comparing individual skills, but not groups of skills. How do I make use of the vectors from the Word2Vec model to cluster users with similar skillsets?
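To make the question concrete, here is a minimal sketch of one thing that might work: average the skill vectors of each user into a single user vector, then compare users by cosine similarity. The toy skill_vectors dict is a hypothetical stand-in for the vectors that would come out of the Word2Vec model, and the helper names (parse_skills, user_vector, most_similar_user) are made up for illustration. I am not sure averaging is the right way to combine the vectors, so treat this as an assumption and not a solution.

```python
import math

# Hypothetical toy vectors standing in for the real Word2Vec skill vectors;
# the actual vectors would have many more dimensions.
skill_vectors = {
    "java": [0.9, 0.1, 0.0],
    "python": [0.8, 0.2, 0.1],
    "sql": [0.7, 0.3, 0.0],
    "spark": [0.8, 0.1, 0.2],
    "html": [0.6, 0.0, 0.3],
    "business management": [0.0, 0.9, 0.1],
    "communication": [0.1, 0.8, 0.2],
}

# The three example rows from the dataset above.
users = {
    0: "java, python, sql",
    1: "java, python, spark, html",
    2: "business management, communication",
}


def parse_skills(raw):
    """Split a comma-separated skills string into a clean list."""
    return [s.strip() for s in raw.split(",") if s.strip()]


def user_vector(skills):
    """Average the vectors of all known skills into one user vector."""
    vecs = [skill_vectors[s] for s in skills if s in skill_vectors]
    if not vecs:
        raise ValueError("no known skills for this user")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


user_vecs = {uid: user_vector(parse_skills(raw)) for uid, raw in users.items()}


def most_similar_user(uid):
    """Return (other_id, similarity) for the closest other user."""
    scores = [
        (other, cosine(user_vecs[uid], vec))
        for other, vec in user_vecs.items()
        if other != uid
    ]
    return max(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    for uid in users:
        print(uid, most_similar_user(uid))
```

With the toy vectors, the two programming-heavy users (0 and 1) end up closest to each other. The same per-user vectors could presumably be passed to a clustering algorithm such as k-means instead of doing pairwise lookups, but I have not verified how well that works on real data.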