How to evaluate Word2Vec model
I have my own corpus and have trained several Word2Vec models on it. What is the best way to evaluate them against each other and choose the best one? (Not manually, obviously; I am looking for quantitative measures.)

It's worth noting that the embeddings are for items, not words, so I can't use any existing benchmarks.

Thanks!

Afterthought answered 4/10, 2018 at 11:22 Comment(0)
There's no generic way to assess token-vector quality if you're not using real words, against which existing tasks (like the popular analogy-solving) can be tried.

If you have a custom ultimate task, you have to devise your own repeatable scoring method. That will likely either be some subset of your actual final task, or well-correlated with that ultimate task. Essentially, whatever ad-hoc method you may be using to 'eyeball' the results for sanity should be systematized, saving your judgements from each evaluation so that they can be run repeatedly against iterative model improvements.
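For instance, saved pair judgements could be turned into a repeatable score like this (a minimal sketch; the probe pairs, SKU names, toy vectors, and cosine-similarity scoring are all illustrative assumptions, not anything prescribed by the answer):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

def score_model(vectors, probe_pairs):
    """Average similarity over saved human judgements of item pairs
    that 'should be close'. Higher is better."""
    return sum(cosine(vectors[a], vectors[b]) for a, b in probe_pairs) / len(probe_pairs)

# Hypothetical saved judgements: item IDs judged to be near-neighbours.
probes = [("sku_123", "sku_456"), ("sku_789", "sku_012")]

# Two toy models, each a dict of item ID -> vector (in practice these
# would come from the trained Word2Vec models).
models = {
    "model_a": {"sku_123": [1.0, 0.0], "sku_456": [0.9, 0.1],
                "sku_789": [0.0, 1.0], "sku_012": [0.1, 0.9]},
    "model_b": {"sku_123": [1.0, 0.0], "sku_456": [0.0, 1.0],
                "sku_789": [0.0, 1.0], "sku_012": [1.0, 0.0]},
}

scores = {name: score_model(vecs, probes) for name, vecs in models.items()}
best = max(scores, key=scores.get)  # "model_a" here: its probe pairs sit closer
```

Because the judgements live in data rather than in someone's head, re-running the same script against each retrained model gives a comparable number every time.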

(I'd need more info about your data/items and ultimate goals to make further suggestions.)

Chattanooga answered 6/10, 2018 at 5:19 Comment(2)
Thanks a lot! Can't perplexity/entropy/etc. be used generically? (The data is product IDs in a catalog. I want to treat sessions as sentences and products as words, to represent the products as vectors using word2vec.)Afterthought
I suppose it's possible to check the model's predictiveness on the training texts, or some other held-out test texts, but I haven't seen those measures used to choose between word2vec models, and I'm not sure they'd correlate well with performance on your ultimate task. It's the act of trying-to-get-good at word-predictiveness that can make word-vectors usefully-arranged for other purposes – but it need not be the case that the model best at its training-goal is also best for downstream goals. So optimizing for a task-specific evaluation is best.Chattanooga
One way to evaluate a word2vec model is to develop a "ground truth" set of words: pairs of words that should ideally be closest together in vector space. For example, if your corpus is related to customer service, the vectors for "dissatisfied" and "disappointed" should ideally have the smallest Euclidean distance, or the largest cosine similarity.

You create this ground-truth table; maybe it has 200 word pairs, the most important pairs for your industry/topic. To assess which word2vec model is best, calculate the distance for each of the 200 pairs, sum up the total distance, and the model with the smallest total distance is your best model.
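That pair-based scoring might be sketched as follows (the two-pair ground truth and toy vectors stand in for the ~200 real pairs; `math.dist` computes Euclidean distance):

```python
from math import dist  # Euclidean distance between coordinate sequences (Python 3.8+)

# Hypothetical ground-truth pairs: words that should sit close together.
ground_truth = [("dissatisfied", "disappointed"), ("refund", "return")]

def total_distance(vectors, pairs):
    """Sum of Euclidean distances over all ground-truth pairs.
    The model with the smallest total wins."""
    return sum(dist(vectors[a], vectors[b]) for a, b in pairs)

# Two toy models mapping word -> vector (stand-ins for real embeddings).
models = {
    "model_1": {"dissatisfied": (1.0, 0.0), "disappointed": (0.9, 0.1),
                "refund": (0.0, 1.0), "return": (0.1, 0.9)},
    "model_2": {"dissatisfied": (1.0, 0.0), "disappointed": (-1.0, 0.0),
                "refund": (0.0, 1.0), "return": (0.0, -1.0)},
}

best = min(models, key=lambda name: total_distance(models[name], ground_truth))
# best == "model_1": its ground-truth pairs are much closer together
```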

I like this way better than the "eye-ball" method, whatever that means.

Mackey answered 30/4, 2019 at 2:4 Comment(1)
Well, the model is trained as an unsupervised model, but now I would have to label the data; I think that would be a lot of pain.Burt
One way of evaluating the Word2Vec model is to apply the K-Means algorithm to the features generated by Word2Vec. Alongside that, create your own manual labels/ground truth for the instances/records. You can then calculate the accuracy of the model by comparing the cluster assignments with the ground-truth labels.

E.g.:
Cluster 0 (Positive): {"This is a good restaurant", "Good food here", "Not so good dinner"}
Cluster 1 (Negative): {"This is a fantastic hotel", "food was stale"}

Now, compare the tags/labels generated by the clusters with the ground truth values of the instances/sentences in the clusters and calculate the accuracy.
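A sketch of that final accuracy step, assuming the cluster IDs have already come out of K-Means (here they are hard-coded to mirror the example above, with one misplaced sentence per cluster); since K-Means assigns arbitrary cluster IDs, every cluster-to-label mapping is tried and the best one kept:

```python
from itertools import permutations

# Hypothetical K-Means output (one cluster ID per sentence) and the
# hand-made ground-truth labels for the same five sentences.
clusters = [0, 0, 0, 1, 1]
truth = ["pos", "pos", "neg", "pos", "neg"]  # "Not so good dinner" and
                                             # "fantastic hotel" were misclustered

def cluster_accuracy(clusters, truth):
    """Best accuracy over all mappings of cluster IDs to labels."""
    ids = sorted(set(clusters))
    best = 0.0
    for perm in permutations(sorted(set(truth))):
        mapping = dict(zip(ids, perm))
        acc = sum(mapping[c] == t for c, t in zip(clusters, truth)) / len(truth)
        best = max(best, acc)
    return best

accuracy = cluster_accuracy(clusters, truth)  # 3 of 5 sentences match: 0.6
```

Trying all permutations is fine for a handful of clusters; for many clusters a Hungarian-style assignment would scale better, but that is beyond this sketch.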

Gera answered 15/11, 2019 at 0:56 Comment(0)