Gensim LDA Coherence Score NaN

I created a Gensim LDA Model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/

lda_model = gensim.models.LdaMulticore(data_df['bow_corpus'], num_topics=10, id2word=dictionary, random_state=100, chunksize=100, passes=10, per_word_topics=True)

And it generates 10 topics with a log_perplexity of:

lda_model.log_perplexity(data_df['bow_corpus'])  # returns -5.325966117835991

But when I run the coherence model on it to calculate coherence score, like so:

coherence_model_lda = CoherenceModel(model=lda_model, texts=data_df['bow_corpus'].tolist(), dictionary=dictionary, coherence='c_v')
with np.errstate(invalid='ignore'):
    lda_score = coherence_model_lda.get_coherence()

My LDA-Score is nan. What am I doing wrong here?

Whap answered 16/2, 2020 at 8:3 Comment(0)

Solved! CoherenceModel requires the original (tokenized) texts, not the bag-of-words training corpus that was fed to the LDA model. When I ran this:

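import numpy as np
from gensim.models import CoherenceModel

# texts= takes the original texts column ('corpus'), not the bag-of-words column ('bow_corpus')
coherence_model_lda = CoherenceModel(model=lda_model, texts=data_df['corpus'].tolist(), dictionary=dictionary, coherence='c_v')
with np.errstate(invalid='ignore'):
    lda_score = coherence_model_lda.get_coherence()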

I got a coherence score of: 0.462

Hope this helps someone else making the same mistake. Thanks!
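
For anyone unsure how the two columns differ, here is a minimal pure-Python sketch that does not call gensim. The names token2id and to_bow are hypothetical stand-ins for the role played by the gensim Dictionary and its doc2bow method, and the sample documents are made up. Tokenized texts are lists of strings, while the bag-of-words corpus is a list of (token_id, count) pairs. The latter is what the LDA model trains on, but it is not what the 'c_v' coherence measure expects in texts.

```python
from collections import Counter

# Tokenized documents: list of list of str (the shape `texts=` expects).
tokenized_docs = [
    ["topic", "model", "topic"],
    ["coherence", "score", "model"],
]

# Hypothetical stand-in for a gensim Dictionary: token -> integer id.
token2id = {}
for doc in tokenized_docs:
    for token in doc:
        token2id.setdefault(token, len(token2id))


def to_bow(doc):
    """Mimic the shape of a bag-of-words vector: sorted (token_id, count) pairs."""
    counts = Counter(token2id[token] for token in doc)
    return sorted(counts.items())


bow_corpus = [to_bow(doc) for doc in tokenized_docs]

print(tokenized_docs[0])  # ['topic', 'model', 'topic']
print(bow_corpus[0])      # [(0, 2), (1, 1)]
```

The bag-of-words form keeps only ids and counts, so the token strings and their order are gone by the time it reaches the coherence model.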

Whap answered 16/2, 2020 at 8:45 Comment(2)
Was facing the same issue! Thanks for sharing! - Metallo
Thank you! I'm testing it out now. How long did you wait for the coherence score to tabulate? I waited just under 2 hours and it's still running. - Pledgee

The documentation (https://radimrehurek.com/gensim/models/coherencemodel.html) says to provide "Tokenized texts" (list of list of str). These should be your texts split into individual words that appear in the dictionary you provide to CoherenceModel. If you provide full texts that are not tokenized, the lookup dictionary has no entries for the words.
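
To illustrate the lookup problem, here is a simplified sketch in plain Python. It is not gensim's actual implementation, and token2id and known_token_fraction are hypothetical names. It measures what fraction of the items in a document are found among the dictionary's tokens. Tokenized texts match. A raw string is iterated character by character, and bag-of-words tuples are not strings at all, so neither finds any entries.

```python
# Hypothetical token -> id mapping, standing in for the dictionary
# that would be passed to CoherenceModel.
token2id = {"topic": 0, "model": 1, "coherence": 2, "score": 3}


def known_token_fraction(doc):
    """Fraction of items in `doc` that are tokens present in the dictionary."""
    items = list(doc)
    if not items:
        return 0.0
    found = sum(1 for item in items if item in token2id)
    return found / len(items)


tokenized = ["topic", "model", "coherence"]  # list of str: the expected shape
raw_text = "topic model coherence"           # untokenized string
bow = [(0, 1), (1, 1), (2, 1)]               # (token_id, count) pairs

print(known_token_fraction(tokenized))  # 1.0
print(known_token_fraction(raw_text))   # 0.0
print(known_token_fraction(bow))        # 0.0
```

Running a check like this on a few documents before calling get_coherence() may catch both mistakes discussed in this thread.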

Bronchiole answered 2/6, 2021 at 15:13 Comment(1)
Upvoted since this is a possible issue, but the OP had made a different mistake with the same solution. - Flowing