Should the "perplexity" (or "score") go up or down in the LDA implementation of Scikit-learn?

I'd like to know what the perplexity and the score mean in the LDA implementation of Scikit-learn. Those functions are obscure.

At the very least, I need to know whether those values increase or decrease when the model is better. I've searched, but it's still unclear to me. My feeling is that perplexity should go down, but I'd like a clear answer on which way each of these values should move.

Sprint asked 7/8, 2018 at 20:35

Perplexity is a measure of how well a model predicts a sample.

According to Latent Dirichlet Allocation by Blei, Ng, & Jordan,

[W]e computed the perplexity of a held-out test set to evaluate the models. The perplexity, used by convention in language modeling, is monotonically decreasing in the likelihood of the test data, and is algebraically equivalent to the inverse of the geometric mean per-word likelihood. A lower perplexity score indicates better generalization performance.

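Concretely, the paper defines the perplexity of a held-out test set of M documents as

    perplexity(D_test) = exp{ − [ Σ_{d=1}^{M} log p(w_d) ] / [ Σ_{d=1}^{M} N_d ] }

where w_d denotes the words of document d and N_d is the number of words in it. Because the exponent is the negated average per-word log-likelihood, a higher likelihood of the test data directly gives a lower perplexity.
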
This can be seen in the corresponding figure in the paper (image not reproduced here), which plots held-out perplexity against the number of topics and shows it decreasing as topics are added.

In essence, since perplexity is equivalent to the inverse of the geometric mean per-word likelihood, a lower perplexity implies the data are more likely under the model. As such, as the number of topics increases, the perplexity of the model should decrease.

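To connect this to the two scikit-learn methods the question asks about: LatentDirichletAllocation.score(X) returns an approximate log-likelihood of X (a variational lower bound), so it should go up as the model gets better, while LatentDirichletAllocation.perplexity(X) is derived from that same bound (roughly exp of the negated per-word score), so it should go down. Here is a minimal sketch; the toy corpus and the topic counts are made up purely for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Tiny made-up corpus, just to exercise the API.
    train_docs = [
        "apple banana fruit smoothie breakfast",
        "banana apple juice fruit sweet",
        "python code bug debug compiler",
        "code python function compiler error",
    ]
    test_docs = ["fruit banana apple", "python compiler code"]

    vectorizer = CountVectorizer()
    X_train = vectorizer.fit_transform(train_docs)
    X_test = vectorizer.transform(test_docs)

    for n_topics in (2, 4, 8):
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
        lda.fit(X_train)
        # score():      approximate log-likelihood bound, HIGHER is better
        # perplexity(): derived from the same bound,      LOWER is better
        print(n_topics, lda.score(X_test), lda.perplexity(X_test))

On a corpus this small the absolute numbers mean little; the point is only the direction each metric should move: better models push score(X) up and perplexity(X) down.
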
Dunedin answered 7/8, 2018 at 20:58. Comments (4):
How does one interpret a perplexity of 3.35 vs 3.25? I am trying to understand whether that is a lot better or not. For example, with a 10% or even 5% accuracy improvement I'd certainly say that the method "helped advance the state of the art (SOTA)". But how does one interpret that in terms of perplexity? FYI, the paper for context: aitp-conference.org/2022/abstract/AITP_2022_paper_5.pdf – Tame
Something still bothers me about this accepted answer: yes, it explains how to compare different numbers of topics. But what if the number of topics were fixed? What would a change in perplexity mean for the same data with, say, better or worse preprocessing? I assume that for the same topic count and the same underlying data, better encoding and preprocessing of the data (featurisation) and better overall data quality will contribute to a lower perplexity. Am I right? – Sprint
@GuillaumeChevalier Yes, as far as I understand, with better data the model can reach a higher log-likelihood and hence a lower perplexity. – Gustatory
For some reason I'm finding the opposite happening: as my number of topics increases, perplexity goes not down but up, and significantly so. – Singlebreasted