R Supervised Latent Dirichlet Allocation Package

I'm using this LDA package for R. Specifically, I am trying to do supervised latent Dirichlet allocation (sLDA). The linked package has an slda.em function, but what confuses me is that it asks for alpha, eta and variance parameters. As far as I understand, these parameters are unknowns in the model. So my question is: did the author of the package mean that these are initial guesses for the parameters? If so, there doesn't seem to be a way of accessing the fitted values from the result of running slda.em.

Aside from coding the extra EM steps in the algorithm, is there a suggested way to guess reasonable values for these parameters?

Eclogite answered 27/4, 2016 at 23:40 Comment(0)

Since you are trying to fit a supervised model, the typical approach is to use cross validation to choose the model parameters. You hold out some of the data as a test set, train a model on the remaining data, and evaluate its performance on the held-out part, repeating this k times so that each fold serves as the test set once. You then repeat the whole procedure with different parameter values to see which ones give the best performance.

In the specific case of slda, I would run demo(slda) to see the author's implementation of it. When you run the demo, you'll see that he sets alpha=1.0, eta=0.1, and variance=0.25. I'd suggest using these as your starting point, and then use cross validation to determine better parameters if you need to improve model performance.
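To make the cross-validation idea concrete, here is a rough sketch of a k-fold grid search over alpha, eta and variance. The helper names (cv_grid_search, slda_fit, slda_predict) are my own and not part of the lda package. The slda.em and slda.predict calls follow the argument names used in demo(slda) as I remember them, so treat them as an assumption and check ?slda.em and ?slda.predict before relying on them.

```r
# Generic k-fold grid search (hypothetical helper, not part of any package).
# fit_fn(train_idx, p) fits a model; predict_fn(model, test_idx, p) returns
# numeric predictions; p is one row of `grid`.
cv_grid_search <- function(n, grid, fit_fn, predict_fn, y, k = 5, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = n))  # random fold labels
  grid$rmse <- NA_real_
  for (g in seq_len(nrow(grid))) {
    p <- grid[g, , drop = FALSE]
    sq_err <- numeric(0)
    for (f in seq_len(k)) {
      test_idx  <- which(folds == f)
      train_idx <- which(folds != f)
      model <- fit_fn(train_idx, p)
      pred  <- predict_fn(model, test_idx, p)
      sq_err <- c(sq_err, (y[test_idx] - pred)^2)
    }
    grid$rmse[g] <- sqrt(mean(sq_err))  # pooled held-out RMSE
  }
  grid[order(grid$rmse), ]  # best setting first
}

# Wrappers around the lda package (argument names assumed from demo(slda)).
slda_fit <- function(documents, vocab, y, K = 10) {
  function(train_idx, p) {
    lda::slda.em(documents = documents[train_idx], K = K, vocab = vocab,
                 num.e.iterations = 10, num.m.iterations = 4,
                 alpha = p$alpha, eta = p$eta,
                 annotations = y[train_idx],
                 params = sample(c(-1, 1), K, replace = TRUE),
                 variance = p$variance,
                 logistic = FALSE, method = "sLDA")
  }
}

slda_predict <- function(documents) {
  function(model, test_idx, p) {
    as.numeric(lda::slda.predict(documents[test_idx], model$topics,
                                 model$model, alpha = p$alpha, eta = p$eta))
  }
}

# Example call on the demo data (not run here, since it is slow):
# library(lda)
# data(poliblog.documents); data(poliblog.vocab); data(poliblog.ratings)
# y    <- poliblog.ratings / 100
# grid <- expand.grid(alpha = c(0.1, 1.0), eta = c(0.01, 0.1),
#                     variance = c(0.25, 1.0))
# cv_grid_search(length(y), grid,
#                slda_fit(poliblog.documents, poliblog.vocab, y),
#                slda_predict(poliblog.documents), y)
```

The grid values in the commented call are only placeholders around the demo defaults. Each grid row costs k full slda.em fits, so it is probably worth starting with a coarse grid and a small k.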

Ouster answered 3/5, 2016 at 19:26 Comment(3)
You're saying that the LDA package quoted above does not have an option to search for the alpha, eta and variance parameters (unlike the EM algorithm in the linked paper)? Doing cross validation as you suggest would be immensely slow, unless you have a suggestion for how to recycle outputs of each step. I'm guessing maybe the "initial" parameters in the model might help this? - Eclogite
I am not terribly familiar with this package, but I did not see any mention of it finding the parameters for you. It's not the default option - you can see that by looking at the results of the demo by changing the initial parameters - you end up with different results. I am not sure what you mean by "recycle outputs of each step", but it is true that CV can be time-consuming if there are a lot of parameters to search over. - Ouster
@AlexR. Can you provide a data sample, and a little bit more detail on your end goal? That would make it easier to provide an example code solution. There are at least two packages in R that can be used for performing LDA. One is the topicmodels package developed by Bettina Grün and Kurt Hornik, and the second is lda, developed by Jonathan Chang, which you mentioned using. - Azurite
