How to get the document-topic distribution for all documents in gensim LDA?

I'm new to Python and I need to build an LDA project. After some preprocessing steps, here is my code:

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# docs is the list of tokenized documents produced by the preprocessing step
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

num_topics = 10
chunksize = 2000
passes = 20
iterations = 400
eval_every = None  # do not estimate perplexity during training

temp = dictionary[0]  # accessing an item makes the dictionary build id2token
id2word = dictionary.id2token

model = LdaModel(
    corpus=corpus,
    id2word=id2word,
    chunksize=chunksize,
    alpha='auto',
    eta='auto',
    random_state=42,
    iterations=iterations,
    num_topics=num_topics,
    passes=passes,
    eval_every=eval_every,
)

I want the topic distribution for every document in docs, that is, 10 topic probabilities per document. But when I use:

get_document_topics = model.get_document_topics(corpus)
print(get_document_topics)

the only output is:

<gensim.interfaces.TransformedCorpus object at 0x000001DF28708E10>

How do I get the topic distribution for each document in docs?

Glenda answered 15/11, 2018 at 6:23 Comment(0)

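The function get_document_topics accepts either a single document in BOW format or a whole corpus. For a single document it returns a list of (topic_id, probability) pairs. You're calling it on the full corpus (a list of documents), so it returns a lazy TransformedCorpus object that produces the scores for each document only when you iterate over it or index into it. Printing that object just shows its repr. Also note that topics whose probability falls below minimum_probability (0.01 by default) are left out of the result, so pass minimum_probability=0 if you want the small ones too.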

You have a few options. If you just want one document, run it on the document you want the values for:

get_document_topics = model.get_document_topics(corpus[0])

Or use a list comprehension to get a list of scores for all the documents:

get_document_topics = [model.get_document_topics(item) for item in corpus]

Or index into the object returned by your original code:

get_document_topics = model.get_document_topics(corpus)
print(get_document_topics[0])
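
If you need exactly 10 numbers per document, it can help to expand the sparse (topic_id, probability) pairs into fixed-length rows. Below is a minimal sketch of that step. The helper name to_dense and the sample data are made up for illustration and are not part of gensim; the commented-out last line shows how it might be called with the model from the question.

```python
def to_dense(doc_topics, num_topics):
    """Expand sparse [(topic_id, prob), ...] lists into rows of length num_topics."""
    rows = []
    for pairs in doc_topics:
        row = [0.0] * num_topics  # topics that gensim left out stay at 0.0
        for topic_id, prob in pairs:
            row[topic_id] = float(prob)
        rows.append(row)
    return rows


# Made-up sample in the same shape that get_document_topics returns
# (illustration only, not real model output).
sample = [
    [(0, 0.7), (2, 0.3)],
    [(1, 1.0)],
]
dense = to_dense(sample, num_topics=3)

# With the model from the question, the call would look roughly like this:
# dense = to_dense(model.get_document_topics(corpus, minimum_probability=0), num_topics)
```

Each row of dense then lines up with one document in corpus, and column k holds the probability of topic k.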
Zaria answered 15/11, 2018 at 8:41 Comment(2)
Thanks! Is it possible to get a topic distribution for the whole set of docs rather than for a single document? I want to check the importance of the 10 topics in the corpus. - Glenda
I'm not sure exactly what you're looking for. LDA works by figuring out how important a topic is for a document, relative to the whole corpus. If you want to see what it thinks of as a topic, use model.show_topics(). According to the gensim documentation at radimrehurek.com/gensim/models/… : "Unlike LSA, there is no natural ordering between the topics in LDA. The returned topics subset of all topics is therefore arbitrary and may change between two LDA training runs." - Zaria
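
One possible reading of "importance of the 10 topics in the corpus" is the average of the per-document topic distributions, optionally weighted by document length. The sketch below assumes you already have one full-length list of probabilities per document (for example, the output of get_document_topics with minimum_probability=0, padded with zeros for any missing topics). The function name corpus_topic_share and the sample numbers are hypothetical, and this is only one way to summarize the corpus, not something gensim defines.

```python
def corpus_topic_share(dense_rows, doc_lengths=None):
    """Average per-document topic distributions into one corpus-level distribution.

    dense_rows: one list of num_topics probabilities per document.
    doc_lengths: optional token counts; if given, longer documents weigh more.
    """
    if not dense_rows:
        raise ValueError("need at least one document")
    if doc_lengths is None:
        doc_lengths = [1.0] * len(dense_rows)  # every document counts equally
    total = float(sum(doc_lengths))
    share = [0.0] * len(dense_rows[0])
    for row, weight in zip(dense_rows, doc_lengths):
        for k, prob in enumerate(row):
            share[k] += prob * weight / total
    return share


# Made-up distributions for two documents and three topics (illustration only).
rows = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
unweighted = corpus_topic_share(rows)
weighted = corpus_topic_share(rows, doc_lengths=[3, 1])
```

Sorting the resulting list in descending order gives a rough ranking of topics by their share of the corpus. As the quoted documentation says, topic ids themselves carry no ordering and may differ between training runs.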
