How to print tf-idf scores matrix in sklearn in python

3

6

I am using sklearn to obtain tf-idf values as follows.

from sklearn.feature_extraction.text import TfidfVectorizer
myvocabulary = ['life', 'learning']
corpus = {1: "The game of life is a game of everlasting learning", 2: "The unexamined life is not worth living", 3: "Never stop learning"}
tfidf = TfidfVectorizer(vocabulary = myvocabulary, ngram_range = (1,3))
tfs = tfidf.fit_transform(corpus.values())

Now I want to view my calculated tf-idf scores as a matrix like the following. (Image: tf-idf matrix)

I tried to do it as follows.

idf = tfidf.idf_
dic = dict(zip(tfidf.get_feature_names(), idf))
print(dic)

However, I get the following output, which contains only the idf values:

{'life': 1.2876820724517808, 'learning': 1.2876820724517808}

Please help me.

Aureaaureate answered 6/10, 2017 at 2:40 Comment(1)
The output of tfidf.fit_transform() is already in this form. The only thing missing is the column names, which you get from tfidf.get_feature_names(). Just wrap the two in a DataFrame. - Berkeleianism
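To illustrate the comment above, here is a minimal sketch of why the attempt in the question printed only two numbers. It reuses the question's vocabulary and corpus; the variable name `dense` is an assumption added for illustration. `idf_` stores one weight per vocabulary term, while the matrix returned by `fit_transform()` stores one score per document and term.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

myvocabulary = ['life', 'learning']
corpus = {
    1: "The game of life is a game of everlasting learning",
    2: "The unexamined life is not worth living",
    3: "Never stop learning",
}

tfidf = TfidfVectorizer(vocabulary=myvocabulary, ngram_range=(1, 3))
tfs = tfidf.fit_transform(list(corpus.values()))

# idf_ holds a single weight per vocabulary term, shared by all documents.
print(tfidf.idf_.shape)  # (2,)

# tfs holds one tf-idf score per (document, term) pair.
print(tfs.shape)  # (3, 2)

# Convert the sparse matrix to a dense NumPy array to inspect the scores.
dense = tfs.toarray()
print(dense)
```

So zipping the feature names with `idf_` can only ever give one value per term; the per-document scores live in `tfs`.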
7

Thanks to σηγ, I found an answer in this question:

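import pandas as pd

# On scikit-learn 1.0+ use tfidf.get_feature_names_out(); get_feature_names() was removed in 1.2.
feature_names = tfidf.get_feature_names()
corpus_index = list(corpus)  # document keys: 1, 2, 3
# Transpose so that rows are features and columns are documents.
df = pd.DataFrame(tfs.T.todense(), index=feature_names, columns=corpus_index)
print(df)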
Aureaaureate answered 6/10, 2017 at 8:57 Comment(0)
3

The answer provided by the asker is right, but I would like to make one adjustment. The code above gives a matrix with features as rows and documents as columns:

          Doc1    Doc2
feature1
feature2

The matrix should look like this, with documents as rows and features as columns:

          feature1    feature2
Doc1
Doc2

You can get that with a simple change:

df = pd.DataFrame(tfs.todense(), index=corpus_index, columns=feature_names)
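For reference, a self-contained sketch of the documents-as-rows layout, using the question's data. The `hasattr` check is an addition for illustration, not part of the original answers: it picks `get_feature_names_out()` where it exists, because `get_feature_names()` was removed in scikit-learn 1.2.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

myvocabulary = ['life', 'learning']
corpus = {
    1: "The game of life is a game of everlasting learning",
    2: "The unexamined life is not worth living",
    3: "Never stop learning",
}

tfidf = TfidfVectorizer(vocabulary=myvocabulary, ngram_range=(1, 3))
tfs = tfidf.fit_transform(list(corpus.values()))

# Prefer the newer accessor; fall back to the old one on older releases.
if hasattr(tfidf, "get_feature_names_out"):
    feature_names = list(tfidf.get_feature_names_out())
else:
    feature_names = list(tfidf.get_feature_names())

# Rows follow the insertion order of the corpus dict, so its keys label them.
df = pd.DataFrame(tfs.toarray(), index=list(corpus), columns=feature_names)
print(df.round(4))
```

With the default settings (smoothed idf, L2 normalization), document 2 contains only "life" and document 3 only "learning", so each scores 1.0 for its own term and 0.0 for the other. Document 1 contains both terms once, with equal idf, so both of its scores are about 0.7071.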
Ibbison answered 18/10, 2017 at 8:4 Comment(0)
1

I found another possible approach using the toarray() method:

import pandas as pd

print(tfidf.get_feature_names())
print(tfs.toarray())
print(pd.DataFrame(tfs.toarray(),
                   columns=tfidf.get_feature_names(),
                   index=['doc1', 'doc2', 'doc3']))
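One caveat with `toarray()` is that it materializes every zero, which may become expensive for a large corpus or vocabulary. As a hedged alternative, the sketch below keeps the data sparse with pandas' `DataFrame.sparse.from_spmatrix`. The names `docs`, `index` and `sparse_df` are assumptions for illustration, and the column labels rely on the vocabulary having been passed as a list, so that columns follow the list order.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

myvocabulary = ['life', 'learning']
docs = [
    "The game of life is a game of everlasting learning",
    "The unexamined life is not worth living",
    "Never stop learning",
]

tfidf = TfidfVectorizer(vocabulary=myvocabulary, ngram_range=(1, 3))
tfs = tfidf.fit_transform(docs)

# Build row labels from the number of documents instead of typing them out.
index = ['doc%d' % (i + 1) for i in range(tfs.shape[0])]

# The vocabulary was given as a list, so the columns follow that order.
sparse_df = pd.DataFrame.sparse.from_spmatrix(
    tfs, index=index, columns=myvocabulary
)
print(sparse_df)
```

For a three-document corpus the dense version is perfectly fine; the sparse form only pays off when most entries are zero and the matrix is large.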
Nealneala answered 21/8, 2018 at 12:3 Comment(0)
