How to print result of clustering in sklearn

Asked 22/4, 2015 at 13:26 Answered 23/10, 2019 at 9:49

Solved python scikit-learn cluster-analysis k-means

I have a sparse matrix

from scipy.sparse import csr_matrix
M = csr_matrix((data_np, (rows_np, columns_np)))

Then I cluster it like this:

from sklearn.cluster import KMeans
km = KMeans(n_clusters=n, init='random', max_iter=100, n_init=1, verbose=1)
km.fit(M)

My question is very basic: how do I print the clustering result without any extra information? I don't care about plotting or distances. I just need the clustered rows listed like this:

Cluster 1
row 1
row 2
row 3

Cluster 2
row 4
row 20
row 1000
...

How can I get it? Excuse me for this question.

Pagas answered 22/4, 2015 at 13:26 Comment(0)

Time to help myself. After

km.fit(M)

we run

labels = km.predict(M)

which returns labels, a numpy.ndarray. The number of elements in this array equals the number of rows, and each element is the cluster that the corresponding row belongs to. For example, if the first element is 5, row 1 belongs to cluster 5. Let's put our rows in a dictionary of lists that looks like this: {cluster_number: [row1, row2, row3], ...}

# row_dict stores the actual meaning of each row (in my case, Russian words)
clusters = {}
n = 0
for item in labels:
    if item in clusters:
        clusters[item].append(row_dict[n])
    else:
        clusters[item] = [row_dict[n]]
    n += 1

and print the result

for item in clusters:
    print("Cluster", item)
    for i in clusters[item]:
        print(i)
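The grouping and printing loops above can also be written more compactly. The following is only a minimal sketch, not the original author's code: the helper names `group_rows_by_cluster` and `format_clusters` are made up for illustration, and the hard-coded `example_labels` and `example_rows` stand in for the output of `km.predict(M)` and for `row_dict`.

```python
from collections import defaultdict


def group_rows_by_cluster(labels, row_names):
    """Return {cluster_id: [row_name, ...]}, keeping the original row order."""
    clusters = defaultdict(list)
    for label, name in zip(labels, row_names):
        # int() turns numpy integer labels into plain Python ints
        clusters[int(label)].append(name)
    return dict(clusters)


def format_clusters(clusters):
    """Build the 'Cluster N' / row listing as a single string."""
    lines = []
    for cluster_id in sorted(clusters):
        lines.append(f"Cluster {cluster_id}")
        lines.extend(str(name) for name in clusters[cluster_id])
        lines.append("")  # blank line between clusters
    return "\n".join(lines).rstrip()


# Hypothetical stand-ins: in the answer above these would be
# labels = km.predict(M) and the row meanings stored in row_dict.
example_labels = [1, 0, 1, 2, 0]
example_rows = ["row 1", "row 2", "row 3", "row 4", "row 5"]

example_clusters = group_rows_by_cluster(example_labels, example_rows)
print(format_clusters(example_clusters))
```

Using `defaultdict(list)` removes the need for the `if item in clusters` check, and `zip` replaces the manual counter `n`.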

Pagas answered 22/4, 2015 at 14:41 Comment(1)

thanks so much, open my mind too :) very helpful answer – Confederacy 2/1, 2019 at 9:25

Update: You can do it the following way

# data: the data to cluster, retrieved by whatever function you like
# model: the KMeans estimator
# cluster: the cluster label assigned to each row of data
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale  # assuming this is the scale() that was meant

data = clusteredData()
model = KMeans(n_clusters=5, init='random', max_iter=100, n_init=1, verbose=1)
cluster = model.fit_predict(scale(data))

dictionary = {}
for index in range(len(data)):
    if cluster[index] in dictionary:
        dictionary[cluster[index]].append(data[index])
    else:
        # start a list so that later rows can be appended to it
        dictionary[cluster[index]] = [data[index]]

This creates a dictionary with the cluster number as the key and a list of the data points in that cluster as the value.
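If only the row numbers per cluster are needed, rather than the data itself, a NumPy-based variant may be enough. This is a sketch under assumptions: `row_indices_per_cluster` is a hypothetical helper, and `example_labels` stands in for the array returned by `model.fit_predict(...)` or `km.predict(M)`.

```python
import numpy as np


def row_indices_per_cluster(labels):
    """Return {cluster_id: [row_index, ...]} with 0-based row indices."""
    labels = np.asarray(labels)
    return {
        int(cluster_id): np.flatnonzero(labels == cluster_id).tolist()
        for cluster_id in np.unique(labels)
    }


# Hypothetical labels; in practice this would be the fitted model's output.
example_labels = np.array([2, 0, 2, 1, 0])
example_index = row_indices_per_cluster(example_labels)

for cluster_id, rows in example_index.items():
    print("Cluster", cluster_id)
    for row in rows:
        print("row", row)
```

The index lists can then be used to look up whatever each row represents, which avoids copying the rows themselves into the dictionary.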

Velure answered 23/10, 2019 at 9:49 Comment(0)
