I have a NumPy array saved as a text file at: https://github.com/alvations/anythingyouwant/blob/master/WN_food.matrix
It's a distance matrix between the terms, and my list of terms is here: http://pastebin.com/2xGt7Xjh
I used the following code to generate a hierarchical clustering:
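```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Load the pairwise distance matrix stored as plain text.
matrix = np.loadtxt('WN_food.matrix')

n_clusters = 518
# `metric` was called `affinity` before scikit-learn 1.2 (removed in 1.4).
# With metric="cosine" each row is treated as a feature vector;
# metric="precomputed" would use the matrix entries as distances directly.
model = AgglomerativeClustering(n_clusters=n_clusters,
                                linkage="average", metric="cosine")
model.fit(matrix)
```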
To get the clusters for each term, I could have done:
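```python
# enumerate() yields the row index of each term, not the term string itself.
for term_idx, cluster_id in enumerate(model.labels_):
    print(term_idx, cluster_id)
```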
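Printing one label per row makes it hard to see which terms ended up together. A minimal sketch for grouping them, assuming the terms from the pastebin link are available as a Python list in the same order as the matrix rows (`group_by_cluster` and the file name `WN_food.terms` are hypothetical, not from the original post):

```python
from collections import defaultdict


def group_by_cluster(labels, terms):
    """Map each cluster id to the list of terms assigned to it.

    `labels` is expected to be model.labels_, and `terms` a list of term
    strings in the same order as the rows of the distance matrix.
    """
    if len(labels) != len(terms):
        raise ValueError("labels and terms must have the same length")
    clusters = defaultdict(list)
    for term, cluster_id in zip(terms, labels):
        # int() turns NumPy integer labels into plain dict keys.
        clusters[int(cluster_id)].append(term)
    return dict(clusters)


# Toy labels standing in for model.labels_. With the real data, something like
#   terms = [line.strip() for line in open('WN_food.terms')]  # hypothetical file
#   clusters = group_by_cluster(model.labels_, terms)
example = group_by_cluster([0, 1, 0], ["apple", "bread", "pear"])
print(example)  # {0: ['apple', 'pear'], 1: ['bread']}
```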
But how do I traverse the tree that AgglomerativeClustering builds?
Is it possible to convert it into a SciPy dendrogram (http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.dendrogram.html)? And after that, how do I traverse the dendrogram?
See the `children_` attribute of `model`. – Hoberthobey

Clustering `n` objects produces a tree with `2n - 1` nodes. As the documentation says: "Values less than `n_samples` refer to leaves of the tree. A greater value `i` indicates a node with children `children_[i - n_samples]`". That should be sufficient information to traverse the tree. – Hoberthobey

Each node has an integer ID `i`. If the ID is less than the number of input objects `n_samples`, then the node is a leaf. Otherwise it's an internal node, and it joins two other nodes. The two nodes joined by node `i` are found in `children_[i - n_samples]`. As an aside, if your goal is to convert this to a SciPy dendrogram, why not just use `scipy.cluster.hierarchy.linkage` rather than `sklearn`? – Hoberthobey
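The node-ID scheme described in these comments can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the thread: `leaves_under` and `children_to_linkage` are hypothetical helper names, and the toy `children`/`distances` lists stand in for `model.children_` and `model.distances_`. The conversion produces SciPy's linkage-matrix layout (two merged cluster IDs, merge distance, number of original observations in the new cluster).

```python
import numpy as np


def leaves_under(node_id, children, n_samples):
    """Return the sample indices (leaves) below node `node_id`.

    IDs < n_samples are leaves; an ID i >= n_samples is an internal node
    whose two children are children[i - n_samples]. An explicit stack is
    used instead of recursion so deep trees don't hit the recursion limit.
    """
    leaves, stack = [], [node_id]
    while stack:
        node = stack.pop()
        if node < n_samples:
            leaves.append(int(node))
        else:
            left, right = children[node - n_samples]
            # Push right first so the left subtree is visited first.
            stack.append(right)
            stack.append(left)
    return leaves


def children_to_linkage(children, distances, n_samples):
    """Build a SciPy-style linkage matrix from sklearn's children_.

    Each row is [child_a, child_b, merge_distance, n_leaves_in_new_cluster].
    Assumes the full tree was built (len(children) == n_samples - 1) and
    that `distances` holds one merge distance per row of `children`.
    """
    children = np.asarray(children)
    counts = np.zeros(len(children))
    for i, (a, b) in enumerate(children):
        count = 0
        for child in (a, b):
            # A leaf contributes 1; an internal node contributes the size
            # recorded when it was formed (always an earlier row).
            count += 1 if child < n_samples else counts[child - n_samples]
        counts[i] = count
    return np.column_stack([children, distances, counts]).astype(float)


# Toy tree over 4 samples: node 4 = (0, 1), node 5 = (2, 3), root 6 = (4, 5).
children = [[0, 1], [2, 3], [4, 5]]
distances = [1.0, 1.5, 3.0]
print(leaves_under(6, children, 4))  # [0, 1, 2, 3]
print(children_to_linkage(children, distances, 4).tolist())
# [[0.0, 1.0, 1.0, 2.0], [2.0, 3.0, 1.5, 2.0], [4.0, 5.0, 3.0, 4.0]]
```

To apply this to the real model, it would presumably need to be fit with `compute_full_tree=True` (with `n_clusters=518`, scikit-learn may otherwise stop merging early, leaving fewer than `n_samples - 1` rows in `children_`) and `compute_distances=True` (scikit-learn 0.24 or later) so that `model.distances_` exists. The root then has ID `2 * n_samples - 2`, and the resulting matrix can be passed to `scipy.cluster.hierarchy.dendrogram` for plotting, or to `scipy.cluster.hierarchy.to_tree`, whose `ClusterNode` objects offer `get_left()`, `get_right()` and `pre_order()` for traversal.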