Silhouette plot in R
I have a set of data containing: item, associated cluster, silhouette coefficient. I can further augment this data set with more information if necessary.

I would like to generate a silhouette plot in R. I am having trouble because the examples I have come across use the built-in kmeans (or a related) clustering function and plot its result. I want to bypass that step and produce the plot for my own clustering algorithm, but I can't work out the correct arguments to pass to the plot function.

Thank you.

EDIT

Data set example https://pastebin.mozilla.org/8853427

What I've tried is loading the dataset and passing it to the plot function using various arguments based on https://stat.ethz.ch/R-manual/R-devel/library/cluster/html/silhouette.html

Plafker answered 30/11, 2015 at 12:58 Comment(2)
Please provide some of your data and the code you tried. - Gamp
Here's how to create a reproducible example in R. It makes it easier for others to help you. - Greco

The silhouette function in package cluster can do the plot for you. It needs only a vector of cluster memberships as integer codes (produced by whatever algorithm you choose) and a dissimilarity matrix (preferably the same one used to produce the clusters). For example:

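library(cluster)  # silhouette() and pam()
library(vegan)    # vegdist() and the varespec example data
data(varespec)
dis <- vegdist(varespec)  # dissimilarity matrix of class dist
res <- pam(dis, 3)  # or whatever your choice of clustering algorithm is
sil <- silhouette(res$clustering, dis)  # or use your own cluster vector
dev.new()  # open a new graphics device (windows() exists only on Windows); RStudio sometimes does not display silhouette plots correctly
plot(sil)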
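
If the memberships come from your own algorithm rather than from pam, the same call still works once the labels are integer codes. The following is only a sketch under assumptions: the built-in iris data and an hclust/cutree clustering stand in for your data and algorithm, and my_labels and codes are illustrative names.

```r
library(cluster)  # provides silhouette() and its plot method

# Stand-in data: four numeric measurements per flower (built-in dataset)
x <- scale(iris[, 1:4])
d <- dist(x)  # Euclidean dissimilarities between all pairs of rows

# Stand-in for "my own algorithm": average-linkage clustering cut into 3 groups,
# with the labels deliberately stored as character strings
my_labels <- paste0("group_", cutree(hclust(d, method = "average"), k = 3))

# silhouette() expects integer cluster codes, so map the labels to 1..k
codes <- as.integer(factor(my_labels))

sil <- silhouette(codes, d)
plot(sil, main = "Silhouette plot for externally supplied clusters")
```

The dissimilarity matrix has to cover the same items, in the same order, as the membership vector.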

EDIT: For k-means (which minimizes squared Euclidean distances to the cluster centers), use squared Euclidean distances as the dissimilarity:

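library(vegan)    # varespec example data
library(cluster)  # silhouette()
data(varespec)
dis <- dist(varespec)^2  # squared Euclidean distances
set.seed(1)  # kmeans starts from random centers; fix the seed for a reproducible result
res <- kmeans(varespec, 3)
sil <- silhouette(res$cluster, dis)
dev.new()  # open a new graphics device (windows() exists only on Windows)
plot(sil)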
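
If the silhouette coefficients are already computed and the original distances are not at hand, a similar-looking plot can be drawn from the three columns alone with base R. This is a minimal sketch, not the plot method from the cluster package: the data frame df, its column names, and its values are invented for illustration.

```r
# Hypothetical input: one row per item with its cluster and silhouette width
df <- data.frame(
  item      = paste0("item", 1:8),
  cluster   = c(1, 1, 1, 2, 2, 2, 3, 3),
  sil_width = c(0.12, 0.71, 0.64, 0.48, -0.05, 0.55, 0.33, 0.80)
)

# Order by cluster, then by decreasing silhouette width within each cluster
df <- df[order(df$cluster, -df$sil_width), ]

# Horizontal bars; reversed so that the first cluster is drawn at the top
barplot(
  rev(df$sil_width),
  names.arg = rev(df$item),
  horiz     = TRUE,
  las       = 1,                    # horizontal axis labels
  col       = rev(df$cluster + 1),  # one palette colour per cluster
  border    = NA,
  xlim      = c(-1, 1),
  xlab      = "Silhouette width",
  main      = "Silhouette plot from precomputed values"
)
abline(v = mean(df$sil_width), lty = 2)  # dashed line at the average width
```

With real data, replace the data.frame call with whatever reads your file, keeping one row per item.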
Anemo answered 5/12, 2015 at 20:13 Comment(4)
Can you go into a bit more detail about the code? What will dis contain, and what will res contain? - Plafker
dis will be a distance/dissimilarity matrix of class dist. See ?vegdist for details. res in this case is the results object of pam (partitioning around medoids); within it, clustering is a vector containing the identity of the cluster to which each sample has been assigned. Whatever algorithm you are using, you need to extract the cluster membership vector from the results. Which method do you hope to use? - Anemo
The data is already clustered using k-means, and the silhouette coefficients have been computed. I'll accept your answer as soon as I get a chance to test this out and see that it works. - Plafker
What is the reason you use squared distance here in the k-means example? - Mahdi
