How can silhouette scores be negative?

If we have some datapoints:

[image of the data points]

And we use, for example, k-means to segment them: aren't the resulting segments such that every point is closest to the center-of-mass of its own cluster? If so, when the silhouette score compares a(i) (the average distance to the other points in the same cluster) with b(i) (the average distance to the points in the nearest other cluster), how can the score ever be negative, that is, how can b(i) be less than a(i)?

I can see how this might happen with other clustering algorithms: more sophisticated ones may group points differently, or some points may be assigned incorrectly. But how does it happen with k-means?

Chagres asked 28/8, 2020 at 19:27

A point i's average distance to the points in a cluster is not the same as its distance to the center-of-mass of that cluster. The silhouette score uses the former when calculating a(i) and b(i), while k-means uses the latter when assigning points to clusters, so the two can disagree.
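For reference, the standard silhouette definition can be written out as follows. The notation is chosen here for illustration: C_i is the cluster containing point i, and d is the distance metric.

```latex
\[
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\; j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j), \qquad
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
\]
```

Neither a(i) nor b(i) involves a cluster center, so s(i) < 0 exactly when b(i) < a(i), and assigning points to the nearest center does not rule that out.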

For example, in the image below, suppose the blue points are already assigned to one cluster and the green points to another. To which cluster will the red point be assigned? The center-of-mass of the blue cluster is at (0, 1) and the center-of-mass of the green cluster is at (0, -1.15). The red point is closer to the blue center, so it will be assigned to the blue cluster. However, its average distance to the green points is 1.15 while its average distance to the blue points is 1.414. Since b(i) < a(i) for this point, it gets a negative silhouette score.
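The same situation can be checked numerically. This is a minimal sketch with hypothetical coordinates: the original figure is not reproduced here, so the points below are assumptions chosen only to give the cluster centers quoted above, with the red point placed at the origin. With these coordinates the mean distance to the green points comes out near 1.154 rather than exactly 1.15.

```python
from math import dist  # Python 3.8+

# Hypothetical coordinates (assumed, not taken from the original figure).
blue = [(-1.0, 1.0), (1.0, 1.0)]       # center-of-mass at (0, 1)
green = [(-0.1, -1.15), (0.1, -1.15)]  # center-of-mass at (0, -1.15)
red = (0.0, 0.0)                       # the point being assigned


def centroid(points):
    """Center-of-mass of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def mean_distance(point, others):
    """Average Euclidean distance from `point` to each point in `others`."""
    return sum(dist(point, q) for q in others) / len(others)


# k-means style assignment: pick the nearest center-of-mass.
d_blue = dist(red, centroid(blue))
d_green = dist(red, centroid(green))
assigned = "blue" if d_blue < d_green else "green"

# Silhouette terms for the red point, treating it as a member of the blue cluster.
a = mean_distance(red, blue)   # mean distance to the other points in its own cluster
b = mean_distance(red, green)  # mean distance to the only other cluster
s = (b - a) / max(a, b)

print(assigned, round(a, 3), round(b, 3), round(s, 3))
```

With these assumed points, the red point is assigned to the blue cluster, yet a is larger than b, so s is negative (roughly -0.18).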

[image: silhouette score negative example]

Nerine answered 22/3, 2021 at 17:43
