How to choose an appropriate quantile value when estimating the bandwidth for MeanShift in Python (scikit-learn)?
I am performing mean shift clustering on a dataset. The estimate_bandwidth function estimates an appropriate bandwidth for mean shift clustering.

Syntax:

sklearn.cluster.estimate_bandwidth(X, quantile=0.3, n_samples=None, random_state=0)

I found that the estimated bandwidth increases as the quantile increases, resulting in fewer clusters. Similarly, decreasing the quantile decreases the bandwidth and hence gives a higher number of clusters.

So the number of clusters seems to depend on the quantile value chosen.
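The effect described above can be reproduced with a small sketch. The synthetic data from make_blobs, the helper name bandwidth_and_clusters, and the particular quantile values are assumptions chosen for illustration; they are not from the original dataset.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Synthetic stand-in for the dataset (assumption: 3 well-separated 2-D blobs).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)


def bandwidth_and_clusters(X, quantiles):
    """Hypothetical helper: for each quantile, return
    (quantile, estimated bandwidth, number of clusters found by MeanShift)."""
    results = []
    for q in quantiles:
        bw = estimate_bandwidth(X, quantile=q, random_state=0)
        labels = MeanShift(bandwidth=bw).fit(X).labels_
        results.append((q, bw, len(np.unique(labels))))
    return results


results = bandwidth_and_clusters(X, [0.05, 0.1, 0.3, 0.5])
for q, bw, k in results:
    print(f"quantile={q:.2f}  bandwidth={bw:.3f}  clusters={k}")
```

The bandwidth column grows with the quantile. The cluster count usually shrinks as the bandwidth grows, although how quickly it does so depends on the data.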

How do I choose the optimal quantile?

Sulfuric answered 5/2, 2015 at 2:15 Comment(3)
Luck and experience. Unfortunately. But what is the “optimum” anyway? – Dardanelles
"Optimum" in the sense that the clusters are stable. – Sulfuric
Then infinity would be optimal in that sense. – Dardanelles

The quantile sets the number of neighbours in the KNN (k-nearest-neighbours) search that the estimate_bandwidth function runs internally to determine the bandwidth.
Concretely:

n = number of neighbours in KNN = number of samples used * quantile (rounded down to an integer)

The bandwidth is then the average, over all samples, of the distance from each sample to the farthest of its n nearest neighbours (the sample itself counts as one of them). So you can use this to figure out how to set the quantile: the bandwidth returned by this function will, on average, cover n samples, which strongly affects the number of clusters that Mean Shift will return.
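To make the calculation concrete, here is a minimal sketch of what estimate_bandwidth does, as far as I understand the scikit-learn implementation: for each sample, take the distance to the farthest of its n nearest neighbours, then average those distances. The function name manual_bandwidth and the random test data are assumptions for illustration only, and the comparison assumes n_samples=None so that every sample is used.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth
from sklearn.neighbors import NearestNeighbors


def manual_bandwidth(X, quantile=0.3):
    """Hypothetical re-implementation for illustration, not the library code."""
    # n = number of samples * quantile, rounded down (at least 1 neighbour).
    n_neighbors = max(1, int(X.shape[0] * quantile))
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    # Passing X explicitly means each sample counts as its own nearest neighbour.
    distances, _ = nn.kneighbors(X)
    # Distance to the farthest of the n neighbours, averaged over all samples.
    return distances.max(axis=1).mean()


rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))  # assumed toy data

manual = manual_bandwidth(X, quantile=0.3)
library = estimate_bandwidth(X, quantile=0.3)
print(manual, library)
```

Under this reading, a quantile of 0.3 on 200 samples means each bandwidth-sized ball covers roughly 60 samples on average, which is a more intuitive quantity to reason about than the quantile itself.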

Legitimist answered 4/4, 2019 at 21:59 Comment(0)
