I'm clustering a sample of about 100 unlabelled records and trying to use grid search to evaluate the clustering algorithm with various hyperparameters. I'm scoring with silhouette_score, which works fine.
My problem here is that I don't need the cross-validation aspect of GridSearchCV/RandomizedSearchCV, but I can't find a simple GridSearch/RandomizedSearch. I can write my own, but the ParameterSampler and ParameterGrid objects are very useful.
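For reference, this is roughly the manual loop I'd end up writing with ParameterGrid (just a sketch, reusing the distance_matrix and KMeans setup from my code below; I'm assuming ParameterGrid imports from sklearn.model_selection in a recent scikit-learn):

from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.model_selection import ParameterGrid

param_grid = {"n_clusters": range(2, 11)}

best_score, best_params = float("-inf"), None
for params in ParameterGrid(param_grid):
    # fit one candidate configuration and score it with the silhouette
    ca = KMeans(**params)
    clusters = ca.fit_predict(distance_matrix)
    score = metrics.silhouette_score(distance_matrix, clusters, metric='precomputed')
    if score > best_score:
        best_score, best_params = score, params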
My next step will be to subclass BaseSearchCV and implement my own _fit() method, but I thought it was worth asking: is there a simpler way to do this, for example by passing something to the cv parameter?
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV

def silhouette_score(estimator, X):
    # custom scorer: fit the estimator, then score the clustering
    # against the precomputed distance matrix
    clusters = estimator.fit_predict(X)
    score = metrics.silhouette_score(distance_matrix, clusters, metric='precomputed')
    return score

ca = KMeans()
param_grid = {"n_clusters": range(2, 11)}

# run grid search
search = GridSearchCV(
    ca,
    param_grid=param_grid,
    scoring=silhouette_score,
    cv=  # can I pass something here to only use a single fold?
)
search.fit(distance_matrix)
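One idea I had for the cv parameter is to pass an explicit single "split" whose train and test index arrays both cover the full data, so only one fold is ever evaluated. I haven't verified this; it's just a sketch of what I mean:

import numpy as np

# untested: a single (train, test) pair using every sample for both sides
indices = np.arange(distance_matrix.shape[0])
single_split = [(indices, indices)]

search = GridSearchCV(
    ca,
    param_grid=param_grid,
    scoring=silhouette_score,
    cv=single_split,
)
search.fit(distance_matrix)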
BaseSearchCV subclasses? Have I missed some feature for optimising hyperparameters, or do you mean write something specific for each algorithm? – Rookie