What is the difference between "sklearn.cluster.k_means" and "sklearn.cluster.KMeans", and when should I use each of them?

From the sklearn glossary: "[w]e provide ad hoc function interfaces for many algorithms, while estimator classes provide a more consistent interface." k_means() is essentially a wrapper that fits a KMeans estimator and returns its fitted attributes as a tuple:

cluster_centers_,
labels_,
inertia_, and
n_iter_ (only when called with return_n_iter=True)
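A minimal sketch to check this correspondence, assuming a recent scikit-learn and synthetic data from make_blobs (the dataset, n_clusters=3, n_init=10 and random_state=0 are illustrative choices, not from the original answer). With identical settings, the tuple returned by k_means() should match the fitted attributes of a KMeans estimator:

```python
import numpy as np
from sklearn.cluster import KMeans, k_means
from sklearn.datasets import make_blobs

# Illustrative toy data: 150 points around 3 centers
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Functional interface: returns plain arrays, no estimator object
centers, labels, inertia, n_iter = k_means(
    X, n_clusters=3, n_init=10, random_state=0, return_n_iter=True
)

# Estimator interface with the same settings
est = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Compare the returned values with the estimator's fitted attributes
print(np.allclose(centers, est.cluster_centers_))
print(np.array_equal(labels, est.labels_))
print(np.isclose(inertia, est.inertia_), n_iter == est.n_iter_)
```

Passing n_init explicitly keeps the behaviour the same across scikit-learn versions whose default for n_init differs.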

KMeans is an estimator class designed following the sklearn developer guide. As a clustering estimator (not a classifier), KMeans implements methods including:

fit(),
transform(), and
score().

as well as others such as predict() and fit_predict(). The main benefit of using KMeans over k_means() is that the fitted model gives you easy access to these methods. For example, to use your trained model to predict which cluster unseen data belongs to:

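from sklearn.cluster import KMeans

est = KMeans(n_clusters=3)             # the estimator object stores the fitted state
est.fit(X_train)                       # fit on the instance, not on the class
cluster_labels = est.predict(X_test)   # assign each unseen sample to its nearest centroid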
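A self-contained version of the snippet above, as a sketch under stated assumptions: X_train and X_test are split from a synthetic make_blobs dataset, and the parameter values are illustrative. It also exercises transform(), which returns the distance from each sample to every cluster center, and score(), which returns the negative K-means objective on the given data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: one synthetic dataset split into train and test parts
X, _ = make_blobs(n_samples=200, centers=3, random_state=42)
X_train, X_test = X[:150], X[150:]

est = KMeans(n_clusters=3, n_init=10, random_state=42)
est.fit(X_train)

cluster_labels = est.predict(X_test)   # nearest-centroid label for each test point
distances = est.transform(X_test)      # distance to each centroid, shape (50, 3)
train_score = est.score(X_train)       # negative K-means objective on X_train

print(cluster_labels[:10])
print(distances.shape)
print(train_score, -est.inertia_)
```

Because predict() picks the nearest center, taking argmin over the columns of transform() gives the same labels.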

With the functional API you only get the arrays back, so to assign new data to clusters you would have to look under the hood of KMeans.predict() and reimplement it yourself (assigning each sample to its nearest cluster center).
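As a sketch of what that reimplementation involves, assuming Euclidean nearest-centroid assignment (which is what KMeans.predict() does), the centers returned by k_means() can be used directly. The helper assign_to_nearest_center is hypothetical and not part of sklearn; the data and parameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans, k_means
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)
X_train, X_test = X[:150], X[150:]

# Functional API: only arrays come back, so there is no .predict() to call
centers, _, _ = k_means(X_train, n_clusters=3, n_init=10, random_state=42)

def assign_to_nearest_center(X_new, centers):
    """Hypothetical helper: label each row of X_new with its nearest center."""
    # Squared Euclidean distances, shape (n_samples, n_clusters)
    sq_dist = ((X_new[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

manual_labels = assign_to_nearest_center(X_test, centers)

# Compare with the estimator's predict() fitted under identical settings
est = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_train)
print(np.array_equal(manual_labels, est.predict(X_test)))
```

This works for plain K-means, but it is exactly the kind of detail the estimator class spares you from re-deriving.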

A functional interface is not provided for every sklearn estimator, but you can easily write one yourself, using the existing functions in sklearn (such as k_means()) as a guide.
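A minimal sketch of such a wrapper, modelled on how k_means() wraps KMeans: fit the estimator, then return its fitted attributes as plain arrays. The function name agglomerative_clustering and its return values are hypothetical choices for illustration, built on the real AgglomerativeClustering class and its labels_ and children_ attributes:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def agglomerative_clustering(X, n_clusters=2, **kwargs):
    """Hypothetical k_means-style wrapper: fit the estimator, return plain arrays.

    Extra keyword arguments are forwarded to AgglomerativeClustering.
    """
    est = AgglomerativeClustering(n_clusters=n_clusters, **kwargs).fit(X)
    return est.labels_, est.children_

# Two well-separated pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, children = agglomerative_clustering(X, n_clusters=2)
print(labels)
```

The trade-off is the same as with k_means(): a convenient one-shot call, at the cost of losing the fitted estimator and its methods.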
