Python, Scikit-learn, K-means: What does the parameter n_init actually do? [duplicate] - McMap

Asked 22/9, 2017 at 7:47 Answered 22/9, 2017 at 9:53

Solved python machine-learning scikit-learn cluster-analysis k-means

I'm a Python beginner, and I'm trying to understand what the parameter n_init of sklearn.cluster.KMeans does.

From the documentation:

n_init : int, default: 10

Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.

At first I thought it meant the number of iterations the algorithm would run, until I found this helpful question and realized that is what max_iter does.

What exactly does the parameter n_init do? I really don't understand it.

Lubricous answered 22/9, 2017 at 7:47 Comment(2)

Since the starting points are randomized, n_init states how many different sets of random starting points the algorithm should use. It then returns the best run in terms of inertia (the sum of squared distances from each point to its nearest centroid; lower inertia means tighter clusters) – Berkie 22/9, 2017 at 7:52

It will initialize the centroids for clusters randomly this many times. Depending on the initial value of centroids, the clusters formed may be different. – Scammon 22/9, 2017 at 8:12

In K-means, the initial placement of the centroids plays a very important role in its convergence. Sometimes the initial centroids are placed in such a way that the clusters keep changing drastically over consecutive iterations, and either max_iter is reached before the convergence condition is met or the algorithm settles into a poor local optimum. The clusters obtained from such a run may not be good ones. The n_init parameter was introduced to overcome this problem. The value of n_init determines how many different sets of randomly chosen initial centroids the algorithm should use. K-means is run to completion once for each set, and the runs are compared by inertia, i.e. the sum of squared distances from each sample to its nearest centroid: the lower the inertia, the tighter the clusters and the more likely it is that we are close to the best solution. The centroids and cluster labels from the run with the lowest inertia are returned.
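
To make the idea concrete, here is a minimal sketch that emulates n_init by hand: it runs KMeans several times with a single initialization each and keeps the run with the lowest inertia. The synthetic data from make_blobs, the choice of 4 clusters, and the seeds 0 to 9 are assumptions made only for illustration, and this is not a claim about how scikit-learn implements the restarts internally.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data, used only for illustration.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=0)

# Emulate n_init by hand: ten runs with one random initialization each.
seeds = range(10)  # arbitrary seeds
runs = [
    KMeans(n_clusters=4, init="random", n_init=1, random_state=s).fit(X)
    for s in seeds
]
inertias = [r.inertia_ for r in runs]

# Keep the run with the lowest inertia, which is the selection rule n_init uses.
best_manual = min(runs, key=lambda r: r.inertia_)

# Let KMeans do the ten restarts itself.
km = KMeans(n_clusters=4, init="random", n_init=10, random_state=0).fit(X)

# Inertia recomputed by hand: sum of squared distances to the assigned centroid.
manual_inertia = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()

print("per-run inertia:      ", np.round(inertias, 1))
print("best of manual runs:  ", round(best_manual.inertia_, 1))
print("n_init=10 inertia:    ", round(km.inertia_, 1))
print("recomputed by hand:   ", round(manual_inertia, 1))
```

The per-run inertias may differ from seed to seed; only the lowest one is kept, so a larger n_init trades extra computation for a lower chance of returning a poor local optimum.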

If you are interested, you can also look at the k-means++ algorithm, an initialization scheme designed specifically to deal with this problem (it is the default value of init in sklearn.cluster.KMeans).
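
As a rough way to see the effect of the initialization scheme, the sketch below runs KMeans with a single initialization (n_init=1) under init="random" and init="k-means++" across the same seeds and prints the spread of the resulting inertias. The data set, the 8 clusters, the 20 seeds, and the helper single_run_inertias are assumptions chosen for illustration; the numbers will vary with the data, so treat it as an experiment to run rather than a guaranteed result.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with several clusters, used only for illustration.
X, _ = make_blobs(n_samples=500, centers=8, cluster_std=0.6, random_state=42)


def single_run_inertias(init, seeds):
    """Fit KMeans once per seed (n_init=1) and collect the final inertia."""
    return np.array([
        KMeans(n_clusters=8, init=init, n_init=1, random_state=s).fit(X).inertia_
        for s in seeds
    ])


seeds = range(20)  # arbitrary seeds
random_init = single_run_inertias("random", seeds)
kpp_init = single_run_inertias("k-means++", seeds)

for name, values in [("random", random_init), ("k-means++", kpp_init)]:
    print(f"{name:>10}: min={values.min():.1f} "
          f"mean={values.mean():.1f} max={values.max():.1f}")
```

If single runs with one scheme show a wide gap between min and max, that scheme benefits more from a larger n_init.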

You can also look at this link for more details about why the initial centroids matter.

Tricorn answered 22/9, 2017 at 9:53 Comment(3)

If someone uses n_init=10 and random_state=1234, then the answer does not make sense. How can you randomly initialize the centroids 10 times with a fixed random_state? – Softshoe 6/8, 2019 at 12:49

@serafeim it basically means selecting 10 * (no. of centroids) points uniformly, with the random state set to 1234. Does this help clear up your query? – Tricorn 6/8, 2019 at 17:22

@Softshoe n_init determines the total number of runs/initializations, while random_state sets the seed of the random number generator. The generator is seeded once, before these runs begin, so the 10 initializations differ from each other but the same 10 are generated every time KMeans is run. – Microsurgery 8/1 at 0:9
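
A small sketch of the point made in these comments: with a fixed random_state, two separate fits with n_init=10 should end up with the same result, because the same sequence of initializations is drawn each time. The data, the 5 clusters, and the helper name fit are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data, used only for illustration.
X, _ = make_blobs(n_samples=300, centers=5, random_state=7)


def fit(random_state):
    """Fit KMeans with 10 random initializations and the given seed."""
    return KMeans(
        n_clusters=5, init="random", n_init=10, random_state=random_state
    ).fit(X)


a = fit(1234)
b = fit(1234)

# Same seed: the same 10 initializations are tried, so the best run is the same.
same_centers = np.allclose(a.cluster_centers_, b.cluster_centers_)
same_inertia = np.isclose(a.inertia_, b.inertia_)
print("same centers:", same_centers)
print("same inertia:", same_inertia)
```

So random_state does not make the 10 initializations identical to each other; it makes the whole sequence of 10 repeatable.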
