I am trying to see whether the performance of the two can be compared based on the objective functions they work on.
BTW, the Fuzzy-C-Means (FCM) clustering algorithm is also known as Soft K-Means.
The objective functions are virtually identical, the only difference being the introduction of a vector which expresses the percentage of belonging of a given point to each of the clusters. This vector is submitted to a "stiffness" exponent aimed at giving more importance to the stronger connections (and conversely at minimizing the weight of weaker ones); incidentally, when the stiffness factor tends towards infinity the memberships collapse to 0/1 values, making the FCM model identical to that of K-Means.
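For reference, here is a sketch of the two objective functions in the usual textbook notation (the symbols r_ij, u_ij, m and c_j are my notation, not the answer's):

    J_KM  = \sum_{i=1}^{n} \sum_{j=1}^{k} r_{ij}\,\lVert x_i - c_j \rVert^2,        r_{ij} \in \{0,1\},\ \sum_j r_{ij} = 1
    J_FCM = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{\,m}\,\lVert x_i - c_j \rVert^2,   u_{ij} \in [0,1],\ \sum_j u_{ij} = 1,\ m > 1

In this parameterisation the hard limit corresponds to the fuzzifier m tending to 1 from above (equivalently, the exponent 2/(m-1) used in the membership update, which is presumably the "stiffness" factor meant above, tending to infinity); the memberships then collapse to 0/1 and the two objectives coincide.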
I think that, except for a possible issue with clusters which end up with no points assigned to them, it is possible to emulate the K-Means algorithm with the FCM one by simulating an infinite stiffness factor (i.e. by introducing a function which changes the biggest value in the membership vector to 1 and zeros out the other values, in lieu of the exponentiation of the vector). This is of course a very inefficient way of running K-Means, because the algorithm then has to perform as many operations as a true FCM (if only with 1 and 0 values, which simplifies the arithmetic but not the complexity).
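As a minimal sketch of that emulation (NumPy; harden and U are my own names, not a library API), the "infinite stiffness" step just keeps the largest membership in each row and zeros out the rest:

    import numpy as np

    def harden(U):
        """Emulate an infinite stiffness factor: replace each row of the fuzzy
        membership matrix U (shape n_points x k) with a 0/1 row keeping only
        its largest entry. Used in place of the exponentiation step, this turns
        an FCM iteration into a (wasteful) K-Means iteration."""
        hard = np.zeros_like(U)
        hard[np.arange(U.shape[0]), np.argmax(U, axis=1)] = 1.0
        return hard

A cluster whose column ends up all zeros is exactly the "no points assigned" corner case mentioned above; K-Means implementations typically handle it by re-seeding the empty centroid.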
With regard to performance, FCM therefore needs to perform k (i.e. number of clusters) multiplications for each point, for each dimension (not counting the exponentiation needed to take stiffness into account). This, plus the overhead needed for computing and managing the proximity vector, explains why FCM is quite a bit slower than plain K-Means.
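To make the extra work concrete, these are the standard per-iteration FCM updates (again in textbook notation, not taken from the answer):

    u_{ij} = 1 \Big/ \sum_{l=1}^{k} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{\frac{2}{m-1}},
    \qquad
    c_j = \frac{\sum_{i=1}^{n} u_{ij}^{\,m}\, x_i}{\sum_{i=1}^{n} u_{ij}^{\,m}}

Each membership compares a point's distance against every centroid, and each centroid update is a weighted average over all n points rather than a plain mean over the cluster's members; K-Means only needs an argmin per point and an unweighted mean per cluster, which is why its inner loop is cheaper.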
But FCM/Soft-K-Means is less "stupid" than Hard K-Means when it comes, for example, to elongated clusters (when points that are otherwise consistent in the other dimensions tend to scatter along one or two particular dimensions), and that's why it's still around ;-)
From my original reply:
Also, I just thought about this but haven't put any "mathematical" thought into it: FCM may converge faster than hard K-Means, somewhat offsetting its bigger computational requirement.
May 2018 edit:
There is actually no reputable research that I could identify which supports my hunch above about FCM's faster rate of convergence. Thank you, Benjamin Horn, for keeping me honest ;-)
They stop when the relative assignments no longer change "enough". In other words, the clustering solution produced by successive iterations of these algorithms changes a lot at first from one iteration to the next, but eventually the changes become smaller and smaller as the function approaches its limit. It is safe to stop iterating once a practical change threshold is reached, because the function is convergent: iterating more won't produce a significantly different result... –
Ship Note that K-Means converges with epsilon=0 (the assignments eventually stop changing at all); FCM doesn't. So they converge quite differently in nature, IMHO. –
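A minimal sketch of the stopping test both comments describe (epsilon, U_new and U_old are my own names):

    import numpy as np

    def has_converged(U_new, U_old, epsilon=1e-5):
        """Stop when the largest change in any membership/assignment value is
        at most epsilon. Hard K-Means assignments eventually stop changing at
        all, so epsilon=0 is usable there; FCM memberships only approach a
        fixed point, so a small positive epsilon is needed in practice."""
        return np.max(np.abs(U_new - U_old)) <= epsilon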
Elspet K-Means clustering and Fuzzy C-Means clustering are very similar in approach. The main difference is that, in Fuzzy C-Means clustering, each point has a weighting associated with each cluster, so a point doesn't sit "in a cluster" so much as it has a weak or strong association with the cluster, which is determined by the inverse distance to the center of the cluster.
Fuzzy C-Means will tend to run slower than K-Means, since it's actually doing more work. Each point is evaluated against each cluster, and more operations are involved in each evaluation. K-Means just needs to do a distance calculation and pick the nearest centroid, whereas Fuzzy C-Means needs to do a full inverse-distance weighting.
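A rough NumPy sketch of the two evaluation steps (the function names are mine; X is an (n, dim) array of points, C a (k, dim) array of centroids), just to show where the extra work sits:

    import numpy as np

    def kmeans_assign(X, C):
        """K-Means evaluation: squared distances to each centroid, then a
        single argmin per point."""
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # (n, k)
        return np.argmin(d2, axis=1)                              # hard labels

    def fcm_memberships(X, C, m=2.0):
        """FCM evaluation: the same distances, plus a full inverse-distance
        weighting: each of the k memberships per point compares its distance
        against every centroid and applies the 2/(m-1) exponent."""
        d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))
        d = np.fmax(d, np.finfo(float).eps)        # guard against zero distances
        ratios = d[:, :, None] / d[:, None, :]     # (n, k, k) pairwise ratios
        return 1.0 / (ratios ** (2.0 / (m - 1.0))).sum(axis=2)    # (n, k)

The FCM path touches k x k distance ratios per point on top of the distance computation itself, which matches the "more operations per evaluation" point above.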
C-Means is fuzzy but K-Means is hard (not fuzzy): in K-Means each point belongs to a single centroid, but in Fuzzy C-Means each point can belong to two (or more) centroids, with different degrees of membership.
In K-Means each point is either part of the first centroid or part of the second centroid. In C-Means, one point can be part of the first centroid (90%) and of the second centroid (10%). For example, a student who scores 49, just below the pass mark: in reality the student failed, yet the result is somehow "almost a pass"; that partial degree of belonging is what we call fuzzy.
People have written technically and each answer is well written. But what I want to say is the same thing in layman's language. K-Means clustering partitions the entire dataset into K clusters, where each data point belongs to only one cluster. Fuzzy C-Means creates K clusters and then assigns each data point to every cluster, but with a factor (the membership degree) which defines how strongly the data point belongs to that cluster.
C-Means means data points can belong to multiple clusters with varying degrees of membership, which can be useful for dealing with data that is inherently uncertain or doesn't cleanly belong to a single category, while K-Means assigns each data point exclusively to one cluster.