K- Means algorithm

Asked 28/6, 2009 at 21:22 Answered 22/11, 2013 at 6:2

java algorithm machine-learning grouping unsupervised-learning

I'm trying to program a k-means algorithm in Java. I have calculated a number of arrays, each of them containing a number of coefficients. I need to use a k-means algorithm in order to group all this data. Do you know of any implementation of this algorithm?

Dislike answered 28/6, 2009 at 21:22 Comment(0)

Classification, Clustering and grouping are well developed areas of IR. There is a very good (Java) library/software (open source) here Called WEKA. There are several algorithms for clustering there. Although there is a learning curve, it might useful when you encounter harder problems.

Trapes answered 28/6, 2009 at 22:4 Comment(0)

OpenCV is one of the most horribly written libraries I've ever had to use. On the other hand, Matlab does it very neatly.

If you have to code it yourself, the algorithm is incredibly simple for how efficient it is.

Pick number of clusters (k)
Make k points (they're going to be the centroids)
Randomize all these points location
Calculate Euclidean distance from each point to all centroids
Assign 'membership' of each point to the nearest centroid
Establish the new centroids by averageing locations of all points belonging to a given cluster
Goto 4 Until convergence is achieved, or changes made are irrelevant.

Prochora answered 30/6, 2009 at 19:38 Comment(2)

Using OpenCV for KMeans might be overkill, but I don't see how OpenCV is "horribly" written. It may not be as easy to use as matlab (matlab is proprietory, slow and meant to be an easy way to test out algorithms using the large amount of algorithms already available to you in matlab) but it is for sure way faster than matlab, simply by virtue of being coded in C. – Apure 1/7, 2009 at 6:41

Matlab is slow if you don't code in it properly. If you've written a 'for' loop in matlab, you're doing it wrong. – Procrastinate 4/4, 2011 at 23:48

Really, KMeans is a really easy algorithm. Any good reason why not hand coding it yourself? I did it in Qt and then ported the code to plain old STL, without too much problems.

I am started to be a fan to Joel's idea: no external dependencies, so please feel free to tell me what's good about a large piece of software you don't control, and others on this question have already mentioned it's not a good piece of software/

Talk is cheap, real man show their code to the world: http://github.com/elcuco/data_mining_demo

I should clean the code a little to be more generic, and current version is not ported to STL, but it's a start!

Tameratamerlane answered 28/6, 2009 at 21:22 Comment(1)

Hi elcuco, I have coded it myself, but wanted to crosscheck the initialization part. I wanted to see how others implementations had assigned initial clusters. I also think it's not a good idea using a code you don't have control over. I'll keep digging, thank you all! – Dislike 1/7, 2009 at 10:23

There's a very nice Python implementation of K-means clustering in "Programming Collective Intelligence". I highly recommend it.

I realize that you'll have to translate to Java, but it doesn't look to be too difficult.

Congressman answered 28/6, 2009 at 21:25 Comment(0)

Very old question but I noticed there is no mention of the Java Machine Learning Library which has an implementation of K-Means and includes some documentation about it's usage.

The project is not very active but the last version is relatively recent (July 2012)

Motivation answered 22/11, 2012 at 7:0 Comment(1)

I wouldn't use this package for clustering. The k-means implementation in Java-ML is ridiculously slow. – Anisaanise 22/12, 2013 at 23:19

It seems everyone who posted forgot to mention the defacto image processing library: OpenCV http://sourceforge.net/projects/opencvlibrary/. You would have to write a JNI wrapper around the C OpenCV code to get KMeans to work but the added benefit would be

You would know that the KMeans algorithm is heavily optimized
OpenCV makes use of your GPU extensively so it runs blazing fast

The main draw back is that you would have to write a JNI wrapper. I once needed a template matching routine and was faced with many alternatives but I found OpenCV to be by far the best, even though I was forced to write a JNI wrapper for it.

Apure answered 29/6, 2009 at 20:14 Comment(1)

Currently, OpenCV officially supports Java. So, there is no need for hand-written JNI. – Puritanism 9/2, 2014 at 21:18

//Aim:To implement Kmeans clustering algorithm.
//Program
import java.util.*;
class k_means
{
static int count1,count2,count3;
static int d[];
static int k[][];
static int tempk[][];
static double m[];
static double diff[];
static int n,p;

static int cal_diff(int a) // This method will determine the cluster in which an element go at a particular step.
{
int temp1=0;
for(int i=0;i<p;++i)
{
if(a>m[i])
diff[i]=a-m[i];
else
diff[i]=m[i]-a;
}
int val=0;
double temp=diff[0];
for(int i=0;i<p;++i)
{
if(diff[i]<temp)
{
temp=diff[i];
val=i;
}
}//end of for loop
return val;
}

static void cal_mean() // This method will determine intermediate mean values
{
for(int i=0;i<p;++i)
m[i]=0; // initializing means to 0
int cnt=0;
for(int i=0;i<p;++i)
{
cnt=0;
for(int j=0;j<n-1;++j)
{
if(k[i][j]!=-1)
{
m[i]+=k[i][j];
++cnt;
}}
m[i]=m[i]/cnt;
}
}

static int check1() // This checks if previous k ie. tempk and current k are same.Used as terminating case.
{
for(int i=0;i<p;++i)
for(int j=0;j<n;++j)
if(tempk[i][j]!=k[i][j])
{
return 0;
}
return 1;
}

public static void main(String args[])
{
Scanner scr=new Scanner(System.in);
/* Accepting number of elements */
System.out.println("Enter the number of elements ");
n=scr.nextInt();
d=new int[n];
/* Accepting elements */
System.out.println("Enter "+n+" elements: ");
for(int i=0;i<n;++i)
d[i]=scr.nextInt();
/* Accepting num of clusters */
System.out.println("Enter the number of clusters: ");
p=scr.nextInt();
/* Initialising arrays */
k=new int[p][n];
tempk=new int[p][n];
m=new double[p];
diff=new double[p];
/* Initializing m */
for(int i=0;i<p;++i)
m[i]=d[i];

int temp=0;
int flag=0;
do
{
for(int i=0;i<p;++i)
for(int j=0;j<n;++j)
{
k[i][j]=-1;
}
for(int i=0;i<n;++i) // for loop will cal cal_diff(int) for every element.
{
temp=cal_diff(d[i]);
if(temp==0)
k[temp][count1++]=d[i];
else
if(temp==1)
k[temp][count2++]=d[i];
else
if(temp==2)
k[temp][count3++]=d[i]; 
}
cal_mean(); // call to method which will calculate mean at this step.
flag=check1(); // check if terminating condition is satisfied.
if(flag!=1)
/*Take backup of k in tempk so that you can check for equivalence in next step*/
for(int i=0;i<p;++i)
for(int j=0;j<n;++j)
tempk[i][j]=k[i][j];

System.out.println("\n\nAt this step");
System.out.println("\nValue of clusters");
for(int i=0;i<p;++i)
{
System.out.print("K"+(i+1)+"{ ");
for(int j=0;k[i][j]!=-1 && j<n-1;++j)
System.out.print(k[i][j]+" ");
System.out.println("}");
}//end of for loop
System.out.println("\nValue of m ");
for(int i=0;i<p;++i)
System.out.print("m"+(i+1)+"="+m[i]+"  ");

count1=0;count2=0;count3=0;
}
while(flag==0);

System.out.println("\n\n\nThe Final Clusters By Kmeans are as follows: ");
for(int i=0;i<p;++i)
{
System.out.print("K"+(i+1)+"{ ");
for(int j=0;k[i][j]!=-1 && j<n-1;++j)
System.out.print(k[i][j]+" ");
System.out.println("}");
}
}
}
/*
Enter the number of elements
8
Enter 8 elements:
2 3 6 8 12 15 18 22
Enter the number of clusters:
3

At this step
Value of clusters
K1{ 2 }
K2{ 3 }
K3{ 6 8 12 15 18 22 }
Value of m
m1=2.0  m2=3.0  m3=13.5

At this step
Value of clusters
K1{ 2 }
K2{ 3 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.0  m2=5.666666666666667  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

At this step
Value of clusters
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 }
Value of m
m1=2.5  m2=7.0  m3=16.75

The Final Clusters By Kmeans are as follows:
K1{ 2 3 }
K2{ 6 8 }
K3{ 12 15 18 22 } */

Spotter answered 22/11, 2013 at 6:2 Comment(0)

Recommended topics

Hot tags