How to speed up color clustering in OpenCV?

Asked 28/11, 2012 at 20:10 Answered 29/11, 2012 at 12:53

Solved opencv cluster-analysis data-mining k-means image-segmentation

For a project I want to implement a color-clustering algorithm that replaces similar colors with the average color of their cluster.

For now, I use the k-means algorithm to cluster the whole image, but this takes a long time. Does anyone have an idea how to use k-means to cluster a color histogram instead, so that I can run this algorithm faster?

Attalanta answered 28/11, 2012 at 20:10 Comment(0)

Downsample the image first, then run k-means.

If you resize the image to half its size in both x and y, it shouldn't affect the colors much, but k-means should take at most about a quarter of the time. If you resample to 1/10 of the width and height, k-means should run roughly 100 times faster.

https://en.wikipedia.org/wiki/Color_quantization

By downsampling the image, you have fewer pixels to process during clustering. But in the end, it should produce roughly the same color scheme.

Small summary of k-means:

It maps each object (=pixel) to the nearest cluster center (= palette entry)
It recomputes each palette entry to best represent the assigned points (= pixels)
Repeat until nothing changes anymore.

So the real output is not an image or image regions. It's the palette.

You can then map an arbitrary image (including the full resolution version) to this color palette by simply replacing each pixel with the closest color!
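The idea above can be sketched in plain Python as follows. This is only a toy illustration, not OpenCV code: the helper names (`kmeans_palette`, `quantize`, `nearest`) and the 4x4 test image are made up for the example, and in a real program the downsampling and clustering would more likely be done with `cv2.resize` and `cv2.kmeans`.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two colour tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(palette, pixel):
    """Index of the palette entry closest to the given pixel."""
    return min(range(len(palette)), key=lambda j: dist2(palette[j], pixel))

def kmeans_palette(pixels, k, max_iter=50, seed=0):
    """Plain k-means on a list of pixels; returns the k centres (the palette)."""
    rng = random.Random(seed)
    palette = rng.sample(pixels, k)  # hypothetical, very simple initialisation
    for _ in range(max_iter):
        # Step 1: map each pixel to the nearest palette entry.
        groups = [[] for _ in range(k)]
        for p in pixels:
            groups[nearest(palette, p)].append(p)
        # Step 2: recompute each entry as the mean of its assigned pixels
        # (an entry with no pixels keeps its old value).
        new_palette = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else palette[j]
            for j, g in enumerate(groups)
        ]
        # Step 3: stop as soon as nothing changes anymore.
        if new_palette == palette:
            break
        palette = new_palette
    return palette

def quantize(image, palette):
    """Replace every pixel of an image by its closest palette colour."""
    return [[palette[nearest(palette, p)] for p in row] for row in image]

# Toy 4x4 "full-resolution" image: left half reddish, right half bluish.
image = [
    [(250, 10, 10), (240, 20, 15), (10, 10, 250), (20, 15, 240)],
    [(245, 15, 5), (255, 5, 10), (5, 20, 245), (15, 5, 255)],
    [(250, 10, 10), (240, 20, 15), (10, 10, 250), (20, 15, 240)],
    [(245, 15, 5), (255, 5, 10), (5, 20, 245), (15, 5, 255)],
]

# Crude downsampling: keep every second pixel in x and y.
small = [p for row in image[::2] for p in row[::2]]

palette = kmeans_palette(small, k=2)  # cluster only the small image
result = quantize(image, palette)     # but recolour the full image
```

Note that no mask is resized anywhere: the only thing carried over from the small image to the full one is the palette.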

Complexity and performance:

The complexity of k-means is O(n*k*i), where n is the number of pixels you have, k the desired number of output colors and i the number of iterations needed until convergence.

n: by downsampling, you can easily reduce n, the largest factor. In many situations, you can reduce this quite significantly before you see a degradation in the quality of the result.

k: this is your desired number of output colors. Whether you can reduce this or not depends on your actual use case.

i: various factors can have an effect on convergence (including both other factors!), but the strongest probably is having good starting values. So if you have a very fast but low quality method to choose the palette, run it first, then use k-means to refine this palette. Maybe OpenCV already includes an appropriate heuristic for this though!

As you can see, the easiest approach is to reduce n. You can reduce n significantly, produce an optimized palette for the thumbnail, then rerun k-means on the full image to refine this palette. As this will hopefully reduce the number of iterations significantly, it can sometimes perform very well.
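A rough sketch of this two-stage idea, again in plain Python rather than OpenCV: the `kmeans` helper, the synthetic pixel data and the deliberately poor "first two pixels" initialisation are all assumptions made for the example. It counts how many passes over the data each run needs.

```python
def dist2(a, b):
    """Squared Euclidean distance between two colour tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centres, pixel):
    """Index of the centre closest to the given pixel."""
    return min(range(len(centres)), key=lambda j: dist2(centres[j], pixel))

def mean(group):
    """Channel-wise mean of a non-empty list of colour tuples."""
    return tuple(sum(ch) / len(group) for ch in zip(*group))

def kmeans(pixels, centres, max_iter=100):
    """k-means from the given starting centres.

    Returns (final centres, number of passes over the pixels).
    """
    centres = list(centres)
    for passes in range(1, max_iter + 1):
        groups = [[] for _ in centres]
        for p in pixels:
            groups[nearest(centres, p)].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centres)]
        if updated == centres:  # converged: nothing moved in this pass
            return centres, passes
        centres = updated
    return centres, max_iter

# Synthetic "full image": 50 reddish and 50 bluish pixels.
full = [(200 + d, 10, 10) for d in range(50)] + \
       [(10, 10, 200 + d) for d in range(50)]
thumb = full[::10]  # crude thumbnail: every 10th pixel

# Cold start on the full data from a poor initial guess.
cold_palette, cold_passes = kmeans(full, full[:2])

# Two-stage run: cluster the thumbnail first (cheap passes), then
# refine the resulting palette on the full data (expensive passes).
thumb_palette, thumb_passes = kmeans(thumb, thumb[:2])
warm_palette, warm_passes = kmeans(full, thumb_palette)
```

On this toy data the refinement run needs fewer passes over the full pixel set than the cold start and ends at the same palette. How much it saves on a real image depends on the image and on the initialisation. With `cv2.kmeans` a warm start would presumably go through the `KMEANS_USE_INITIAL_LABELS` flag, which takes initial labels rather than initial centers.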

Odd answered 28/11, 2012 at 23:28 Comment(4)

Thanks :) But how can I replace the colors in the input image with the new ones in this case? – Attalanta 29/11, 2012 at 0:42

That still works the same way. Map it to the closest color k-means found. – Odd 29/11, 2012 at 14:57

Really? The mask of one cluster is half the size of the original image. If I resize the mask up, I think the result is not exact, because it overwrites neighbouring pixels in the original image. – Attalanta 29/11, 2012 at 16:39

Which mask? Use the k-means cluster centers (=new palette). Map each pixel to the nearest color. – Odd 29/11, 2012 at 17:21

My answer is not about histogram clustering, but I recently needed to speed up the clustering step of my own algorithm. For this I did the following:

Resized the image to a smaller one (this was already suggested above).
Reduced the number of colors (image quantization). I did it as suggested here: How to reduce the number of colors in an image with OpenCV?
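For illustration, one very simple way to reduce the number of colors is uniform quantization: snap every channel to the middle of a fixed-width bin. This is an assumption about what such a pre-quantization step could look like, not necessarily what the linked question recommends; `reduce_colours` and `step` are hypothetical names.

```python
from collections import Counter

def reduce_colours(pixels, step=64):
    """Snap each channel to the centre of its bin of width `step`."""
    return [tuple((c // step) * step + step // 2 for c in p) for p in pixels]

pixels = [(0, 0, 0), (3, 5, 9), (63, 64, 255), (60, 70, 250)]
reduced = reduce_colours(pixels, step=64)

# Number of distinct colours before and after the reduction.
distinct_before = len(Counter(pixels))
distinct_after = len(Counter(reduced))
```

Here the four input colors collapse into two. The coarser the bins, the fewer distinct colors remain, but the more color detail is lost before k-means ever sees the data.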

This really helped me: it sped up the clustering several times over. You can also try playing around with OpenCV's mean-shift filtering.

Shapiro answered 29/11, 2012 at 12:53 Comment(6)

Actually, he wants to use k-means for color quantization. Why would you first use quantization A to then run quantization B? Note that he is not clustering images, but colors. – Odd 29/11, 2012 at 14:58

@Anony-Mousse What do you mean by saying quantization A and B? – Shapiro 29/11, 2012 at 16:14

see en.wikipedia.org/wiki/Color_quantization -- k-means can be used for color quantization, which is probably what he is trying to do. It does not make sense to use a different color quantization before, does it? – Odd 29/11, 2012 at 17:22

@Anony-Mousse It will improve speed: k-means runs faster on a smaller number of colors. – Shapiro 29/11, 2012 at 19:35

K-means complexity is O(n k i) where n is the number of pixels, k the number of clusters, and i is the number of iterations until convergence. Of course: if you pre-quantize the image a lot, k-means will likely need fewer iterations, but it will also degrade quality much more. Do you have any reference that says you should pre-quantize for k-means? – Odd 30/11, 2012 at 0:3

@Anony-Mousse sorry, I don't have it. Actually my supervisor advised me to do that. – Shapiro 30/11, 2012 at 13:40

You need to assign a weight to each data point, i.e. the number of values in its histogram bin. Then, when you compute the new values of the cluster centroids, you use a weighted average instead of a plain average. However, the interface of OpenCV's k-means clustering does not support weighted values. You can use the C clustering library, which does support weights, is quite well documented (although it takes its examples from bioinformatics), and is easy to integrate (a single .h/.c file).
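To make the weighted update concrete, here is a minimal pure-Python sketch of k-means on a color histogram. It uses neither OpenCV nor the C clustering library. The function `weighted_kmeans`, the toy pixel list and the choice to start from the most frequent bins are assumptions for the example only.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two colour tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(hist, centres, max_iter=100):
    """k-means over histogram bins.

    hist maps colour -> pixel count; each bin is treated as `count`
    identical points, so the centroid is a weighted average.
    """
    centres = list(centres)
    for _ in range(max_iter):
        # Assign each bin (not each pixel) to its nearest centre.
        groups = [[] for _ in centres]
        for colour, count in hist.items():
            j = min(range(len(centres)), key=lambda i: dist2(centres[i], colour))
            groups[j].append((colour, count))
        # Weighted average per channel; empty clusters keep their centre.
        updated = []
        for members, old in zip(groups, centres):
            total = sum(w for _, w in members)
            if total == 0:
                updated.append(old)
                continue
            updated.append(tuple(
                sum(w * c[ch] for c, w in members) / total
                for ch in range(len(old))
            ))
        if updated == centres:
            break
        centres = updated
    return centres

# Toy image flattened to a pixel list: 12 pixels, only 4 distinct colours.
pixels = ([(250, 10, 10)] * 6 + [(230, 30, 10)] * 2 +
          [(10, 10, 250)] * 3 + [(30, 10, 230)] * 1)

hist = Counter(pixels)  # colour histogram: 4 bins instead of 12 points
start = [colour for colour, _ in hist.most_common(2)]
centres = weighted_kmeans(hist, start)
```

Each iteration now costs time proportional to the number of non-empty bins rather than the number of pixels, which is where the speed-up comes from when an image has many repeated colors.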

Hourly answered 29/11, 2012 at 9:11 Comment(0)
