Which algorithm and what combination of hyper-parameters will be the best to cluster this data?

I was learning about non-linear clustering algorithms and came across this 2-D graph. Which clustering algorithm, and which combination of hyper-parameters, would cluster this data well?

Plot

A human would pick out those 5 spikes as clusters, and I want my algorithm to do the same. I tried KMeans, but it only split the data horizontally or vertically. I then tried GMM, but couldn't get the hyper-parameters right for the desired clustering.

Bloomy answered 31/5, 2019 at 12:39 Comment(1)
Lateral question: do you see five clusters or six? Or approximately five lines? – Veriee

If clustering doesn't work, always try to improve the preprocessing first. Algorithms such as k-means are very sensitive to scaling, so the scaling needs to be chosen carefully.
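As a rough illustration of the scaling point, here is a minimal sketch that standardizes the features before running k-means. The data is a hypothetical stand-in (two features on very different scales), and `n_clusters=5` is just an assumption taken from the five spikes in the question.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data, NOT the data from the question:
# the second feature has a much larger scale than the first.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 300), rng.normal(0, 1000, 300)])

# Standardize each feature to zero mean and unit variance, then cluster.
# Without the scaler, the large-scale feature would dominate the distances.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=5, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

# The scaled features that k-means actually sees.
X_scaled = pipeline.named_steps["standardscaler"].transform(X)
```

Whether standardization is the right choice depends on the data; if both axes already share the same units, rescaling can also distort the shapes you want to find.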

GMM is clearly your first choice here. It may be worth trying out different tools: R's Mclust is very slow, and sklearn's GMM is sometimes unstable. ELKI is a bit harder to get started with, but its EM usually gave me the best results.
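For the sklearn route, a minimal sketch could look like the following. The spike data is a synthetic stand-in I made up, not the data from the plot, and the hyper-parameters (`n_components=5`, `covariance_type="full"`, `n_init=10`) are assumptions rather than tuned values. Full covariances let each component stretch along a spike, and several restarts are one way to reduce the instability mentioned above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def make_spikes(n_spikes=5, n_per_spike=200, seed=0):
    """Hypothetical stand-in data: thin line-like clusters fanning out from the origin."""
    rng = np.random.default_rng(seed)
    angles = np.linspace(0.2, np.pi - 0.2, n_spikes)
    points, truth = [], []
    for k, a in enumerate(angles):
        r = np.linspace(2.0, 10.0, n_per_spike)     # position along the spike
        noise = rng.normal(0.0, 0.05, n_per_spike)  # thickness of the spike
        x = r * np.cos(a) - noise * np.sin(a)
        y = r * np.sin(a) + noise * np.cos(a)
        points.append(np.column_stack([x, y]))
        truth.append(np.full(n_per_spike, k))
    return np.vstack(points), np.concatenate(truth)


X, truth = make_spikes()

gmm = GaussianMixture(
    n_components=5,          # assumed: one component per visible spike
    covariance_type="full",  # each component gets its own elongated ellipse
    n_init=10,               # several restarts, keep the best likelihood
    random_state=0,
)
labels = gmm.fit_predict(X)
```

If the number of spikes is not known in advance, comparing `gmm.bic(X)` over a range of `n_components` values is a common way to pick it.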

Apart from GMM, it is likely worth trying out correlation clustering. These algorithms assume there is some manifold (e.g., a line) on which each cluster lies. Examples include ORCLUS, LMCLUS, CASH, 4C, ... But in my opinion these mostly work for synthetic toy data.

Odetteodeum answered 31/5, 2019 at 20:6 Comment(0)

I would suggest trying hierarchical clustering. In the agglomerative approach, each point starts out as its own cluster, and clusters are then merged step by step based on their distance from each other.
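A minimal sketch of the agglomerative approach with sklearn, on made-up spike data rather than the data from the plot. The choice of `linkage="single"` is my assumption: it merges clusters by their closest pair of points, which tends to follow elongated shapes, but it is also sensitive to noise points that bridge two clusters.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def make_spikes(n_spikes=5, n_per_spike=200, seed=0):
    """Hypothetical stand-in data: thin line-like clusters fanning out from the origin."""
    rng = np.random.default_rng(seed)
    angles = np.linspace(0.2, np.pi - 0.2, n_spikes)
    points, truth = [], []
    for k, a in enumerate(angles):
        r = np.linspace(2.0, 10.0, n_per_spike)     # position along the spike
        noise = rng.normal(0.0, 0.05, n_per_spike)  # thickness of the spike
        x = r * np.cos(a) - noise * np.sin(a)
        y = r * np.sin(a) + noise * np.cos(a)
        points.append(np.column_stack([x, y]))
        truth.append(np.full(n_per_spike, k))
    return np.vstack(points), np.concatenate(truth)


X, truth = make_spikes()

# Start with one cluster per point and keep merging the two closest
# clusters (closest pair of points, i.e. single linkage) until 5 remain.
agg = AgglomerativeClustering(n_clusters=5, linkage="single")
labels = agg.fit_predict(X)
```

Other linkages (`"ward"`, `"complete"`, `"average"`) favor more compact clusters, so they may cut long spikes into pieces.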

Gantlet answered 31/5, 2019 at 20:6 Comment(0)

DBSCAN or GMM should work well to cluster this type of data.

Neither of them is limited to circular clusters, unlike most common clustering algorithms.

Clustering with DBSCAN

DBSCAN
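A minimal DBSCAN sketch with sklearn, again on made-up spike data and not the data behind the image. The values `eps=0.5` and `min_samples=5` are assumptions that suit this synthetic set; on real data they would need tuning, and the spikes must not touch each other for this to separate them.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def make_spikes(n_spikes=5, n_per_spike=200, seed=0):
    """Hypothetical stand-in data: thin line-like clusters fanning out from the origin."""
    rng = np.random.default_rng(seed)
    angles = np.linspace(0.2, np.pi - 0.2, n_spikes)
    points, truth = [], []
    for k, a in enumerate(angles):
        r = np.linspace(2.0, 10.0, n_per_spike)     # position along the spike
        noise = rng.normal(0.0, 0.05, n_per_spike)  # thickness of the spike
        x = r * np.cos(a) - noise * np.sin(a)
        y = r * np.sin(a) + noise * np.cos(a)
        points.append(np.column_stack([x, y]))
        truth.append(np.full(n_per_spike, k))
    return np.vstack(points), np.concatenate(truth)


X, truth = make_spikes()

# eps: neighborhood radius; min_samples: neighbors needed for a core point.
# Both values are guesses for this synthetic data.
db = DBSCAN(eps=0.5, min_samples=5)
labels = db.fit_predict(X)

# DBSCAN marks noise points with the label -1.
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels.tolist())) - (1 if n_noise > 0 else 0)
```

Note that DBSCAN does not take the number of clusters as input; it follows from `eps` and `min_samples`.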

Clustering with GMM

GMM

Also, please give this blog a read. It explains the different clustering techniques.

Rotation answered 31/5, 2019 at 20:10 Comment(0)
