How can I choose eps and minPts (two parameters for DBSCAN algorithm) for efficient results?

What routine or algorithm should I use to choose the eps and minPts parameters for the DBSCAN algorithm so that it gives efficient results?

Scrawny answered 28/11, 2017 at 14:25 Comment(1)
Define "efficient". - Foetor

The DBSCAN paper suggests choosing minPts based on the dimensionality of the data, and eps based on the elbow in the k-distance graph.
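To make the k-distance graph concrete, here is a minimal sketch in Python using only NumPy. The function name `k_distances` is my own, and taking k as the number of neighbors excluding the point itself is an assumption (implementations differ on whether minPts counts the query point). It computes each point's distance to its k-th nearest neighbor and sorts those distances in descending order; you would plot the result and look for the elbow by eye.

```python
import numpy as np

def k_distances(X, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending.

    Hypothetical helper for illustration; k excludes the point itself.
    """
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distance matrix: O(n^2) memory, so small data only.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest other point.
    kth = np.sort(dists, axis=1)[:, k]
    return np.sort(kth)[::-1]

# Example: three points on a line.
curve = k_distances([[0, 0], [1, 0], [3, 0]], k=1)
```

Plotting `curve` against its index gives the sorted k-distance graph; the distance value at the elbow is a candidate for eps. For larger data sets you would want a spatial index rather than the full distance matrix.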

In the more recent publication

Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017).
DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN.
ACM Transactions on Database Systems (TODS), 42(3), 19.

the authors suggest using a larger minPts for large and noisy data sets, and adjusting epsilon depending on whether you get clusters that are too large (decrease epsilon) or too much noise (increase epsilon). Clustering is an iterative process.
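The adjustment rule can be sketched as a small check on the cluster labels from one DBSCAN run. This is only an illustration: the function name and both thresholds (`max_noise_frac`, `max_cluster_frac`) are assumptions of mine, not values from the paper, and the label `-1` for noise follows the scikit-learn convention.

```python
import numpy as np

def suggest_eps_adjustment(labels, max_noise_frac=0.3, max_cluster_frac=0.5):
    """Suggest which way to move eps, given labels from one DBSCAN run.

    Hypothetical helper; thresholds are illustrative and data-dependent.
    Noise points are expected to carry the label -1.
    """
    labels = np.asarray(labels)
    n = labels.size
    noise_frac = np.mean(labels == -1)
    clustered = labels[labels != -1]
    # Share of all points that fall into the single largest cluster.
    largest_frac = np.bincount(clustered).max() / n if clustered.size else 0.0
    if largest_frac > max_cluster_frac:
        return "decrease eps"  # one cluster swallows too much of the data
    if noise_frac > max_noise_frac:
        return "increase eps"  # too many points are left as noise
    return "keep eps"

hint = suggest_eps_adjustment([0, 0, 0, 0, 1, -1])
```

A check like this only tells you a direction. You still need to look at the resulting clusters after each run to judge whether they make sense for your data.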

That paper was an interesting read, because it shows what can go wrong if you don't look at your data. People are too obsessed with performance metrics, and forget to look at the actual data.

Isadoraisadore answered 28/11, 2017 at 16:48 Comment(2)
How would you automate this? With OPTICS? - Cf
@Cf Data science is not about automation :) - Riarial
