What routine or algorithm should I use to provide eps and minPts parameters to DBSCAN algorithm for efficient results?
The DBSCAN paper suggests to choose minPts based on the dimensionality, and eps based on the elbow in the k-distance graph.
In the more recent publication
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017).
DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN.
ACM Transactions on Database Systems (TODS), 42(3), 19.
the authors suggest to use a larger minpts for large and noisy data sets, and to adjust epsilon depending on whether you get too large clusters (decrease epsilon) or too much noise (increase epsilon). Clustering requires iterations.
That paper was an interesting read, because it shows what can go wrong if you don't look at your data. People are too obsesses with performance metrics, and forget to look at the actual data.
© 2022 - 2024 — McMap. All rights reserved.