cluster-analysis Questions
6
Solved
UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI's DBSCAN implementation to do my clustering rather than...
Derick asked 5/5, 2013 at 5:4
6
Solved
I have a data table ("norm") containing numeric (at least as far as I can see) normalized values of the following form:
When I execute
k <- kmeans(norm,center=3)
I am receiving t...
Businesswoman asked 7/4, 2016 at 7:40
4
I am running k-means clustering on a dataset with around 1 million items and around 100 attributes. I applied clustering for various k, and I want to evaluate the different groupings with the silho...
Cyrilcyrill asked 15/5, 2014 at 19:41
5
Solved
I have the following problem - made abstract to bring out the key issues.
I have 10 points, each of which is some distance from the others. I want to
be able to find the center of the cluster i.e. t...
Fini asked 10/8, 2009 at 8:52
7
Solved
So let's say I have an array like this:
[1,1,2,3,10,11,13,67,71]
Is there a convenient way to partition the array into something like this?
[[1,1,2,3],[10,11,13],[67,71]]
I looked through sim...
Maidy asked 16/7, 2012 at 22:25
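For a sorted one-dimensional array like the one in this question, one simple option is to start a new group wherever the gap between neighbouring values exceeds a threshold. The following is a minimal sketch, assuming the input is already sorted and that a gap of 5 is a sensible cut-off for this particular data. The function name `split_by_gap` and the `max_gap` parameter are illustrative and do not come from the question.

```python
def split_by_gap(values, max_gap):
    """Split a sorted list into runs whose neighbouring values differ by at most max_gap."""
    groups = []
    for v in values:
        # Extend the current run if v is close enough to its last element
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            # Otherwise start a new run
            groups.append([v])
    return groups


# max_gap=5 is an assumed threshold chosen for this example
clusters = split_by_gap([1, 1, 2, 3, 10, 11, 13, 67, 71], max_gap=5)
print(clusters)  # [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

The threshold has to be picked per dataset. If no natural gap size is known, a proper one-dimensional clustering method may be a better fit.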
2
Solved
How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster by correlations of each entry across t...
Catenoid asked 25/5, 2010 at 19:39
2
I have implemented a function to find the nearest data point to each centroid calculated after running the K-Means clustering algorithm. I wanted to know if there's a sklearn function that allows m...
Miculek asked 24/1, 2018 at 0:56
7
I am using the sklearn.cluster KMeans package. Once I finish the clustering if I need to know which values were grouped together how can I do it?
Say I had 100 data points and KMeans gave me 5 clus...
Alten asked 24/3, 2016 at 7:56
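A fitted scikit-learn `KMeans` object exposes a `labels_` attribute holding one cluster index per input row, so the grouping can be recovered by bucketing rows by label. The sketch below uses a hand-written `labels` list as a stand-in for `labels_`, so that it runs without scikit-learn. The helper name `group_by_label` and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_label(points, labels):
    """Return a dict mapping each cluster label to the points assigned to it."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[int(label)].append(point)
    return dict(groups)


# Hypothetical data: in real use, `labels` would be the fitted model's labels_ attribute
points = [1.0, 1.2, 5.0, 5.1, 9.9]
labels = [0, 0, 1, 1, 2]
print(group_by_label(points, labels))
```

With real data, `points` would be the rows passed to `fit`, in the same order.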
11
Is it possible to specify your own distance function using scikit-learn K-Means Clustering?
Business asked 3/4, 2011 at 12:39
5
I have a bunch of sentences and I want to cluster them using scikit-learn spectral clustering. I've run the code and get the results with no problem. But, every time I run it I get different result...
Islam asked 18/9, 2014 at 20:28
5
Solved
I am trying to see if the performance of both can be compared based on the objective functions they work on?
Gleeman asked 27/2, 2010 at 1:37
5
Solved
In my program, I'm using k=2 for the k-means algorithm, i.e. I want only 2 clusters.
I have implemented it in a very simple and straightforward way, but I'm still unable to understand why my program is getting ...
Eclosion asked 14/1, 2014 at 10:23
3
I'm trying to cluster some text documents using scikit-learn. I'm trying out both DBSCAN and MeanShift and want to determine which hyperparameters (e.g. bandwidth for MeanShift and eps for DBSCAN) ...
Huckster asked 2/9, 2014 at 22:27
1
I noticed that if I change all the edge weights in the graph with the same value, community.best_partition doesn't always result in the same communities.
I used the same random state in all cases a...
Fortunia asked 19/8, 2019 at 16:44
5
Solved
Is it possible to use GridSearchCV without cross validation? I am trying to optimize the number of clusters in KMeans clustering via grid search, and thus I don't need or want cross validation.
T...
Metempsychosis asked 19/6, 2017 at 17:15
3
Solved
I am trying to apply Gower distance implementation to my data frame. While it was smoothly working with the same dataset with more features, this time it gives an error when I call the Gower distan...
Redeem asked 31/5, 2018 at 13:53
6
Solved
I've found several examples on how to create these exact hierarchies (at least I believe they are) like the following here stackoverflow.com/questions/2982929/ which work great, and almost perform ...
Maomaoism asked 23/2, 2011 at 9:28
9
Solved
I am using DBSCAN to cluster some data using Scikit-Learn (Python 2.7):
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(random_state=0)
dbscan.fit(X)
However, I found that there was no built-...
Marry asked 7/1, 2015 at 15:27
3
Solved
As far as I know, there is no package available for Rand Index in Python, while for Adjusted Rand Index you have the option of using sklearn.metrics.adjusted_rand_score(labels_true, labels_pred).
...
Behm asked 31/3, 2018 at 10:28
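The unadjusted Rand index is the fraction of point pairs on which two labelings agree, meaning both put the pair in the same cluster or both put it in different clusters. A minimal pure-Python sketch follows. The function name `rand_index` is illustrative, and returning 1.0 for fewer than two points is an assumed convention.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        # Assumed convention: with fewer than two points there is nothing to disagree on
        return 1.0
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0, same partition with renamed labels
```

This enumerates all pairs, so it is quadratic in the number of points and only suitable for small inputs.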
0
I am working with the R programming language.
Suppose there are 100 people - each person is denoted with an ID from 1:100. Each person can be friends with other people. The dataset can be represent...
Sylvanus asked 29/12, 2022 at 2:59
2
I want to use hierarchical cluster analysis to get the optimal number (K) of clusters automatically, then apply this K to K-means clustering in Python.
After studying many articles, I know some me...
Overawe asked 5/6, 2018 at 8:10
3
Solved
I have example data as follows:
library(data.table)
sample <- fread("
1,0,2,NA,cat X, type 1
3,4,3,1,cat X, type 2
1,0,2,2,cat X, type 3
3,4,3,0,cat X, type 4
1,0,2,NA,cat Y, type 1
3,4,3,N...
Russellrusset asked 29/9, 2022 at 10:59
7
Solved
I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean distance between each other l...
Libertinage asked 13/4, 2012 at 6:54
2
Solved
By my understanding of DBSCAN, it's possible for you to specify an epsilon of, say, 100 meters and — because DBSCAN takes into account density-reachability and not direct density-reachability when ...
Copp asked 31/8, 2013 at 10:29
5
Solved
I'm trying to calculate the Davies-Bouldin Index in Python.
Here are the steps the code below tries to reproduce.
5 Steps:
For each cluster, compute euclidean distances between each point to the c...
Trophozoite asked 30/12, 2017 at 18:8