cluster-analysis Questions

6

Solved

UPDATED: In the end, the solution I opted to use for clustering my large dataset was one suggested by Anony-Mousse below. That is, using ELKI's DBSCAN implementation to do my clustering rather than...

6

Solved

I have a data table ("norm") containing numeric (at least as far as I can see) normalized values of the following form: When I execute k <- kmeans(norm,center=3) I am receiving t...
Businesswoman asked 7/4, 2016 at 7:40

4

I am running k-means clustering on a dataset with around 1 million items and around 100 attributes. I applied clustering for various k, and I want to evaluate the different groupings with the silho...
Cyrilcyrill asked 15/5, 2014 at 19:41

5

Solved

I have the following problem, made abstract to bring out the key issues. I have 10 points, each of which is some distance from the others. I want to be able to find the center of the cluster i.e. t...
Fini asked 10/8, 2009 at 8:52
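
The excerpt is cut off, so what "center" means here is an assumption. When only pairwise distances are known, one common reading is the medoid: the point whose total distance to all other points is smallest. A minimal sketch under that assumption (the function name and the sample matrix are hypothetical):

```python
def medoid_index(dist):
    """Return the index of the point with the smallest summed distance to all others.

    `dist` is a square, symmetric matrix of pairwise distances (list of lists).
    """
    totals = [sum(row) for row in dist]
    return min(range(len(dist)), key=totals.__getitem__)


# Hypothetical 3-point example: point 1 is closest to the others overall.
distances = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
print(medoid_index(distances))
```

The medoid is always one of the original points, which helps when coordinates are not available and a mean cannot be computed.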

7

Solved

So let's say I have an array like this: [1,1,2,3,10,11,13,67,71] Is there a convenient way to partition the array into something like this? [[1,1,2,3],[10,11,13],[67,71]] I looked through sim...
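
One simple way to get this kind of partition for one-dimensional data is to sort the values and start a new group wherever the gap to the previous value exceeds a threshold. This is only a sketch: `split_on_gaps` and the `max_gap` value of 5 are illustrative choices, not something taken from the question.

```python
def split_on_gaps(values, max_gap):
    """Group 1-D values; a new group starts when the gap to the previous value exceeds max_gap."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > max_gap:
            groups.append([cur])      # gap too large: open a new group
        else:
            groups[-1].append(cur)    # close enough: extend the current group
    return groups


print(split_on_gaps([1, 1, 2, 3, 10, 11, 13, 67, 71], max_gap=5))
# [[1, 1, 2, 3], [10, 11, 13], [67, 71]]
```

The result depends entirely on the threshold, so it has to be chosen with the scale of the data in mind.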

2

Solved

How can I run hierarchical clustering on a correlation matrix in scipy/numpy? I have a matrix of 100 rows by 9 columns, and I'd like to hierarchically cluster by correlations of each entry across t...
Catenoid asked 25/5, 2010 at 19:39

2

I have implemented a function to find the nearest data point to each centroid calculated after running the K-Means clustering algorithm. I wanted to know if there's a sklearn function that allows m...
Miculek asked 24/1, 2018 at 0:56

7

I am using the sklearn.cluster KMeans package. Once I finish the clustering if I need to know which values were grouped together how can I do it? Say I had 100 data points and KMeans gave me 5 clus...
Alten asked 24/3, 2016 at 7:56
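
In scikit-learn, a fitted `KMeans` object exposes one cluster index per input row in its `labels_` attribute, so grouping the original values comes down to bucketing rows by label. The sketch below uses a hand-written `labels` list in place of real `labels_` output so that it runs without scikit-learn; the helper name and the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[int(label)].append(point)
    return dict(groups)


points = [[1.0], [1.2], [8.0], [8.3], [4.9]]
labels = [0, 0, 1, 1, 2]  # stand-in for the labels_ array of a fitted KMeans model

for label, members in group_by_label(points, labels).items():
    print(label, members)
```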

11

Is it possible to specify your own distance function using scikit-learn K-Means Clustering?

5

I have a bunch of sentences and I want to cluster them using scikit-learn spectral clustering. I've run the code and get the results with no problem. But, every time I run it I get different result...

5

Solved

I am trying to see if the performance of both can be compared based on the objective functions they work on?
Gleeman asked 27/2, 2010 at 1:37

5

Solved

In my program, I'm taking k=2 for the k-means algorithm, i.e. I want only 2 clusters. I have implemented it in a very simple and straightforward way, but I'm still unable to understand why my program is getting ...
Eclosion asked 14/1, 2014 at 10:23

3

I'm trying to cluster some text documents using scikit-learn. I'm trying out both DBSCAN and MeanShift and want to determine which hyperparameters (e.g. bandwidth for MeanShift and eps for DBSCAN) ...
Huckster asked 2/9, 2014 at 22:27

1

I noticed that if I change all the edge weights in the graph with the same value, community.best_partition doesn't always result in the same communities. I used the same random state in all cases a...
Fortunia asked 19/8, 2019 at 16:44

5

Solved

Is it possible to use GridSearchCV without cross validation? I am trying to optimize the number of clusters in KMeans clustering via grid search, and thus I don't need or want cross validation. T...
Metempsychosis asked 19/6, 2017 at 17:15

3

Solved

I am trying to apply Gower distance implementation to my data frame. While it was smoothly working with the same dataset with more features, this time it gives an error when I call the Gower distan...
Redeem asked 31/5, 2018 at 13:53

6

Solved

I've found several examples on how to create these exact hierarchies (at least I believe they are) like the following here stackoverflow.com/questions/2982929/ which work great, and almost perform ...
Maomaoism asked 23/2, 2011 at 9:28

9

Solved

I am using DBSCAN to cluster some data using Scikit-Learn (Python 2.7): from sklearn.cluster import DBSCAN dbscan = DBSCAN(random_state=0) dbscan.fit(X) However, I found that there was no built-...

3

Solved

As far as I know, there is no package available for Rand Index in python while for Adjusted Rand Index you have the option of using sklearn.metrics.adjusted_rand_score(labels_true, labels_pred). ...
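
For reference, the unadjusted Rand Index is the fraction of point pairs on which two labelings agree: both place the pair in the same cluster, or both place it in different clusters. A minimal pure-Python sketch by direct pair counting (quadratic in the number of points, so only suitable for small inputs; the function name is hypothetical):

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred
        total += 1
    if total == 0:
        raise ValueError("need at least two points")
    return agree / total


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, labels renamed
```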

0

I am working with the R programming language. Suppose there are 100 people - each person is denoted with an ID from 1:100. Each person can be friends with other people. The dataset can be represent...
Sylvanus asked 29/12, 2022 at 2:59

2

I want to use hierarchical cluster analysis to get the optimal number (K) of clusters automatically, then apply this K to K-means clustering in Python. After studying many articles, I know some me...
Overawe asked 5/6, 2018 at 8:10

3

Solved

I have example data as follows: library(data.table) sample <- fread(" 1,0,2,NA,cat X, type 1 3,4,3,1,cat X, type 2 1,0,2,2,cat X, type 3 3,4,3,0,cat X, type 4 1,0,2,NA,cat Y, type 1 3,4,3,N...
Russellrusset asked 29/9, 2022 at 10:59

7

Solved

I have a large set of vectors in 3 dimensions. I need to cluster these based on Euclidean distance such that all the vectors in any particular cluster have a Euclidean distance between each other l...

2

Solved

By my understanding of DBSCAN, it's possible for you to specify an epsilon of, say, 100 meters and — because DBSCAN takes into account density-reachability and not direct density-reachability when ...
Copp asked 31/8, 2013 at 10:29

5

Solved

I'm trying to calculate the Davies-Bouldin Index in Python. Here are the steps the code below tries to reproduce. 5 Steps: For each cluster, compute euclidean distances between each point to the c...
Trophozoite asked 30/12, 2017 at 18:8
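
The steps in the excerpt are truncated, so the following is not the asker's code. It is a sketch of the standard Davies-Bouldin definition with Euclidean distances: per-cluster scatter is the mean distance of points to their centroid, and the index is the average, over clusters, of the worst ratio (s_i + s_j) / d(c_i, c_j). It assumes at least two non-empty clusters with distinct centroids and Python 3.8+ for `math.dist`.

```python
import math


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters, each a list of coordinate tuples."""
    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = [
        tuple(sum(coord) / len(pts) for coord in zip(*pts)) for pts in clusters
    ]
    # Scatter of each cluster: mean distance from its points to its centroid.
    scatter = [
        sum(math.dist(p, cen) for p in pts) / len(pts)
        for pts, cen in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


example = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(davies_bouldin(example))
```

Lower values indicate more compact, better separated clusters.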
