cluster-analysis Questions

1

I tried doing text clustering using LDA, but it isn't giving me distinct clusters. Below is my code #Import libraries from gensim import corpora, models import pandas as pd from gensim.parsing.pre...
Ripping asked 20/3, 2018 at 9:17

4

Solved

I am working on implementing k-means clustering in Python. What is a good way to choose initial centroids for a data set? For instance, I have the following data set: A,1,1 B,2,1 C,4,4 D,4,5 I nee...
Berliner asked 12/3, 2016 at 0:15

2

I have the following code to estimate the eps for DBSCAN. If the code is fine, then I have obtained the knn distance plot. The code is: ns = 4 nbrs = NearestNeighbors(n_neighbors=ns).fit(data) dis...
Detrusion asked 28/12, 2017 at 15:41

1

Solved

I'm trying to get perplexity and log likelihood of a Spark LDA model (with Spark 2.1). The code below does not work (methods logLikelihood and logPerplexity not found) although I can save t...
Brasilein asked 22/1, 2018 at 14:9

7

Solved

I have a set of objects {obj1, obj2, obj3, ..., objn}. I have calculated the pairwise distances of all possible pairs. The distances are stored in an n*n matrix M, with Mij being the distance betwee...
Glorianna asked 20/9, 2013 at 4:59

4

Solved

Using the code posted here, I created a nice hierarchical clustering: Let's say the dendrogram on the left was created by doing something like Y = sch.linkage(D, method='average') # D is a ...

1

I have a correlation coefficient matrix (n*n). How do I cluster using the correlation coefficient matrix? Can I use the linkage and fcluster functions in SciPy? The linkage function needs an n * m matrix...
Infarction asked 28/6, 2016 at 8:4

2

Solved

I have a set of data with known labels. I want to try clustering and see if I can get the same clusters given by known labels. To measure the accuracy, I need to get something like a confusion matr...

2

Solved

I'm using the method dbscan::dbscan in order to cluster my data by location and density. My data looks like this: str(data) 'data.frame': 4872 obs. of 3 variables: $ price : num ... $ lat : num...
Clariceclarie asked 25/1, 2016 at 11:54

1

Solved

As noted in the answer to this post about feature scaling, some (all?) implementations of KMeans are sensitive to the order of features data points. Based on the sklearn.cluster.KMeans documentation...
Construct asked 2/12, 2017 at 5:12

1

What routine or algorithm should I use to choose the eps and minPts parameters for the DBSCAN algorithm for efficient results?
Scrawny asked 28/11, 2017 at 14:25

3

I want to perform marketing segmentation clustering on a dataset with missing categorical and numerical values in R. I cannot perform k-means clustering because of the missing values. R version 3...
Forelli asked 3/6, 2014 at 23:26

2

Solved

I have some data in a 1D array with shape [1000,] with 1000 elements in it. I applied k-means clustering on this data with 10 as the number of clusters. After applying the k-means, I got cluster labels...
Sheenasheeny asked 14/11, 2017 at 16:45

1

Solved

I have built a KMeansModel. My results are stored in a PySpark DataFrame called transformed. (a) How do I interpret the contents of transformed? (b) How do I create one or more Pandas DataFrame...

0

I have seen the Scikit-Learn example of Gaussian mixture for clustering. In this example (and other examples of this model), it looks like the data always has two dimensions: plt.scatter(X[:, 0], X[:, ...
Stepfather asked 3/11, 2017 at 16:13

5

Solved

Take the following code: heatmap(data.matrix(signals),col=colors,breaks=breaks,scale="none",Colv=NA,labRow=NA) How can I extract, pre-calculate or re-calculate the order of the rows in the heat...
Peary asked 16/3, 2011 at 3:42

1

Solved

I'm a beginner in Python. I'm trying to understand the parameter n_init of sklearn.cluster.KMeans. From the documentation: n_init : int, default: 10 Number of time the k-means al...

2

Solved

I am using sklearn DBSCAN to cluster my data as follows. #Apply DBSCAN (sims == my data as list of lists) db1 = DBSCAN(min_samples=1, metric='precomputed').fit(sims) db1_labels = db1.labels_ db1n...
Overabound asked 11/9, 2017 at 12:17

1

I want to determine the best k for clustering using the NbClust package. My data have both continuous and categorical variables, so I use the dissimilarity matrix which has been calculated using daisy() ...
Abdul asked 6/9, 2017 at 5:42

3

Solved

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked, namely the initialization, as the rest is like in the original K-Means ...

3

I am new to scipy but I managed to get the expected dendrogram. I have some more questions: in the dendrogram, the distance between some points is 0, but it's not visible due to the image border. How can I...
Limited asked 14/3, 2012 at 19:15

1

Solved

If I apply Scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) on a similarity matrix, I get a series of labels back. Some of these labels are -1. The doc...
Dominick asked 25/7, 2017 at 20:44

2

Solved

Simplified example of what I'm trying to do: Let's say I have 3 data points A, B, and C. I run KMeans clustering on this data and get 2 clusters [(A,B),(C)]. Then I run MeanShift clustering on thi...

2

Solved

Topic modeling identifies the distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique to d...
Mussman asked 19/3, 2013 at 2:48

1

Solved

Here is the t-SNE code using IRIS data: library(Rtsne) iris_unique <- unique(iris) # Remove duplicates iris_matrix <- as.matrix(iris_unique[,1:4]) set.seed(42) # Set a seed if you want repro...
Breckenridge asked 30/6, 2017 at 2:5
