cluster-analysis - 7

1

Inefficiency of topic modelling for text clustering

I tried doing text clustering using LDA, but it isn't giving me distinct clusters. Below is my code #Import libraries from gensim import corpora, models import pandas as pd from gensim.parsing.pre...

python cluster-analysis gensim lda

Ripping asked 20/3, 2018 at 9:17

4

Solved

How to choose initial centroids for k-means clustering

I am working on implementing k-means clustering in Python. What is a good way to choose initial centroids for a data set? For instance, I have the following data set: A,1,1 B,2,1 C,4,4 D,4,5 I nee...

python cluster-analysis data-mining k-means centroid

Berliner asked 12/3, 2016 at 0:15

2

How to estimate eps using knn distance plot in DBSCAN

I have the following code to estimate the eps for DBSCAN. If the code is fine then I have obtained the knn distance plot. The code is : ns = 4 nbrs = NearestNeighbors(n_neighbors=ns).fit(data) dis...

python image-processing cluster-analysis knn dbscan

Detrusion asked 28/12, 2017 at 15:41

1

Solved

methods logLikelihood and logPerplexity not available for Spark LDA, how to measure them? [closed]

I'm trying to get perplexity and log likelihood of a Spark LDA model (with Spark 2.1). The code below does not work (methods logLikelihood and logPerplexity not found) although I can save t...

apache-spark machine-learning pyspark cluster-analysis lda

Brasilein asked 22/1, 2018 at 14:09

7

Solved

Clustering given pairwise distances with unknown cluster number?

I have a set of objects {obj1, obj2, obj3, ..., objn}. I have calculated the pairwise distances of all possible pairs. The distances are stored in an n*n matrix M, with Mij being the distance betwee...

algorithm machine-learning cluster-analysis

Glorianna asked 20/9, 2013 at 4:59

4

Solved

How to get flat clustering corresponding to color clusters in the dendrogram created by scipy

Using the code posted here, I created a nice hierarchical clustering: Let's say the dendrogram on the left was created by doing something like Y = sch.linkage(D, method='average') # D is a ...

python cluster-analysis scipy hierarchical hierarchical-clustering

Cline asked 5/10, 2011 at 16:51

1

How to do clustering using the matrix of correlation coefficients?

I have a correlation coefficient matrix (n*n). How can I do clustering using the correlation coefficient matrix? Can I use the linkage and fcluster functions in SciPy? Linkage function needs n * m matrix...

python scipy cluster-analysis correlation linkage

Infarction asked 28/6, 2016 at 8:04

2

Solved

Confusion matrix for Clustering in scikit-learn

I have a set of data with known labels. I want to try clustering and see if I can get the same clusters given by known labels. To measure the accuracy, I need to get something like a confusion matr...

python scikit-learn cluster-analysis confusion-matrix scikits

Evensong asked 8/12, 2017 at 6:25

2

Solved

DBSCAN for clustering data by location and density

I'm using the method dbscan::dbscan in order to cluster my data by location and density. My data looks like this: str(data) 'data.frame': 4872 obs. of 3 variables: $ price : num ... $ lat : num...

r machine-learning cluster-analysis data-mining dbscan

Clariceclarie asked 25/1, 2016 at 11:54

1

Solved

Is sklearn.cluster.KMeans sensitive to data point order?

As noted in the answer to this post about feature scaling, some (all?) implementations of KMeans are sensitive to the order of features data points. Based on the sklearn.cluster.KMeans documentation...

python scikit-learn cluster-analysis k-means

Construct asked 2/12, 2017 at 5:12

1

How can I choose eps and minPts (two parameters for DBSCAN algorithm) for efficient results?

What routine or algorithm should I use to provide the eps and minPts parameters to the DBSCAN algorithm for efficient results?

python cluster-analysis dbscan

Scrawny asked 28/11, 2017 at 14:25

3

Clustering algorithm in R for missing categorical and numerical values

I want to perform marketing segmentation clustering on a dataset with missing categorical and numerical values in R. I cannot perform k-means clustering because of the missing values. R version 3...

r machine-learning cluster-analysis missing-data

Forelli asked 3/6, 2014 at 23:26

2

Solved

Extracting centroids using k-means clustering in python?

I have some data in a 1D array with shape [1000,] with 1000 elements in it. I applied k-means clustering on this data with 10 as number of clusters. After applying the k-means, I got cluster labels...

python arrays scikit-learn cluster-analysis k-means

Sheenasheeny asked 14/11, 2017 at 16:45

1

Solved

PySpark ML: Get KMeans cluster statistics

I have built a KMeansModel. My results are stored in a PySpark DataFrame called transformed. (a) How do I interpret the contents of transformed? (b) How do I create one or more Pandas DataFrame...

machine-learning pyspark cluster-analysis k-means apache-spark-ml

Karlykarlyn asked 6/11, 2017 at 5:30

0

How to visualize Gaussian mixture model clusters for multi-dimensional data in Scikit?

I have seen the Scikit-Learn example of Gaussian mixture for clustering. In this example (and other examples of this model), it looks like the data always has two dimensions: plt.scatter(X[:, 0], X[:, ...

python scikit-learn cluster-analysis

Stepfather asked 3/11, 2017 at 16:13

5

Solved

Order of rows in heatmap?

Take the following code: heatmap(data.matrix(signals),col=colors,breaks=breaks,scale="none",Colv=NA,labRow=NA) How can I extract, pre-calculate or re-calculate the order of the rows in the heat...

r cluster-analysis heatmap

Peary asked 16/3, 2011 at 3:42

1

Solved

Python, Scikit-learn, K-means: What does the parameter n_init actually do? [duplicate]

I'm a beginner in Python. I'm trying to understand what the parameter n_init from sklearn.cluster.KMeans does. From the documentation: n_init : int, default: 10 Number of time the k-means al...

python machine-learning scikit-learn cluster-analysis k-means

Lubricous asked 22/9, 2017 at 7:47

2

Solved

Get the cluster size in sklearn in python

I am using sklearn DBSCAN to cluster my data as follows. #Apply DBSCAN (sims == my data as list of lists) db1 = DBSCAN(min_samples=1, metric='precomputed').fit(sims) db1_labels = db1.labels_ db1n...

python machine-learning scikit-learn cluster-analysis dbscan

Overabound asked 11/9, 2017 at 12:17

1

" The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated " when using nbclust

I want to determine the best k for clustering using the NbClust package. My data have both continuous and categorical variables, so I use the dissimilarity matrix which has been calculated using daisy() ...

r cluster-analysis

Abdul asked 6/9, 2017 at 5:42

3

Solved

How Could One Implement the K-Means++ Algorithm?

I am having trouble fully understanding the K-Means++ algorithm. I am interested in exactly how the first k centroids are picked, namely the initialization, as the rest is like in the original K-Means ...

algorithm language-agnostic machine-learning cluster-analysis k-means

Bend asked 28/3, 2011 at 23:45

3

Some questions on dendrogram - python (Scipy)

I am new to scipy but I managed to get the expected dendrogram. I have some more questions: In the dendrogram, the distance between some points is 0, but it's not visible due to the image border. How can I...

python scipy cluster-analysis dendrogram

Limited asked 14/3, 2012 at 19:15

1

Solved

What are noisy samples in Scikit's DBSCAN clustering algorithm?

If I apply Scikit's DBSCAN (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html) on a similarity matrix, I get a series of labels back. Some of these labels are -1. The doc...

python scikit-learn cluster-analysis dbscan

Dominick asked 25/7, 2017 at 20:44

2

Solved

How can you compare two cluster groupings in terms of similarity or overlap in Python?

Simplified example of what I'm trying to do: Let's say I have 3 data points A, B, and C. I run KMeans clustering on this data and get 2 clusters [(A,B),(C)]. Then I run MeanShift clustering on thi...

python machine-learning cluster-analysis k-means similarity

Ctn asked 13/7, 2017 at 14:24

2

Solved

What is the relation between topic modeling and document clustering?

Topic modeling identifies the distribution of topics in a document collection, which effectively identifies the clusters in the collection. So is it right to say that topic modeling is a technique to d...

cluster-analysis topic-modeling unsupervised-learning

Mussman asked 19/3, 2013 at 2:48

1

Solved

How to use ggplot to plot T-SNE clustering

Here is the t-SNE code using IRIS data: library(Rtsne) iris_unique <- unique(iris) # Remove duplicates iris_matrix <- as.matrix(iris_unique[,1:4]) set.seed(42) # Set a seed if you want repro...

r ggplot2 cluster-analysis tidyverse

Breckenridge asked 30/6, 2017 at 2:05
