cluster-analysis Questions

2

Solved

I have two N x N co-occurrence matrices (484x484 and 1060x1060) that I have to analyze. The matrices are symmetrical along the diagonal and contain lots of zero values. The non-zero values are inte...
Jakob asked 4/6, 2017 at 18:51

3

Solved

I am using R software (R commander) to cluster my data. I have a smaller subset of my data containing 200 rows and about 800 columns. I am getting the following error when trying kmeans cluster and...
Colorist asked 16/4, 2011 at 13:54

1

Solved

I'm running some cluster analysis and I'm trying to figure out two main things: 1) How to best interpret the results of the p-values in pvclust (what is the null that they are establishing?) 2) H...
Enterovirus asked 23/4, 2017 at 20:34

1

I've had encouraging results clustering a set of entity names using scikit-learn's affinity propagation implementation, with a modified Jaro-Winkler distance as the similarity metric, but my cluste...
Denumerable asked 24/4, 2017 at 14:8

2

I am trying to work on android mobile app where I have a functionality to find matches according to interest and location. Many dating apps are already doing some kinda functionality for example Ti...
Tetragon asked 23/4, 2017 at 16:23

8

Solved

Expectation Maximization (EM) is a kind of probabilistic method to classify data. Please correct me if I am wrong if it is not a classifier. What is an intuitive explanation of this EM tech...

1

As per as the sklearn kmeans documentation, it says that k-means requires a matrix of shape=(n_samples, n_features). But I provided a distance matrix of shape=(n_samples,n_samples) where each index...
Decembrist asked 20/4, 2017 at 7:22

3

Solved

We're currently trying to detect the object regions in medical instruments images using the methods available in OpenCV, C++ version. An example image is shown below: Here are the steps we're fol...

1

Solved

A dendrogram is a data structure used with hierarchical clustering algorithms that groups clusters at different "heights" of a tree - where the heights correspond to distance measures between clust...

2

Solved

I want to cluster data with missing columns. Doing it manually I would calculate the distance in case of a missing column simply without this column. With scikit-learn, missing data is not possibl...
Vagary asked 24/2, 2016 at 19:39

8

Solved

I am looking for Python implementation of k-means algorithm with examples to cluster and cache my database of coordinates.
Fetal asked 9/10, 2009 at 19:16

1

Solved

I'd like to find a threshold value for a bimodal distribution. For example, a bimodal distribution could look like the following: import numpy as np import matplotlib.pyplot as plt np.random.seed(...
Gittern asked 10/2, 2017 at 1:22

3

I am a newbie in text mining, here is my situation. Suppose i have a list of words ['car', 'dog', 'puppy', 'vehicle'], i would like to cluster words into k groups, I want the output to be [['car', ...
Poteen asked 31/1, 2017 at 11:25

3

Solved

I'm reading the following article for my master thesis: http://graphics.cs.cmu.edu/projects/discriminativePatches/discriminativePatches.pdf In section 2.1 it said: "we turn the classification step...

0

I know how to group rows (genes) by annotation category using pheatmap, and I know how to perform Person's correlation clustering on the whole set of rows (genes), but what I would like to accompli...
Magdalenmagdalena asked 14/1, 2017 at 8:56

1

Solved

I have three matrices to compare. Each of them is 5x6. I originally wanted to use hierarchical clustering to cluster the matrices, such that the most similar matrices are grouped, given a threshold...
Microdont asked 1/1, 2017 at 15:19

1

I know how the algorithm works, but I'm not sure how it determines the clusters. Based on images I guess that it sees all the neurons that are connected by edges as one cluster. So that you might h...

3

Solved

I am doing some clustering using K-means in MATLAB. As you might know the usage is as below: [IDX,C] = kmeans(X,k) where IDX gives the cluster number for each data point in X, and C gives the ce...
Ashe asked 9/12, 2010 at 15:43

2

Solved

I'd like to understand the parameter max_iter from the class sklearn.cluster.KMeans. According to the documentation: max_iter : int, default: 300 Maximum number of iterations of the k-means algor...
Masera asked 1/12, 2016 at 10:10

1

Assume, I have hourly data corresponding to 5 categories for consective 10 days, created as: library(xts) set.seed(123) timestamp <- seq(as.POSIXct("2016-10-01"),as.POSIXct("2016-10-10 23:59:59...
Lalonde asked 19/11, 2016 at 5:7

2

I have been trying to cluster multiple datasets of URLs (around 1 million each), to find the original and the typos of each URL. I decided to use the levenshtein distance as a similarity metric, al...

2

Solved

I was trying to use DBSCAN algorithm from scikit-learn library with cosine metric but was stuck with the error. The line of code is db = DBSCAN(eps=1, min_samples=2, metric='cosine').fit(X) wh...

8

I have need to do some cluster analysis on a set of 2 dimensional data (I may add extra dimensions along the way). The analysis itself will form part of the data being fed into a visualisation, r...

3

Solved

I am trying to run the package NbClust on my data (100 rows x 130 columns) to determine the number of clusters I should choose, but I keep getting this error if I try to apply it to the full data s...
Amalee asked 18/12, 2013 at 22:16

1

I would like to add to a clusplot plot the variables used for pca as arrows. I am not sure that a way has been implemented (I can't find anything in the documentation). I have produced a clusplot ...
Broeker asked 21/4, 2015 at 11:11

© 2022 - 2024 — McMap. All rights reserved.