"The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated" when using NbClust

I want to determine the best k for clustering using the NbClust package. My data have both continuous and categorical variables, so I use a dissimilarity matrix computed with daisy() from the cluster package. I used the code below:

    res.nb <- NbClust(gower_dist_gender, min.nc = 1, max.nc = 5,
                      method = "complete", index = "all")

And I came across this error:

    The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.

What is the problem, and how should I fix it? Note that when I set index to "silhouette", no error occurred and the best k returned was 2. But I want to use index = "all" so that the choice of k reflects most of the indexes: with index = "all", 26 indexes are computed and the result reports their majority vote on the number of clusters. So why does the call above, with index = "all", produce the error shown?
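For readers who want to reproduce the setup, a Gower dissimilarity like the one described above can be built with daisy() roughly as follows. This is a minimal sketch, not the asker's code: the data frame toy and its columns (age, income, gender) are hypothetical stand-ins, and only the object name gower_dist_gender comes from the question.

```r
library(cluster)  # provides daisy()

# Hypothetical mixed-type data: two numeric columns and one factor.
set.seed(42)
toy <- data.frame(
  age    = c(rnorm(15, mean = 30, sd = 3), rnorm(15, mean = 55, sd = 3)),
  income = c(rnorm(15, mean = 35, sd = 5), rnorm(15, mean = 80, sd = 5)),
  gender = factor(sample(c("F", "M"), 30, replace = TRUE))
)

# Gower's coefficient combines range-scaled differences on numeric columns
# with simple matching on factors, so every dissimilarity lies in [0, 1].
gower_dist_gender <- daisy(toy, metric = "gower")

# A "dissimilarity" object that also inherits from "dist".
class(gower_dist_gender)
```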

Any little help would be greatly appreciated.

Abdul asked 6/9, 2017 at 5:42

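Your call to NbClust is wrong. The first argument of NbClust is data, so passing gower_dist_gender positionally makes NbClust treat your n-by-n dissimilarity matrix as an ordinary data matrix (and, with the default distance = "euclidean", compute Euclidean distances between its rows).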

See the documentation on how to use a distance matrix instead of a data matrix:

data

matrix or dataset.

diss

dissimilarity matrix to be used. By default, diss=NULL, but if it is replaced by a dissimilarity matrix, distance should be "NULL".

distance

the distance measure to be used to compute the dissimilarity matrix. This must be one of: "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski" or "NULL". By default, distance="euclidean". If the distance is "NULL", the dissimilarity matrix (diss) should be given by the user. If distance is not "NULL", the dissimilarity matrix should be "NULL".

Use data=NULL, distance=NULL and set diss instead.
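A minimal sketch of the call this answer describes, assuming the cluster and NbClust packages are installed. The data frame toy is hypothetical and only stands in for the real data. Because the comments below show that NbClust refuses index = "all" without a data matrix, the sketch requests a single dissimilarity-based index, "silhouette", and uses min.nc = 2, since the silhouette width is not defined for a single cluster.

```r
library(cluster)  # daisy()
library(NbClust)  # NbClust()

# Hypothetical mixed-type data standing in for the real data set.
set.seed(42)
toy <- data.frame(
  age    = c(rnorm(15, mean = 30, sd = 3), rnorm(15, mean = 55, sd = 3)),
  income = c(rnorm(15, mean = 35, sd = 5), rnorm(15, mean = 80, sd = 5)),
  gender = factor(sample(c("F", "M"), 30, replace = TRUE))
)
gower_dist_gender <- daisy(toy, metric = "gower")

# The first formal arguments are data, diss, distance: an unnamed
# first argument is matched to data, not to diss.
names(formals(NbClust))[1:3]

# Supply the dissimilarity through diss, leave data NULL and set
# distance = NULL, as the documentation quoted above requires.
res_sil <- NbClust(data = NULL, diss = gower_dist_gender, distance = NULL,
                   min.nc = 2, max.nc = 5, method = "complete",
                   index = "silhouette")

res_sil$Best.nc  # suggested number of clusters and the index value there
```

The explicit data = NULL is redundant with the default but makes the intent of the call obvious to a reader.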

Rounds answered 6/9, 2017 at 11:07
I changed the code to: res.nb <- NbClust(diss = gower_dist_gender, distance = NULL, min.nc = 1, max.nc = 5, method = "complete", index = "all") and came across this error: "Error in NbClust(diss = gower_dist_gender, distance = NULL, min.nc = 1, : Data matrix is needed. Only frey, mcclain, cindex, sihouette and dunn can be computed." Did you mean this change or something else? – Abdul
Well, of course you can't compute all indexes now, only those defined on diss. – Rounds
Thanks for your response, but the diss argument takes the dissimilarity matrix, not the index. The index argument can be set to "silhouette", "all", "ccc", "gap", etc. – Abdul
I'm referring to which validity indexes are mathematically defined in terms of the dissimilarity matrix, because, for obvious reasons, those are the only ones you can use. – Rounds
So which indexes are mathematically valid when using a dissimilarity matrix? Just silhouette? Should I rely on the result of just one index (silhouette), or should I try something else too to confirm the result? – Abdul
See the documentation on the indexes and their definitions to see which are compatible. – Rounds
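If a majority vote is still wanted, one option is to run NbClust once per index that it reports as computable from diss alone (frey, mcclain, cindex, silhouette and dunn, according to the error message quoted in the comments) and tally the suggested k. This is a hedged sketch: the toy data frame is hypothetical, an index that fails is simply recorded as NA, and a vote over five indexes is a much smaller panel than the 26 used by index = "all".

```r
library(cluster)  # daisy()
library(NbClust)  # NbClust()

# Hypothetical mixed-type data standing in for the real data set.
set.seed(42)
toy <- data.frame(
  age    = c(rnorm(15, mean = 30, sd = 3), rnorm(15, mean = 55, sd = 3)),
  income = c(rnorm(15, mean = 35, sd = 5), rnorm(15, mean = 80, sd = 5)),
  gender = factor(sample(c("F", "M"), 30, replace = TRUE))
)
gower_dist_gender <- daisy(toy, metric = "gower")

# Indexes NbClust says it can compute from a dissimilarity matrix alone.
diss_indices <- c("frey", "mcclain", "cindex", "silhouette", "dunn")

# One NbClust call per index; record the suggested k, or NA on error.
best_k <- sapply(diss_indices, function(ix) {
  res <- tryCatch(
    NbClust(data = NULL, diss = gower_dist_gender, distance = NULL,
            min.nc = 2, max.nc = 5, method = "complete", index = ix),
    error = function(e) NULL
  )
  if (is.null(res)) NA_real_ else as.numeric(res$Best.nc[1])
})

best_k

# Majority vote over the indexes that returned a value (table() drops NA).
votes <- table(best_k)
majority_k <- as.integer(names(votes)[which.max(votes)])
majority_k
```

Ties resolve to the smallest k, because table() lists the k values in ascending order and which.max() returns the first maximum.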
