cluster one-dimensional data using pvclust
Thanks for taking the time to read this question. I have some one-dimensional data to cluster in R. The basic hclust command works fine, but the pvclust command does not take one-dimensional data and keeps saying:

Error in hclust(distance, method = method.hclust) : 
  must have n >= 2 objects to cluster

I found a work-around: I added some all-zero rows to the data. So the data becomes:

       [,1]   [,2]   [,3]  [,4]  [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
[1,]  7.424 14.251 15.957 1.542 2.451 20.836 13.534 20.003 12.555 10.817
[2,]      0      0      0     0     0      0      0      0      0      0
[3,]      0      0      0     0     0      0      0      0      0      0
[4,]      0      0      0     0     0      0      0      0      0      0

Then I ran pvclust, and it worked!

But I am concerned that this work-around screws up the mathematics underlying pvclust. Can anyone tell me whether I am right or wrong, and whether there is a better solution to my problem?

Thank you!

Closed answered 20/5, 2013 at 22:16 Comment(0)

First of all, let me state that none of these methods is meant for one-dimensional data.

For one-dimensional data, please use a method that exploits that the data can be sorted. For example, use a method based on kernel density estimation.

The term "cluster analysis" is usually used for multidimensional data only. In one dimension, there are much better methods. See also "natural breaks optimization", but IMHO you should be using kernel density estimation: split the data at local minima of the KDE.
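For illustration, here is a minimal base-R sketch of that idea, using the ten values from the question. The split points are simply the local minima of the density() estimate; the default bandwidth is assumed, and a different bandwidth will change where (and whether) minima appear.

## KDE-based splitting of one-dimensional data at local density minima
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

d <- density(x)                                  # kernel density estimate
min_idx <- which(diff(sign(diff(d$y))) > 0) + 1  # indices of local minima
cuts <- d$x[min_idx]                             # split points between clusters

## assign each value to the interval between consecutive split points
groups <- cut(x, breaks = c(-Inf, cuts, Inf), labels = FALSE)
split(x, groups)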

Now to your actual question. Most likely the problem is that you are ... passing 1-dimensional data, which gets interpreted as a single record with d dimensions, and thus the method complains about having only one object to cluster. You may have success by first transposing your data.
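To make that concrete, here is a hedged sketch of the shape issue. It assumes that pvclust clusters the columns of its input matrix and resamples its rows in the bootstrap; passing method.dist = "euclidean" further assumes a pvclust version that accepts dist()-style distance names, since the default correlation distance cannot be computed from a single row.

## How the layout of one-dimensional data affects pvclust (sketch, not a fix)
library(pvclust)

x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)

## As a plain vector or a 10 x 1 matrix there is only one column, i.e. a
## single object, hence "must have n >= 2 objects to cluster".
m_one_col <- matrix(x, ncol = 1)

## Transposed, each value becomes its own column, so hclust sees 10 objects.
m_one_row <- t(m_one_col)   # 1 row, 10 columns

## This may get past the error, but with a single row the bootstrap has
## essentially nothing to resample, so any AU/BP p-values are not trustworthy.
res <- try(pvclust(m_one_row, method.dist = "euclidean",
                   method.hclust = "average", nboot = 100))
if (!inherits(res, "try-error")) plot(res)

As the rest of the answer argues, even if this runs, it does not make pvclust a sensible tool for one-dimensional data.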

With your hack of adding zero records, the result most likely becomes bogus. You are probably clustering a data set that has 1 vector that contains your data, and 3 vectors that are all zero...

But in the end, you should not be using these methods here anyway! Use a method that exploits that your data can be sorted.

Speechless answered 21/5, 2013 at 12:8 Comment(2)
By the way, do you also know the answer to this question: suppose I use kernel density estimation and make a cut at the local minima. Can I then test how significantly the left peak is separated from the right peak? Thanks! – Closed
I'm not using KDE a lot - I work with multidimensional data only - so I don't know references for suitable significance tests. You need to consider what your hypothesis is: a uniform distribution? Then you could probably use K-S testing. For such questions, have a look at the stats.stackexchange.com sister site; it is much more appropriate for statistics questions. – Speechless
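For what it is worth, a rough sketch of the K-S idea from the comment above: compare the observed values against a uniform distribution over their range. This is only a crude check, not a dedicated test of multimodality, and estimating the range from the same data makes the p-value approximate at best.

## Kolmogorov-Smirnov test against a uniform null on the observed range
x <- c(7.424, 14.251, 15.957, 1.542, 2.451,
       20.836, 13.534, 20.003, 12.555, 10.817)
ks.test(x, "punif", min = min(x), max = max(x))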
