Why are all labels_ -1 in the output generated by DBSCAN in Python?

    from sklearn.cluster import DBSCAN

    dbscan = DBSCAN(eps=0.001, min_samples=10)
    clustering = dbscan.fit(X)

Example vectors:

    array([[ 0.05811029, -1.089355  , -1.9143777 , ...,  1.235167  ,
            -0.6473859 ,  1.5684978 ],
           [-0.7117326 , -0.31876346, -0.45949244, ...,  0.17786546,
             1.9377285 ,  2.190525  ],
           [ 1.1685177 , -0.18201494,  0.19475089, ...,  0.7026453 ,
             0.3937522 , -0.78675956],
           ...,
           [ 1.4172379 ,  0.01070347, -1.3984257 , ..., -0.70529956,
             0.19471683, -0.6201791 ],
           [ 0.6171041 , -0.8058429 ,  0.44837445, ...,  1.216958  ,
            -0.10003573, -0.19012968],
           [ 0.6433722 ,  1.1571665 , -1.2123466 , ...,  0.592805  ,
             0.23889546,  1.6207514 ]], dtype=float32)

X is model.wv.vectors, generated from model = word2vec.Word2Vec(sent, min_count=1, size=50, workers=3, window=3, sg=1)

Results are as follows:

array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])

Moist answered 16/1, 2020 at 18:21 Comment(14)
It's going to be hard for people to answer your question if they cannot replicate the code. Can you take the code from the images and format it here? Also, if you have some sample data you could provide, that would help us in problem-solving with you. – Retaliate
@lwileczek I just don't know how to write the code here. – Moist
Can you show the code that actually outputs the array of -1 values? Also, per the DBSCAN docs, it's designed to return -1 for 'noisy' samples that aren't in any 'high-density' cluster. It's possible that your word-vectors are so evenly distributed there are no 'high-density' clusters. (From what data are you training the word-vectors, & how large is the set of word-vectors? Have you verified the word-vectors appear sensible/useful by other checks?) – Malone
You might need to tune the DBSCAN parameters for your data. And, it might make sense to operate on the unit-length-normed word-vectors, instead of the raw magnitude vectors. (Execute model.wv.init_sims(), then use model.wv.vectors_norm instead of model.wv.vectors.) Finally, min_count=1 usually results in worse word-vectors than a higher min_count value that discards words with so few usage examples. Rare words can't get strong vectors, & keeping them in training also interferes with improvement of other more-frequent words' vectors. – Malone
@Malone I show the array with clustering.labels_. And I will try your suggestion later ~ thanks. – Moist
@Malone Sorry, I still don't know how to format code in the comment box... – Moist
The indentation you've used in the topmost 3 lines of code you've shown is one perfectly fine way of formatting a code excerpt. There's a lot more info on ways to present your typed, or copied-and-pasted, text of code or output at: stackoverflow.com/editing-help – Malone
@Malone I tried your way with model.wv.vectors_norm and model.wv.vectors. I cannot set min_count higher, since in my dataset there are DishNames that only show up once. – Moist
@Malone And with more than 30k words, the result is bad too: only -1. – Moist
@Malone Still bad even when I set min_count=5... almost crying... – Moist
Words that only have 1 example in your training data are unlikely to get good word-vectors. Their final positions will be some mix of their random starting position, & the influence of the possibly-arbitrarily idiosyncratic single usage example – offset by the influence of all the other more-frequent words on the neural-network's weights. So any patterns of their neighborhoods for clustering may be weak – they are nearly 'noise', so it wouldn't be surprising if they contribute to leaving DBSCAN with lots of 'noisy' results. – Malone
30k total words would be a tiny, tiny dataset for Word2Vec purposes. Is that the size of the corpus, or the number of unique words? With a small corpus, or small number of unique words, but still multiple varied examples of each word, you might be able to get useful Word2Vec results with smaller size dimensions & more epochs training-passes, but it's not certain. Have you been able to check the vectors for usefulness separate from the clustering, by spot-reviewing if vectors' most_similar() neighbors make sense according to your domain understanding? – Malone
Your best chance of getting some contrastingly-meaningful vectors could be to do all of: (1) higher min_count (while observing to see exactly how far this further shrinks the effective corpus); (2) more epochs; (3) fewer size dimensions. (Possibly also: larger window or negative.) Then also, using vectors_norm (to move all vectors to points on the 'unit sphere' for more contrast given the DBSCAN euclidean-neighborhoods). Then also, tinkering with the DBSCAN parameters to make it more sensitive. (A rough sketch of these steps follows this comment thread.) – Malone
But still, you might not have enough data for Word2Vec to work well, and DBSCAN clustering might not be good for even stronger Word2Vec vectors, unless you have some external reason to believe these are the right algorithms for your data/problem-domain. Why do you want to create a fixed number of clusters from these word-vectors? – Malone
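
A rough sketch pulling together the tuning suggested in the comments above. It assumes the gensim 3.x API used in the question (size, iter, init_sims), that sent is the same training corpus as in the question, and that every parameter value shown is only a starting point to experiment with, not a recommendation:

    from gensim.models import word2vec
    from sklearn.cluster import DBSCAN
    import numpy as np

    model = word2vec.Word2Vec(
        sent,            # same corpus as in the question
        min_count=5,     # discard very rare words, which only get weak vectors
        size=20,         # fewer dimensions for a small corpus
        window=5,        # wider context window
        iter=20,         # more training epochs (gensim 3.x parameter name)
        sg=1,
        workers=3,
    )

    model.wv.init_sims()              # precompute unit-length-normed vectors
    X_norm = model.wv.vectors_norm    # every vector now lies on the unit sphere

    # On the unit sphere, euclidean distances fall in [0, 2], so eps must be
    # tuned in that range; 0.4 and 5 here are placeholders, not tested values.
    labels = DBSCAN(eps=0.4, min_samples=5).fit(X_norm).labels_
    print(np.unique(labels, return_counts=True))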

Based on the docs:

    labels_ : array, shape = [n_samples]

Cluster labels for each point in the dataset given to fit(). Noisy samples are given the label -1.

You can find the answer to this here: What are noisy samples in Scikit's DBSCAN clustering algorithm?

In short: these points are not part of any cluster. They are simply points that do not belong to any cluster and can be "ignored" to some extent. It seems that your data points are all quite different from each other, so no dense, central clusters form.
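
For example, you can count how many points ended up as noise versus in clusters (a sketch; clustering is the estimator fitted in the question):

    import numpy as np

    labels = clustering.labels_              # from dbscan.fit(X) in the question
    print(dict(zip(*np.unique(labels, return_counts=True))))

    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int((labels == -1).sum())
    print("clusters:", n_clusters, "noise points:", n_noise)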

What can you try?

    DBSCAN(eps=0.5, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)

You can play with the parameters or change the clustering algorithm. Did you try k-means?
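
For example (a sketch; X is the vector array from the question, and the parameter values are only guesses to tune):

    from sklearn.cluster import DBSCAN, KMeans

    # Loosen DBSCAN until neighbourhoods are large enough for clusters to form.
    for eps in (0.5, 1.0, 2.0, 5.0):
        labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        print("eps=%.1f: %d clusters, %d noise points"
              % (eps, n_clusters, (labels == -1).sum()))

    # k-means assigns every point to one of n_clusters groups (it has no noise
    # label), which is why it always "works" even on evenly spread data.
    kmeans_labels = KMeans(n_clusters=10, random_state=0).fit_predict(X)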

Zoraidazorana answered 17/1, 2020 at 8:37 Comment(1)
I tried yours and it's better, but not good enough: the results are only -1 and 0. I had tried k-means, and it worked well. I'm so curious about why such a difference exists. – Moist

Your eps value is 0.001; try increasing that so that you get clusters forming (or else every point will be considered an outlier / labelled -1 because it's not in a cluster).
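
A common heuristic for choosing a larger eps (not part of the original answer, just a standard approach) is to plot each point's distance to its k-th nearest neighbour, sorted, and pick eps near the 'elbow' of that curve; a sketch, assuming X is the vector array from the question:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.neighbors import NearestNeighbors

    k = 10                                   # match the min_samples you pass to DBSCAN
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    distances, _ = nn.kneighbors(X)          # distances[:, -1] is the k-th NN distance

    plt.plot(np.sort(distances[:, -1]))
    plt.xlabel("points sorted by k-th neighbour distance")
    plt.ylabel("distance to %dth nearest neighbour" % k)
    plt.show()                               # choose eps around the bend ('elbow')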

Kinard answered 19/8, 2020 at 8:2 Comment(0)
