Suppose I have a data set:
X y
20 0
22 0
24 1
27 0
30 1
40 1
20 0
...
I am trying to discretize X into a few bins by minimizing entropy, so I did the following:
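import numpy as np
from sklearn import tree

clf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=4)
clf.fit(X.values.reshape(-1, 1), y.values)
# leaf nodes store threshold == -2, so this keeps only the split thresholds
threshold = clf.tree_.threshold[clf.tree_.threshold > -2]
threshold = np.sort(threshold)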
'threshold' should then hold the split points. Is this a correct way to bin the data?
Any suggestions?
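Below is a minimal sketch of the same idea, assuming a single numeric feature and using only the seven rows visible above as toy data. It makes a few assumptions that are not in the question: it swaps `max_depth=4` for `max_leaf_nodes=4` (a real `DecisionTreeClassifier` parameter) so the number of bins is capped directly; it selects split nodes with `tree_.children_left != -1` rather than `threshold > -2`, since a genuine threshold could fall below -2 if X can be negative; and it maps values to bins with `np.digitize`. The helper `weighted_entropy` is hypothetical, written here only to check the resulting binning. Because every split on a single feature adds exactly one cut point inside its node's interval, the sorted thresholds should give one interval per leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: only the rows visible in the question (the real data set is longer).
X = np.array([20, 22, 24, 27, 30, 40, 20])
y = np.array([0, 0, 1, 0, 1, 1, 0])

# Assumption: max_leaf_nodes caps the number of bins directly
# (max_depth=4 would allow up to 2**4 = 16 leaves).
clf = DecisionTreeClassifier(criterion="entropy", max_leaf_nodes=4, random_state=0)
clf.fit(X.reshape(-1, 1), y)

# Leaves have children_left == -1 (and threshold == -2), so this keeps split nodes only.
is_split = clf.tree_.children_left != -1
thresholds = np.sort(clf.tree_.threshold[is_split])

# right=True puts a value equal to a threshold in the lower bin,
# matching the tree's "go left if x <= threshold" rule.
bins = np.digitize(X, thresholds, right=True)


def weighted_entropy(bins, y):
    """Hypothetical helper: entropy of y inside each bin (bits), weighted by bin size."""
    total = 0.0
    for b in np.unique(bins):
        labels = y[bins == b]
        p = np.bincount(labels) / len(labels)
        p = p[p > 0]
        total += len(labels) / len(y) * -(p * np.log2(p)).sum()
    return total


print(thresholds)
print(bins)
print(weighted_entropy(bins, y), "vs. unbinned", weighted_entropy(np.zeros_like(y), y))
```

New values can be binned the same way with `np.digitize(new_X, thresholds, right=True)`. If labelled categories are preferred, `pd.cut(X, bins=[-np.inf, *thresholds, np.inf])` produces the same right-closed intervals.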
I also have the same problem. – Guild