I am attempting to implement a Naive Bayes classifier using BNT and MATLAB. So far I have been sticking with simple tabular_CPD nodes and "guesstimating" their probabilities. My prototype net consists of the following:
% Node 1 is the class node; nodes 2-5 are its feature children.
DAG = false(5);
DAG(1, 2:5) = true;
bnet = mk_bnet(DAG, [2 3 4 3 3]);
% Uniform prior over the two output classes.
bnet.CPD{1} = tabular_CPD(bnet, 1, [.5 .5]);
% Feature CPDs: entries are listed with the parent (class) index varying
% fastest, so each class's distribution over a feature sums to 1.
bnet.CPD{2} = tabular_CPD(bnet, 2, [.1 .345 .45 .355 .45 .3]);
bnet.CPD{3} = tabular_CPD(bnet, 3, [.2 .02 .59 .2 .2 .39 .01 .39]);
bnet.CPD{4} = tabular_CPD(bnet, 4, [.4 .33333 .5 .33333 .1 .33333]);
bnet.CPD{5} = tabular_CPD(bnet, 5, [.5 .33333 .4 .33333 .1 .33333]);
engine = jtree_inf_engine(bnet);
Here variable 1 is my desired output variable, initially set to assign probability .5 to each output class.
Variables 2-5 define CPDs for features I measure:
- 2 is a cluster size, ranging from 1 to a dozen or more
- 3 is a ratio that will be a real value >= 1
- 4 and 5 are standard deviation (real) values (X and Y scatter)
To classify a candidate cluster, I break each feature measurement into 3-4 range brackets, like so:
...
evidence = cell(1, 5);                 % node 1 (the class) is left unobserved
evidence{2} = sum(M > [0 2 6]);        % map each measurement to a bracket index
evidence{3} = sum(O > [0 1.57 2 3]);
evidence{4} = sum(S(1) > [-Inf 1 2]);
evidence{5} = sum(S(2) > [-Inf 0.4 0.8]);
eng = enter_evidence(engine, evidence);
marginals = marginal_nodes(eng, 1);
e = marginals.T(1);                    % posterior probability of class 1
...
This actually works pretty well, considering I'm only guessing at range brackets and probability values. But I believe that what I should be using here is a gaussian_CPD: instead of hand-picked brackets, it would model each real-valued feature directly, with parameters (means, covariances and weights) that can be learned from training data.
My problem is, I am not finding any simple examples of how the BNT gaussian_CPD class is used. How, for example, would I go about initializing a gaussian_CPD to approximately the same behavior as one of my tabular_CPD variables above?
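For reference, my best guess so far, pieced together from skimming BNT's gaussian_CPD.m, looks like the fragment below. The parameter shapes are assumptions on my part, and the mean/variance numbers are placeholders meant to roughly mimic node 2 (cluster size, brackets at 2 and 6) above:

```
% Untested guess: a 2-node fragment with a continuous feature child.
DAG = false(2);
DAG(1, 2) = true;
% Node 2 is now a scalar continuous node, so only node 1 is discrete.
bnet = mk_bnet(DAG, [2 1], 'discrete', 1);
bnet.CPD{1} = tabular_CPD(bnet, 1, [.5 .5]);
% One Gaussian per class: 'mean' should be d-by-Q (d = self size 1,
% Q = 2 parent states), 'cov' should be d-by-d-by-Q.
bnet.CPD{2} = gaussian_CPD(bnet, 2, ...
    'mean', [2.0 7.0], ...
    'cov', reshape([1.0 4.0], [1 1 2]));
engine = jtree_inf_engine(bnet);
```

And presumably learn_params could then fit the means and covariances from fully observed training cases, removing the guesswork entirely. Is that the right shape for the parameters?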