Bad result when using a precomputed chi2 kernel with LIBSVM (MATLAB)

Asked 24/8, 2011 at 15:5 Answered 23/10, 2011 at 1:12

I am trying out LIBSVM, following the example for training an SVM on the heart_scale data that comes with the software. I want to use a chi2 kernel, which I precompute myself. With it, the classification rate on the training data drops to 24%. I am sure I compute the kernel correctly, so I guess I must be doing something else wrong. The code is below. Can you see any mistakes? Help would be greatly appreciated.

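%read in the data:
[heart_scale_label, heart_scale_inst] = libsvmread('heart_scale');
train_data = heart_scale_inst(1:150,:);
train_label = heart_scale_label(1:150,:);
test_data = heart_scale_inst(151:270,:);
test_label = heart_scale_label(151:270,:);

%read somewhere that the kernel should not be sparse
ttrain = full(train_data);
ttest = full(test_data);

%precomputed format (-t 4): first column is the sample serial number 1..n
%svmtrain2 is presumably LIBSVM's svmtrain MEX, renamed to avoid a name clash
precKernel = chi2_custom(ttrain, ttrain);
model_precomputed = svmtrain2(train_label, [(1:150)', precKernel], '-t 4');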

This is how the kernel is precomputed:

function res=chi2_custom(x,y)
a=size(x);
b=size(y);
res = zeros(a(1,1), b(1,1));
for i=1:a(1,1)
    for j=1:b(1,1)
        resHelper = chi2_ireneHelper(x(i,:), y(j,:));
        res(i,j) = resHelper;
    end
end
function resHelper = chi2_ireneHelper(x,y)
a=(x-y).^2;
b=(x+y);
resHelper = sum(a./(b + eps));

With a different SVM implementation (VLFeat) I obtain a classification rate of around 90% on the training data (yes, I tested on the training data, just to see what is going on). So I am pretty sure the LIBSVM result is wrong.

Ancheta asked 24/8, 2011 at 15:5

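When working with support vector machines, it is very important to normalize the dataset as a pre-processing step. Normalization puts the attributes on the same scale and prevents attributes with large values from biasing the result. It also improves numerical stability (it minimizes the likelihood of overflows and underflows due to floating-point representation). This matters even more for the chi-squared kernel, which is meant for non-negative inputs such as histograms: heart_scale comes already scaled to [-1, 1], so x_i + y_i can be zero or negative, and individual terms of the sum then blow up or change sign.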
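To see the effect concretely, here is a small sketch using two made-up feature vectors (x, y are hypothetical, not taken from heart_scale) whose entries have opposite signs. It evaluates the same terms as the question's chi2_ireneHelper, then repeats the computation after mapping both vectors from an assumed [-1, 1] range to [0, 1] with (v+1)/2:

```matlab
% Two hypothetical feature vectors with entries of opposite sign,
% as can occur in data scaled to [-1, 1]
x = [ 0.5, -0.5, 0.2];
y = [-0.5,  0.5, 0.2];

% Same terms as the question's helper: (x-y).^2 ./ (x+y + eps)
num = (x - y).^2;        % [1 1 0]
den = x + y + eps;       % first two denominators are just eps
d = sum(num ./ den);     % dominated by two 1/eps terms
fprintf('chi-squared distance, raw data: %g\n', d);

% Map both vectors from [-1, 1] to [0, 1]; denominators stay positive
xs = (x + 1) / 2;
ys = (y + 1) / 2;
ds = sum((xs - ys).^2 ./ (xs + ys + eps));
fprintf('chi-squared distance, rescaled: %g\n', ds);
```

With the raw vectors, the two zero denominators make the distance astronomically large; after rescaling, it is a modest value (about 0.5 here).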

Also, to be precise, your calculation of the chi-squared kernel is slightly off. Use the definition below instead, along with this faster (vectorized) implementation:

$$K(x, y) = 1 - \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{\frac{1}{2}(x_i + y_i)}$$

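function D = chi2Kernel(X,Y)
    % chi-squared kernel between each row of X (m-by-d) and each row of Y (n-by-d);
    % returns an m-by-n matrix
    D = zeros(size(X,1),size(Y,1));
    for i=1:size(Y,1)
        d = bsxfun(@minus, X, Y(i,:));          % x - y for every row of X
        s = bsxfun(@plus, X, Y(i,:));           % x + y for every row of X
        D(:,i) = sum(d.^2 ./ (s/2+eps), 2);     % eps avoids division by zero
    end
    D = 1 - D;
end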
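Before feeding a precomputed kernel to LIBSVM, it can help to sanity-check it. The sketch below uses a small hypothetical non-negative matrix X, computes the kernel both with the vectorized loop from chi2Kernel (inlined so the snippet runs as a plain script) and element by element straight from the definition, then checks that the two agree, that the matrix is symmetric, and that its diagonal is 1:

```matlab
% Small hypothetical non-negative data set: 4 samples, 3 features
X = [0.1 0.4 0.5;
     0.3 0.3 0.4;
     0.0 0.6 0.4;
     0.2 0.2 0.6];
n = size(X, 1);

% Vectorized form, same computation as chi2Kernel(X, X)
Kvec = zeros(n, n);
for i = 1:n
    d = bsxfun(@minus, X, X(i,:));
    s = bsxfun(@plus,  X, X(i,:));
    Kvec(:,i) = sum(d.^2 ./ (s/2 + eps), 2);
end
Kvec = 1 - Kvec;

% Element-by-element form taken directly from the definition
Kref = zeros(n, n);
for i = 1:n
    for j = 1:n
        Kref(i,j) = 1 - sum((X(i,:) - X(j,:)).^2 ./ ((X(i,:) + X(j,:))/2 + eps));
    end
end

maxDiff = max(abs(Kvec(:) - Kref(:)));
isSym   = max(max(abs(Kvec - Kvec'))) < 1e-12;
diagOne = all(abs(diag(Kvec) - 1) < 1e-12);
fprintf('max difference: %g, symmetric: %d, unit diagonal: %d\n', maxDiff, isSym, diagOne);
```

A kernel that fails any of these checks on the training data is a sign that something is off in how it was computed.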

Now consider the following example using the same dataset you used (code adapted from a previous answer of mine):

%# read dataset
[label,data] = libsvmread('./heart_scale');
data = full(data);      %# sparse to full

%# normalize data to [0,1] range
mn = min(data,[],1); mx = max(data,[],1);
data = bsxfun(@rdivide, bsxfun(@minus, data, mn), mx-mn);

%# split into train/test datasets
trainData = data(1:150,:);    testData = data(151:270,:);
trainLabel = label(1:150,:);  testLabel = label(151:270,:);
numTrain = size(trainData,1); numTest = size(testData,1);

%# compute kernel matrices between every pair of (train,train) and
%# (test,train) instances and include sample serial number as first column
K =  [ (1:numTrain)' , chi2Kernel(trainData,trainData) ];
KK = [ (1:numTest)'  , chi2Kernel(testData,trainData)  ];

%# view 'train vs. train' kernel matrix
figure, imagesc(K(:,2:end))
colormap(pink), colorbar

%# train model
model = svmtrain(trainLabel, K, '-t 4');

%# test on testing data
[predTestLabel, acc, decVals] = svmpredict(testLabel, KK, model);
cmTest = confusionmat(testLabel,predTestLabel)

%# test on training data
[predTrainLabel, acc, decVals] = svmpredict(trainLabel, K, model);
cmTrain = confusionmat(trainLabel,predTrainLabel)

The result on the testing data:

Accuracy = 84.1667% (101/120) (classification)
cmTest =
    62     8
    11    39

and on the training data, we get around 90% accuracy as you expected:

Accuracy = 92.6667% (139/150) (classification)
cmTrain =
    77     3
     8    62

[Figure: the train vs. train kernel matrix, K(:,2:end)]
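One possible refinement, which is not part of the example above: estimate the min/max from the training rows only and clip the test rows to [0, 1]. Then the test set does not influence the scaling, and test values outside the training range cannot turn negative before they reach the chi-squared kernel. A minimal sketch on a small hypothetical matrix (the data and the variable name colRange are illustrative only):

```matlab
% Hypothetical data: first 3 rows are "training", last 2 are "testing"
data = [ 1  10;
         3  20;
         5  40;
         0  30;
         6  25];
trainData = data(1:3,:);
testData  = data(4:5,:);

% Scaling parameters estimated on the training rows only
mn = min(trainData, [], 1);
mx = max(trainData, [], 1);
colRange = mx - mn;
colRange(colRange == 0) = 1;   % avoid division by zero for constant columns

trainN = bsxfun(@rdivide, bsxfun(@minus, trainData, mn), colRange);
testN  = bsxfun(@rdivide, bsxfun(@minus, testData,  mn), colRange);

% Test values outside the training range fall outside [0,1];
% clip them so the chi-squared kernel never sees negative inputs
testN = min(max(testN, 0), 1);

disp(trainN); disp(testN);
```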

Amphibious answered 23/10, 2011 at 1:12

Oh, cool, that's a detailed answer. Thanks for taking the time to think about my problem. It surely helped. – Ancheta 8/11, 2011 at 14:6

The problem is the following line:

resHelper = sum(a./(b + eps));

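it should be the following (up to the eps term, 2*a./b is the same as a./(b/2), so the denominator becomes the mean of x and y, and subtracting the sum from 1 turns the chi-squared distance into a similarity):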

resHelper = 1-sum(2*a./(b + eps));
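As a quick check that this corrected line matches the mean-denominator form used by chi2Kernel, here is a sketch with two hypothetical non-negative vectors. The only difference between the two forms is where eps enters, which is negligible:

```matlab
% Two hypothetical non-negative feature vectors
x = [0.2 0.5 0.3];
y = [0.4 0.1 0.5];

a = (x - y).^2;
b = (x + y);

% Corrected line from this answer
kHenslowe = 1 - sum(2*a ./ (b + eps));

% Same quantity with the denominator written as the mean (x+y)/2
kMean = 1 - sum(a ./ (b/2 + eps));

fprintf('%.15f\n%.15f\n', kHenslowe, kMean);
```

Both print roughly 0.2333 (that is, 7/30) for these vectors.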

Henslowe answered 1/9, 2011 at 8:15

Thanks for answering my question; I have only just seen your response. – Ancheta 13/10, 2011 at 12:59

@Sallos: although your formula was slightly off, the real problem is data normalization. See my answer. – Amphibious 23/10, 2011 at 1:16
