Unseen nominal values in Weka
I have a dataset with some nominal features. In my training set these features take some values that are absent from my test set. For instance, one feature is declared in the training set as

@attribute h4 {br,pl,com,ro,th,np}

and the same feature in the test set has

@attribute h4 {br,pl,abc,th,def,ghi,lmno}

I believe this is why Weka will not let me re-evaluate the model I built on my training set against my test set. Is there a way around this? Am I missing something?

EDIT: I'm using a RandomForest classifier.

Thanks

Assistance answered 28/11, 2013 at 5:53 Comment(1)
You should be able to use the same attribute declarations in the train and test set. It's not a problem if not all declared values appear in the data. (Holden)

Weka requires every nominal value used in the test set to be declared in the training set as well, because the classifier has to learn about a value before it can make predictions from it.

Weka also handles nominal values by their indices in the declared list, so the values of an attribute must be declared in the same order in both files to get reliable results.
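To see why the order matters, here is a minimal Python sketch (not Weka code) that looks up the position of a label in each of the two declarations from the question. The helper `label_index` is a hypothetical name used only for illustration; it just mimics the idea of mapping a label to its position in the declared list.

```python
# Nominal labels as declared in the two ARFF headers from the question.
train_labels = ["br", "pl", "com", "ro", "th", "np"]
test_labels = ["br", "pl", "abc", "th", "def", "ghi", "lmno"]


def label_index(labels, value):
    """Return the position of a label in its declaration (illustrative helper)."""
    return labels.index(value)


# The same label lands on different indices under the two declarations.
th_in_train = label_index(train_labels, "th")
th_in_test = label_index(test_labels, "th")
print(th_in_train, th_in_test)

# Reading the test-set index against the training declaration gives another label.
print(train_labels[th_in_test])
```

With these two declarations, the index that `th` has in the test header points at `ro` in the training header, which is the kind of mismatch a shared declaration avoids.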

In your case, use a single list of values that covers all values, in the same order, for both the training set and the test set.

Your combined values, with each value listed once, {br,pl,com,ro,th,np,abc,def,ghi,lmno} can be used for both the training set and the test set.
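If there are many such attributes, the combined declaration can be built by a small script instead of by hand. The following is a rough Python sketch under some assumptions: the function names `parse_nominal` and `merge_declarations` are made up for this example, and the parser only handles simple unquoted labels like the ones in the question (no quoted values containing commas or braces). It keeps the training order first and appends labels seen only in the test header, dropping duplicates.

```python
import re


def parse_nominal(line):
    """Split an '@attribute name {a,b,c}' line into (name, [labels])."""
    match = re.match(r"@attribute\s+(\S+)\s*\{(.*)\}\s*$", line.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a nominal attribute declaration: {line!r}")
    labels = [label.strip() for label in match.group(2).split(",")]
    return match.group(1), labels


def merge_declarations(train_line, test_line):
    """Build one declaration covering both label lists, each label once."""
    name, train_labels = parse_nominal(train_line)
    test_name, test_labels = parse_nominal(test_line)
    if name != test_name:
        raise ValueError(f"attribute names differ: {name!r} vs {test_name!r}")
    # dict.fromkeys keeps first-seen order and removes repeated labels.
    merged = list(dict.fromkeys(train_labels + test_labels))
    return "@attribute " + name + " {" + ",".join(merged) + "}"


merged_line = merge_declarations(
    "@attribute h4 {br,pl,com,ro,th,np}",
    "@attribute h4 {br,pl,abc,th,def,ghi,lmno}",
)
print(merged_line)
```

The resulting line would replace the `h4` declaration in both ARFF headers; the model then needs to be rebuilt on the training file with the updated header before evaluating on the test file.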

Transmission answered 3/12, 2013 at 0:37 Comment(0)
