How to rank features by their importance in a Weka classifier?

I used Weka to build a classifier successfully. I would now like to evaluate how effective or important my features are. For this I use AttributeSelection, but I don't know how to output the different features with their corresponding importance. I simply want to list the features in decreasing order of their information gain scores!

Interne answered 21/1, 2014 at 20:5 Comment(0)

There are many ways of scoring the features, which are called attributes, in Weka. These methods are available as subclasses of weka.attributeSelection.ASEvaluation.

Any of these evaluation classes will give you a score for each attribute. If you use information gain for scoring, for example, you will be using the class InfoGainAttributeEval. The helpful methods are

  • InfoGainAttributeEval.buildEvaluator(Instances), and
  • InfoGainAttributeEval.evaluateAttribute(int)

The other types of feature scoring (gain ratio, correlation, etc.) have the same methods for scoring. Using any of these, you can rank all your features.
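To make the score itself less of a black box, here is a minimal plain-Java sketch of what an information gain score measures: the entropy of the class minus the entropy of the class once the attribute value is known. It is not Weka code. The class name InfoGainSketch and the counts in main are made up for illustration, and Weka's own evaluator does more than this (for instance, it has to deal with numeric attributes and missing values), so its numbers may differ on real data.

```java
import java.util.Arrays;

/** Hand-computed information gain for one nominal attribute (illustrative only, not Weka code). */
final class InfoGainSketch {

    /** Shannon entropy, in bits, of a vector of class counts. */
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    /**
     * Information gain = H(class) - sum over values v of P(v) * H(class | attribute = v).
     * counts[v][c] is the number of instances with attribute value v and class c.
     */
    static double infoGain(int[][] counts) {
        int numClasses = counts[0].length;
        int[] classTotals = new int[numClasses];
        int total = 0;
        for (int[] row : counts) {
            for (int c = 0; c < numClasses; c++) {
                classTotals[c] += row[c];
                total += row[c];
            }
        }
        double conditional = 0.0;
        for (int[] row : counts) {
            int rowTotal = Arrays.stream(row).sum();
            if (rowTotal > 0) {
                conditional += ((double) rowTotal / total) * entropy(row);
            }
        }
        return entropy(classTotals) - conditional;
    }

    public static void main(String[] args) {
        // Hypothetical contingency table: 3 attribute values x 2 classes.
        int[][] counts = { {2, 3}, {4, 0}, {3, 2} };
        System.out.printf("info gain = %.4f bits%n", infoGain(counts));
    }
}
```

An attribute whose values split the classes perfectly gets the full class entropy as its score, and one whose values say nothing about the class gets 0, which is why a higher score ranks higher.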

The ranking itself is independent of Weka. Of the many ways of doing it, this is one:

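// Assumes 'instances' has its class index set and 'evaluation' is an evaluator
// (e.g. InfoGainAttributeEval) on which buildEvaluator(instances) was already called.
Map<Attribute, Double> infogainscores = new HashMap<Attribute, Double>();
for (int i = 0; i < instances.numAttributes(); i++) {
    if (i == instances.classIndex()) continue; // the class attribute is not a feature
    Attribute t_attr = instances.attribute(i);
    double infogain  = evaluation.evaluateAttribute(i);
    infogainscores.put(t_attr, infogain);
}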

Now you have a map that needs to be sorted by value. Here is some generic code to do that:

import java.util.Comparator;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Provides a {@code SortedSet} of {@code Map.Entry} objects. The sorting is in ascending order if {@code order} > 0
 * and descending order if {@code order} <= 0.
 * @param map   The map to be sorted.
 * @param order The sorting order (positive means ascending, non-positive means descending).
 * @param <K>   Keys.
 * @param <V>   Values need to be {@code Comparable}.
 * @return      A sorted set of {@code Map.Entry} objects.
 */
static <K,V extends Comparable<? super V>> SortedSet<Map.Entry<K,V>>
entriesSortedByValues(Map<K,V> map, final int order) {
    SortedSet<Map.Entry<K,V>> sortedEntries = new TreeSet<>(
        new Comparator<Map.Entry<K,V>>() {
            // Never returns 0, so entries with equal values are all kept.
            @Override
            public int compare(Map.Entry<K,V> e1, Map.Entry<K,V> e2) {
                return (order > 0)
                    ? compareToRetainDuplicates(e1.getValue(), e2.getValue())
                    : compareToRetainDuplicates(e2.getValue(), e1.getValue());
            }
        }
    );
    sortedEntries.addAll(map.entrySet());
    return sortedEntries;
}

and finally,

private static <V extends Comparable<? super V>> int compareToRetainDuplicates(V v1, V v2) {
    // compareTo may return any negative number, not just -1, so test the sign.
    return (v1.compareTo(v2) < 0) ? -1 : 1;
}

Now you have a set of entries sorted by value (in ascending or descending order, as you wish). Go crazy with it!

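Please note that you should handle the case where more than one attribute has the same information gain: a TreeSet with an ordinary comparator treats two entries that compare as equal as duplicates and keeps only one of them. That is why I went through the process of sorting by values while retaining duplicates.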
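If the comparator trick feels uncomfortable (a comparator that never returns 0 also means contains() and remove() on the resulting set will not find anything), one alternative is to sort a list of entries instead, since a list keeps tied values without any special handling. The following is only a sketch of that idea; the class name RankByScore, the method name rankDescending and the attribute names and scores in main are hypothetical, and plain strings stand in for Weka Attribute objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Ranks map entries by score using a list sort, so tied scores are all kept. */
final class RankByScore {

    /** Returns the entries sorted by value, highest first; ties keep the map's iteration order. */
    static <K> List<Map.Entry<K, Double>> rankDescending(Map<K, Double> scores) {
        List<Map.Entry<K, Double>> ranked = new ArrayList<>(scores.entrySet());
        // List.sort is stable, so entries with equal scores stay in their original relative order.
        ranked.sort(Map.Entry.<K, Double>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Hypothetical scores; in practice these would come from evaluateAttribute(i).
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("attr1", 0.354);
        scores.put("attr2", 0.333);
        scores.put("attr3", 0.304);
        scores.put("attr4", 0.316);

        System.out.println("Ranked attributes:");
        for (Map.Entry<String, Double> e : rankDescending(scores)) {
            System.out.printf("%.3f %s%n", e.getValue(), e.getKey());
        }
    }
}
```

The returned list holds entries that belong to the original map, so it is best read before the map is modified again.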

Communist answered 21/1, 2014 at 21:25 Comment(8)
Thank you. I use AttributeSelection with InfoGainAttributeEval as the attribute evaluator and Ranker as the search method, but I don't know which method lets me select attributes with their corresponding relevance (or importance). I use them in a Java program. - Interne
What do you mean by "select attributes with their corresponding relevance"? An attribute is selected (or not) based on the information gain score. After that, the actual score may or may not play any role (depending on the classifier). - Communist
I have six features used to classify data. I would like to know the importance of each feature: which attributes are the most relevant for classifying the data. I want to evaluate this relevance for each attribute in order to compare them. - Interne
As far as I understand, you are already doing everything to get the scores. Higher information gain means better discriminative power for classification. Simply list the features in decreasing order of info gain scores! - Communist
This is what I want. How do I list the features? Something like this: Ranked attributes: 0.354 attr1, 0.333 attr2, 0.316 attr4, 0.304 attr3 - Interne
Thank you @Chthonic Project, your answer is very helpful. I accept it. - Interne
Can I ask another question here? I used SVM as a classifier, so I want to know which features influence the result of this classifier more than the others. Can the IG-based method you explain here give me these features? I mean, does SVM use IG? If not, and SVM uses some other technique for features, how can IG show us the important features in SVM or another classifier? - Cauterize
Can you correct the line Map<Attribute, Double> infogainscores = new HashMap<Integer, Double>(); to Map<Attribute, Double> infogainscores = new HashMap<Attribute, Double>(); ? A small thing, but still... - Carsick
