Interpreting XGB feature importance and SHAP values

For a particular prediction problem, I observed that a certain variable ranks high in the XGBoost feature importance output (computed on the basis of gain) while it ranks quite low in the SHAP output.

How should I interpret this? Is the variable highly important for our prediction problem, or not?

Gangue answered 15/6, 2022 at 6:0 Comment(0)

Impurity-based importances (such as sklearn's and XGBoost's built-in routines) summarize the overall usage of a feature by the tree nodes. This naturally gives more weight to high-cardinality features (more distinct feature values yield more possible splits), while gain can be affected by the tree structure itself (node order matters even when the predictions are the same). There may be lots of splits with little effect on the prediction, or the other way round (many splits diluting the average importance) - see https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27 and https://www.actuaries.digital/2019/06/18/analytics-snippet-feature-importance-and-the-shap-approach-to-machine-learning-models/ for various mismatch examples.
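
For illustration, here is a minimal sketch (not from the question - the toy data, model settings and feature layout are made up) showing that the built-in importance_type modes of the same fitted booster can already rank features quite differently:

    import numpy as np
    import xgboost as xgb

    # toy data: feature 0 drives the target, feature 1 adds a weak signal,
    # feature 2 is pure noise
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(int)

    model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
    model.fit(X, y)
    booster = model.get_booster()

    # each importance_type summarizes tree usage differently,
    # so the same model can yield different feature rankings
    for imp_type in ("weight", "gain", "total_gain", "cover"):
        print(imp_type, booster.get_score(importance_type=imp_type))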

In an oversimplified way (a rough code comparison follows the list):

  • impurity-based importance explains how heavily a feature is used to fit the training set;
  • permutation importance explains the contribution of a feature to the model's accuracy;
  • SHAP explains how much changing a feature value would affect the prediction (not necessarily correct).
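
As a rough sketch of how the latter two views are computed in practice (continuing the hypothetical toy model from the earlier snippet; the scikit-learn and shap packages are assumed to be available):

    import numpy as np
    import shap
    from sklearn.inspection import permutation_importance

    # permutation importance: how much the score drops when a feature is shuffled
    perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    print("permutation importance:", perm.importances_mean)

    # SHAP: per-prediction contributions; mean |SHAP| is the usual global ranking
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)
    print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))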
Dastardly answered 15/6, 2022 at 9:15 Comment(5)
Upvoted. Can you explain in more detail why a high-cardinality feature may get a low SHAP value yet be used by a tree model often? (Adding details on how XGBoost calculates feature importances may be beneficial as well.)Philbrook
XGBoost's built-in routine has several modes available, e.g. weight (the number of tree splits using a feature) or gain (impurity decrease), averaged or total, often showing very different results. More feature values yield more possible splits, hence a larger weight, and gain is affected by the tree structure even when the predictions are the same (see towardsdatascience.com/… and actuaries.digital/2019/06/18/… for various mismatch examples)Dastardly
Thanks. Would you wish to update your answer with this knowledge?Philbrook
Could you explain intuitively what it means when feature A is ranked higher than feature B in a SHAP plot? In your answer, do you mean that A has a higher average contribution (across all data points) to the final output than B if A is ranked higher? How should that be interpreted if the gain of A is less than the gain of B in XGBoost's feature importance method?Chewink
I think you got the SHAP explanation mixed up. SHAP doesn't change feature values.Franco
