The SHAP values returned from tree explainer's .shap_values(some_data)
gives different dimensions/results for XGB as for random forest. I've tried looking into it, but can't seem to find why or how, or an explanation in any of Slundberg's (SHAP dude's) tutorials. So:
- Is there a reason for this that I am missing?
- Is there some flag that returns shap values fro XGB per class like for other models that is not obvious or that I am missing?
Below is some sample code!
import xgboost.sklearn as xgb
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import shap
bc = load_breast_cancer()
cancer_df = pd.DataFrame(bc['data'], columns=bc['feature_names'])
cancer_df['target'] = bc['target']
cancer_df = cancer_df.iloc[0:50, :]
target = cancer_df['target']
cancer_df.drop(['target'], inplace=True, axis=1)
X_train, X_test, y_train, y_test = train_test_split(cancer_df, target, test_size=0.33, random_state = 42)
xg = xgb.XGBClassifier()
xg.fit(X_train, y_train)
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
xg_pred = xg.predict(X_test)
rf_pred = rf.predict(X_test)
rf_explainer = shap.TreeExplainer(rf, X_train)
xg_explainer = shap.TreeExplainer(xg, X_train)
rf_vals = rf_explainer.shap_values(X_train)
xg_vals = xg_explainer.shap_values(X_train)
print('Random Forest')
print(type(rf_vals))
print(type(rf_vals[0]))
print(rf_vals[0].shape)
print(rf_vals[1].shape)
print('XGBoost')
print(type(xg_vals))
print(xg_vals.shape)
Output:
Random Forest
<class 'list'>
<class 'numpy.ndarray'>
(33, 30)
(33, 30)
XGBoost
<class 'numpy.ndarray'>
(33, 30)