What do maskers really do in the SHAP package, and should I fit them to train or test data?

```
model = LogisticRegression(random_state = 1)
model.fit(X_train, y_train)
masker = shap.maskers.Independent(data = X_train)
# or
masker = shap.maskers.Independent(data = X_test)
explainer = shap.LinearExplainer(model, masker = masker)
shap_val = explainer(X_test)
```

The masker class provides the background data to "train" your explainer against. I.e., in:

explainer = shap.LinearExplainer(model, masker = masker)

you're using the background data determined by the masker (you can see which data is used by inspecting the masker.data attribute). You may read more about "true to model" or "true to data" explanations here or here.
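To make the role of the background concrete, here is a minimal pure-Python sketch of the arithmetic for a linear model when features are treated as independent: each attribution is the coefficient times the distance of the feature from its background mean, and the base value is the model output at that mean. This is an illustration under those assumptions, not SHAP's actual implementation; the weights, intercept, and data rows are made up, and for a logistic regression the "output" here would be the linear (log-odds) score.

```python
# Illustrative sketch only: hypothetical weights and data, not SHAP internals.

def column_means(rows):
    """Per-feature mean of the background rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def linear_attributions(weights, intercept, x, background):
    """Attributions of w.x + b for one row x, relative to a background set."""
    means = column_means(background)
    # Base value: the model's linear output at the background mean.
    base_value = intercept + sum(w * m for w, m in zip(weights, means))
    # Each feature is credited for how far it moves x away from that mean.
    phi = [w * (xi - m) for w, xi, m in zip(weights, x, means)]
    return base_value, phi

weights = [2.0, -1.0]                     # hypothetical coefficients
intercept = 0.5                           # hypothetical intercept
background = [[0.0, 1.0], [2.0, 3.0]]     # hypothetical background rows
x = [3.0, 1.0]                            # row to explain

base_value, phi = linear_attributions(weights, intercept, x, background)
prediction = intercept + sum(w * xi for w, xi in zip(weights, x))

# Additivity: base value plus attributions recovers the linear output.
print(base_value, phi, prediction)
```

The point to take away is that the background only enters through its feature means, so changing the background changes the reference that every explanation is measured against.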

Given the above, calculation-wise either of these works:

masker = shap.maskers.Independent(data = X_train)

masker = shap.maskers.Independent(data = X_test)
explainer = shap.LinearExplainer(model, masker = masker)

but conceptually, imo the following makes more sense:

masker = shap.maskers.Independent(data = X_train)
explainer = shap.LinearExplainer(model, masker = masker)

This is akin to the usual train/test paradigm, where you train your model (and explainer) on the train data, and then try to predict (and explain) your test data.
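As a rough illustration of what is at stake in that choice, the sketch below (pure Python, hypothetical numbers, assuming a linear model with independent features) explains the same rows against a "train" and a "test" background. Under these assumptions the two sets of attributions differ only by a constant per-feature shift, equal to the coefficient times the difference in background means.

```python
# Illustrative sketch only: hypothetical weights and data, not SHAP internals.

def column_means(rows):
    """Per-feature mean of the background rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def attributions(weights, x, background):
    """Linear-model attributions of row x relative to the background mean."""
    means = column_means(background)
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]

weights = [2.0, -1.0]                       # hypothetical coefficients
train_bg = [[0.0, 1.0], [2.0, 3.0]]         # hypothetical train background
test_bg = [[1.0, 0.0], [3.0, 2.0]]          # hypothetical test background
rows_to_explain = [[3.0, 1.0], [0.0, 4.0]]

shifts = []
for x in rows_to_explain:
    phi_train = attributions(weights, x, train_bg)
    phi_test = attributions(weights, x, test_bg)
    # Difference between the two explanations of the same row.
    shifts.append([a - b for a, b in zip(phi_train, phi_test)])

# Every explained row shows the same per-feature shift.
print(shifts)
```

So if train and test come from the same distribution, the two backgrounds give nearly the same explanations; the conceptual argument for the train data is about which reference you want to measure against.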

Unrelated to the question: an alternative to a masker, which samples data for you, is to explicitly provide a background that lets you compare two datapoints: a reference point to compare against and the point of interest, as in this notebook. This way you can find out why two seemingly similar datapoints were classified differently.
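For a linear model, the two-point comparison reduces to simple arithmetic, sketched below in pure Python with made-up numbers: with a single reference row as the background, each attribution is the coefficient times the feature difference between the point of interest and the reference, and the attributions sum to the gap between the two linear scores. The function name and values are hypothetical and this is not the notebook's code.

```python
# Illustrative sketch only: hypothetical weights and points, not SHAP internals.

def attributions_vs_reference(weights, point, reference):
    """Per-feature contribution to the score gap between point and reference."""
    return [w * (p - r) for w, p, r in zip(weights, point, reference)]

def linear_score(weights, row):
    """Linear part of the model output (intercept cancels in the gap)."""
    return sum(w * v for w, v in zip(weights, row))

weights = [2.0, -1.0, 0.5]        # hypothetical coefficients
reference = [1.0, 2.0, 4.0]       # point to compare against
point = [1.0, 5.0, 6.0]           # point of interest

phi = attributions_vs_reference(weights, point, reference)
score_gap = linear_score(weights, point) - linear_score(weights, reference)

# The attributions split the score gap feature by feature.
print(phi, score_gap)
```

Features that are identical in both points get zero attribution, so whatever remains shows which features drive the difference in classification.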
