I'm looking to use LIME's explainer within a UDF in PySpark. I've previously trained the tabular explainer and stored it as a dill model, as suggested in the link:
    loaded_explainer = dill.load(open('location_to_explainer', 'rb'))

    def lime_explainer(*cols):
        selected_cols = np.array([value for value in cols])
        exp = loaded_explainer.explain_instance(selected_cols, loaded_model.predict_proba, num_features=10)
        mapping = exp.as_map()[1]
        return str(mapping)
This, however, takes a lot of time, as much of the computation appears to happen on the driver. I then tried to use Spark broadcast to ship the explainer to the executors.
    broadcasted_explainer = sc.broadcast(loaded_explainer)

    def lime_explainer(*cols):
        selected_cols = np.array([value for value in cols])
        exp = broadcasted_explainer.value.explain_instance(selected_cols, loaded_model.predict_proba, num_features=10)
        mapping = exp.as_map()[1]
        return str(mapping)
However, I run into a pickling error on broadcast:

    PicklingError: Can't pickle <function <lambda> at 0x7f69fd5680d0>: attribute lookup <lambda> on lime.discretize failed
Can anybody help with this? Is there something like dill that we can use instead of the cloudpickle serializer that Spark uses?