Preprocessing the training data (such as centering or scaling) before training an XGBoost model, can lead to a loss of feature names. Most answers on SO suggest training the model in such a way that feature names aren't lost (such as using pd.get_dummies on data frame columns).
I have trained an XGBoost model using the preprocessed data (centre and scale using MinMaxScaler). Thereby, I am in a similar situation where feature names are lost.
For instance:
scaler = MinMaxScaler(feature_range=(0, 1))
X = scaler.fit_transform(X)
my_model_name = XGBClassifier()
my_model_name.fit(X,Y)`
where X and Y are the training data and labels respectively. The scaling above returns a 2D NumPy array, thereby discarding feature names from pandas DataFrame.
Thus, when I try to use plot_importance(my_model_name)
, it leads to the plot of feature importance, but only with feature names such as f0, f1, f2 etc., and not the actual feature names from the original data set.
Is there a way to map the feature names from the original training data to the feature importance plot generated, so that the original feature names are plotted in the graph? Any help in this regard is highly appreciated.