I'm contributing an answer because I ran into this problem when putting a fitted XGBRegressor model into production. This is therefore a solution for cases where you cannot select column names from a y training or testing DataFrame, though there may be some crossover that could be helpful.
The model had been fit on a Pandas DataFrame, and I was attempting to pass a single row of values as a np.array to the predict function. Processing of the array's values had already been done (reverse label encoding, etc.), and the array was entirely numeric.
I got the familiar error:
ValueError: feature_names mismatch
followed by a list of the features, followed by a list of the same length: ['f0', 'f1' ....]
While there are no doubt more direct solutions, I had little time and this fixed the problem:
- Make the input vector a Pandas DataFrame:
    import pandas as pd

    series = {'feature1': [value],
              'feature2': [value],
              'feature3': [value],
              'feature4': [value],
              'feature5': [value],
              'feature6': [value],
              'feature7': [value],
              'feature8': [value],
              'feature9': [value],
              'feature10': [value]
              }
    vector = pd.DataFrame(series)
- Get the feature names that the trained model knows:
names = model.get_booster().feature_names
- Select those features from the input vector DataFrame (defined above), and index with iloc:
result = model.predict(vector[names].iloc[[-1]])
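Putting the key part of the fix together: selecting columns by the model's feature-name list both subsets and *reorders* the DataFrame to match the training layout, and `iloc[[-1]]` (double brackets) keeps the result a one-row DataFrame rather than a Series. A minimal sketch with made-up feature names and values (in practice `names` comes from `model.get_booster().feature_names`):

```python
import pandas as pd

# Hypothetical: the order the trained model expects
names = ['feature2', 'feature1', 'feature3']

# Input vector built in a different column order
vector = pd.DataFrame({'feature1': [0.5],
                       'feature3': [1.2],
                       'feature2': [3.4]})

# Column selection reorders to match the model; iloc[[-1]]
# returns the last row as a (1, n) DataFrame, not a Series
row = vector[names].iloc[[-1]]

print(list(row.columns))  # ['feature2', 'feature1', 'feature3']
print(row.shape)          # (1, 3)
```

This `row` is what you would then hand to `model.predict(...)`.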
I found the iloc indexing here.
Selecting the feature names via get_booster().feature_names – necessary because models in the Scikit-Learn implementation do not have a feature_names attribute – I found in @Athar's post above.
Check out the documentation to learn more.
Hope this helps.