Feature Importance with XGBClassifier

Hopefully I'm reading this wrong, but in the XGBoost library documentation there is a note about extracting the feature importances using the feature_importances_ attribute, much like sklearn's random forest.

However, for some reason, I keep getting this error: AttributeError: 'XGBClassifier' object has no attribute 'feature_importances_'

My code snippet is below:

from sklearn import datasets
import xgboost as xg
iris = datasets.load_iris()
X = iris.data
Y = iris.target
X = X[Y < 2] # cutting the feature rows to match the rows kept in Y
Y = Y[Y < 2] # arbitrarily removing class 2 so the labels are 0 and 1
xgb = xg.XGBClassifier()
fit = xgb.fit(X, Y)
fit.feature_importances_

It seems that you can compute feature importance from the Booster object by calling its get_fscore method. The only reason I'm using XGBClassifier over Booster is that it can be wrapped in a sklearn pipeline. Any thoughts on extracting feature importances? Is anyone else experiencing this?
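For context, this is roughly the pipeline use case I have in mind (illustrative only; the step names are arbitrary, and the get_booster() call assumes a newer xgboost, older builds expose booster() instead):

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)
X, y = X[y < 2], y[y < 2]                 # keep classes 0 and 1, as above

pipe = Pipeline([('scale', StandardScaler()),
                 ('clf', XGBClassifier())])
pipe.fit(X, y)

booster = pipe.named_steps['clf'].get_booster()   # older builds expose .booster() instead
print(booster.get_fscore())                       # e.g. {'f2': 12, 'f3': 9, ...}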

Hedwig answered 5/7, 2016 at 21:0 Comment(8)
I can't reproduce the problem with your snippet. What version of XGBoost do you have?Stillbirth
From my pip freeze, I have xgboost==0.4a30Hedwig
Does this help? kaggle.com/mmueller/…Kirkuk
I have seen this before. The problem, however, is that the get_fscore method is bound to the Booster object rather than XGBClassifier, from my understanding. See the doc hereHedwig
I have 0.4 and your snippet works with no problem.Stillbirth
Hrm this is odd. The current version is 0.4a30 right? It appears so looking at their repoHedwig
@MinhMai using feature_importances_ via booster(), are you able to get the column names accurately? In my case, it throws a KeyError saying that certain features are not present in the data.Glori
You can plot XGBClassifier feature importance with names directly: xgboosting.com/…Methanol

As the comments indicate, I suspect your issue is a versioning one. However, if you can't or don't want to update, the following function should work for you.

def get_xgb_imp(xgb, feat_names):
    from numpy import array
    # the Booster reports importances keyed 'f0', 'f1', ...; map them back to the given names
    imp_vals = xgb.booster().get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    # normalise so the importances sum to 1
    total = array(imp_dict.values()).sum()
    return {k: v / total for k, v in imp_dict.items()}


>>> import numpy as np
>>> from xgboost import XGBClassifier
>>> 
>>> feat_names = ['var1','var2','var3','var4','var5']
>>> np.random.seed(1)
>>> X = np.random.rand(100,5)
>>> y = np.random.rand(100).round()
>>> xgb = XGBClassifier(n_estimators=10)
>>> xgb = xgb.fit(X,y)
>>> 
>>> get_xgb_imp(xgb,feat_names)
{'var5': 0.0, 'var4': 0.20408163265306123, 'var1': 0.34693877551020408, 'var3': 0.22448979591836735, 'var2': 0.22448979591836735}
Cyrilcyrill answered 6/7, 2016 at 15:22 Comment(6)
Interesting approach! However, would it matter if I tune my parameters for XGBClassifier? How would I ensure that they match the parameters for the Booster?Hedwig
you're referencing the booster() object within your XGBClassifier() object, so it will match: xgb.booster()Cyrilcyrill
I noticed something strange; is that supposed to happen? Shouldn't the values returned from xgb.booster().get_fscore() contain values for all the columns the model was trained on? I find 2 columns missing from imp_vals: they are present in the training columns but do not appear as keys in imp_colsFeudist
I had to use xgb.get_booster().get_fscore(). Otherwise I was getting TypeError: 'str' object is not callable. I am using xgboost 0.6.Elston
I pickled my XGB object and am unable to call get_booster(): File "/usr/local/lib/python3.5/dist-packages/xgboost/sklearn.py", line 193, in get_booster raise XGBoostError('need to call fit or load_model beforehand') Tourcoing
As of today, calling get_xgb_imp(xgb_model, columns_names) fails with TypeError: 'NoneType' object is not callable, raised at the line imp_vals = xgb.booster().get_fscore()Arcadia

For xgboost, if you use xgb.fit(), then you can use the following method to get feature importance.

import pandas as pd

xgb_model = xgb.fit(x, y)
xgb_fea_imp = pd.DataFrame(list(xgb_model.get_booster().get_fscore().items()),
                           columns=['feature', 'importance']).sort_values('importance', ascending=False)
print(xgb_fea_imp)
xgb_fea_imp.to_csv('xgb_fea_imp.csv')

from xgboost import plot_importance
plot_importance(xgb_model)
Palanquin answered 18/6, 2018 at 4:36 Comment(0)

I found out the answer. It appears that version 0.4a30 does not have the feature_importances_ attribute. Therefore, if you install the xgboost package using pip install xgboost you will be unable to do feature extraction from the XGBClassifier object; you can refer to @David's answer if you want a workaround.

However, what I did was build it from source by cloning the repo and running . ./build.sh, which installs version 0.4, where the feature_importances_ attribute works.
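If you are not sure which build actually ended up installed, a quick sanity check (just an illustrative snippet, not part of the workaround above):

# print the installed xgboost version; very old builds may not expose __version__,
# hence the getattr fallback
import xgboost
print(getattr(xgboost, '__version__', 'no __version__ attribute'))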

Hope this helps others!

Hedwig answered 9/7, 2016 at 0:32 Comment(0)

Get Feature Importance as a sorted data frame

import pandas as pd
import numpy as np

def get_xgb_imp(xgb, feat_names):
    imp_vals = xgb.booster().get_fscore()
    # build a two-column frame: feature name, importance
    feats_imp = pd.DataFrame(imp_vals, index=np.arange(2)).T
    feats_imp.iloc[:, 0] = feats_imp.index
    feats_imp.columns = ['feature', 'importance']
    feats_imp.sort_values('importance', inplace=True, ascending=False)
    feats_imp.reset_index(drop=True, inplace=True)
    return feats_imp

feature_importance_df = get_xgb_imp(xgb, feat_names)
Puritan answered 23/4, 2018 at 13:53 Comment(0)

For those having the same problem as Luís Bianchin, "TypeError: 'str' object is not callable", I found a solution (that works for me at least) here.

In short, I found modifying David's code from

imp_vals = xgb.booster().get_fscore()

to

imp_vals = xgb.get_fscore()

worked for me.
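Put together, the modified helper would look roughly like this (the same structure as David's function with only that one line changed; the list() around the dict values is an extra tweak, not from the original answer, so the numpy sum also works on Python 3):

from numpy import array

def get_xgb_imp(xgb, feat_names):
    # call get_fscore() on the fitted estimator directly instead of xgb.booster()
    imp_vals = xgb.get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    total = array(list(imp_dict.values())).sum()  # list() so the dict view sums correctly on Python 3
    return {k: v / total for k, v in imp_dict.items()}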

For more detail I would recommend visiting the link above.

Big thanks to David and ianozsvald

Serpigo answered 6/5, 2019 at 20:46 Comment(0)

You can also use the built-in plot_importance function:

from xgboost import XGBClassifier, plot_importance
fit = XGBClassifier().fit(X,Y)
plot_importance(fit)

(feature importance plot produced by plot_importance)

Gripsack answered 12/8, 2020 at 7:15 Comment(0)

An alternative to the built-in feature importance is SHAP-based importance.

I really like the shap package because it provides additional plots (a short sketch of how to produce them follows the list below). Examples:

Importance Plot

Summary Plot

Dependence Plot
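A minimal sketch of producing plots like these with the shap package (the dataset, model settings, and feature name are illustrative, not taken from this answer):

import shap
import xgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)          # tree-model explainer for the fitted booster
shap_values = explainer.shap_values(X)

shap.summary_plot(shap_values, X, plot_type="bar")   # importance plot
shap.summary_plot(shap_values, X)                    # summary (beeswarm) plot
shap.dependence_plot("mean radius", shap_values, X)  # dependence plot for one feature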

You can read about alternative ways to compute feature importance in Xgboost in this blog post of mine.

Piselli answered 17/8, 2020 at 12:8 Comment(0)

An update of the accepted answer since it no longer works:

def get_xgb_imp(xgb_model, feat_names):
    imp_vals = xgb_model.get_fscore()
    imp_dict = {feat: float(imp_vals.get(feat, 0.)) for feat in feat_names}
    total = sum(list(imp_dict.values()))
    return {k: round(v/total, 5) for k,v in imp_dict.items()}
Sitnik answered 18/10, 2019 at 13:42 Comment(0)

It seems like the API keeps changing. For xgboost version 1.0.2, just changing imp_vals = xgb.booster().get_fscore() to imp_vals = xgb.get_booster().get_fscore() in @David's answer does the trick. The updated code is:

from numpy import array

def get_xgb_imp(xgb, feat_names):
    imp_vals = xgb.get_booster().get_fscore()
    imp_dict = {feat_names[i]: float(imp_vals.get('f' + str(i), 0.)) for i in range(len(feat_names))}
    total = array(list(imp_dict.values())).sum()  # list() so the dict view sums correctly on Python 3
    return {k: v / total for k, v in imp_dict.items()}
Thespian answered 19/3, 2020 at 13:7 Comment(0)

I used the following code to get the feature importances. I also used DictVectorizer() in the pipeline for one-hot encoding. If you use

from sklearn.feature_extraction import DictVectorizer
import xgboost as xgb

v = DictVectorizer()
X_to_dict = X.to_dict("records")          # DataFrame rows as dicts for one-hot encoding
X_transformed = v.fit_transform(X_to_dict)
feature_names = v.get_feature_names()
best_model.get_booster().feature_names = feature_names  # attach names so the plot is labelled
xgb.plot_importance(best_model.get_booster())

You can obtain the f_score plot. But I wanted to plot the feature importance against the feature names, so I modified it further:

import matplotlib.pyplot as plt

f, ax = plt.subplots(figsize=(10, 30))
plt.barh(feature_names, best_model.feature_importances_)
plt.xticks(rotation=90)
plt.show()

Liatris answered 2/9, 2022 at 14:5 Comment(0)
