I'm trying to perform recursive feature elimination using scikit-learn and a random forest classifier, with OOB ROC as the method of scoring each subset created during the recursive process.
However, when I try to use the RFECV method, I get an error saying AttributeError: 'RandomForestClassifier' object has no attribute 'coef_'
Random forests don't have coefficients per se, but they do have feature rankings by Gini importance. So, I'm wondering how to get around this problem.
Please note that I want to use a method that will explicitly tell me which features from my pandas DataFrame were selected in the optimal grouping, as I am using recursive feature selection to try to minimize the amount of data I will input into the final classifier.
Here's some example code:
from sklearn import datasets
import pandas as pd
from pandas import Series
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns=['var1', 'var2', 'var3', 'var4'])
y = pd.Series(iris.target, name='target')
rf = RandomForestClassifier(n_estimators=500, min_samples_leaf=5, n_jobs=-1)
rfecv = RFECV(estimator=rf, step=1, cv=10, scoring='ROC', verbose=2)
selector = rfecv.fit(x, y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/feature_selection/rfe.py", line 336, in fit
    ranking_ = rfe.fit(X_train, y_train).ranking_
  File "/Users/bbalin/anaconda/lib/python2.7/site-packages/sklearn/feature_selection/rfe.py", line 148, in fit
    if estimator.coef_.ndim > 1:
AttributeError: 'RandomForestClassifier' object has no attribute 'coef_'
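The traceback shows that this version of RFE reads only coef_. As a minimal sketch, assuming a more recent scikit-learn release in which RFECV falls back to feature_importances_ when coef_ is absent, the selected column names can be read off the boolean support_ mask. The scoring is swapped to 'accuracy' here as an assumption, since plain ROC AUC scoring targets binary problems and iris has three classes; the smaller n_estimators and cv values and the random_state are illustrative choices only.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns=['var1', 'var2', 'var3', 'var4'])
y = pd.Series(iris.target, name='target')

# Smaller forest and fewer folds than in the question, just to keep the sketch quick.
rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=5, random_state=0)

# Assumption: this scikit-learn version lets RFECV use feature_importances_.
rfecv = RFECV(estimator=rf, step=1, cv=5, scoring='accuracy')
selector = rfecv.fit(x, y)

# support_ is a boolean mask over the input columns; use it to index the DataFrame columns.
selected = list(x.columns[selector.support_])
print(selected)
```

In this sketch, selector.ranking_ assigns rank 1 to every selected feature, and x[selected] would be the reduced DataFrame to pass to the final classifier.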
feature_importances_ attribute after calling predict or predict_proba, this returns an array of percentages in the order that they were passed. See the online example – Redemptioner
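The comment's suggestion might look roughly like the following sketch, which reuses the iris setup from the question. Here the attribute is read directly after fit, and wrapping the array in a pandas Series indexed by the column names is an illustrative choice to make the "order that they were passed" explicit. The n_estimators and random_state values are assumptions for the sketch.

```python
import pandas as pd
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns=['var1', 'var2', 'var3', 'var4'])
y = pd.Series(iris.target, name='target')

rf = RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=0)
rf.fit(x, y)

# feature_importances_ follows the column order of x, so pair it with the column names.
importances = pd.Series(rf.feature_importances_, index=x.columns)

# Sort so the most important features come first.
importances = importances.sort_values(ascending=False)
print(importances)
```

The importances are normalized to sum to 1, which is why they read as shares of the total rather than as coefficients.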