How to find the feature names of the coefficients using scikit-learn linear regression?
O

9

37

I use scikit-learn linear regression, and if I change the order of the features, the coefficients are still printed in the same order. I would therefore like to know how each coefficient maps to its feature.

from sklearn import linear_model

# training the model
model_1_features = ['sqft_living', 'bathrooms', 'bedrooms', 'lat', 'long']
model_2_features = model_1_features + ['bed_bath_rooms']
model_3_features = model_2_features + ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long']

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])

model_2 = linear_model.LinearRegression()
model_2.fit(train_data[model_2_features], train_data['price'])

model_3 = linear_model.LinearRegression()
model_3.fit(train_data[model_3_features], train_data['price'])

# extracting the coefficients
print(model_1.coef_)
print(model_2.coef_)
print(model_3.coef_)
Octans answered 7/1, 2016 at 7:58 Comment(3)
How exactly would you change the order of the features? I usually use some zip(coef,featurenames) to print it correctly. - Benenson
@RobinSpiess Example: model_e_features = ['bedrooms_squared', 'log_sqft_living', 'lat_plus_long'] + model_2_features - Octans
This is related to this more general question #40485785 - Blinker
B
27

The trick is that right after you have trained your model, you know the order of the coefficients:

model_1 = linear_model.LinearRegression()
model_1.fit(train_data[model_1_features], train_data['price'])
print(list(zip(model_1.coef_, model_1_features)))

This prints each coefficient paired with its feature name. (Tested with a pandas DataFrame.)

If you want to reuse the coefficients later you can also put them in a dictionary:

coef_dict = {}
for coef, feat in zip(model_1.coef_,model_1_features):
    coef_dict[feat] = coef

(You can test it for yourself by training two models with the same features but, as you said, shuffled order of features.)
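
The shuffle test suggested above can be sketched as follows. This is a minimal illustration on synthetic data, not the asker's housing data: the random values, the exact linear price formula, and the helper name fit_and_map are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for train_data (assumption: not the asker's data set)
rng = np.random.default_rng(0)
train_data = pd.DataFrame(
    rng.random((50, 3)), columns=["sqft_living", "bathrooms", "bedrooms"]
)
train_data["price"] = (
    3.0 * train_data["sqft_living"]
    - 2.0 * train_data["bathrooms"]
    + 0.5 * train_data["bedrooms"]
)


def fit_and_map(features):
    """Hypothetical helper: fit on the given column order, return {feature: coef}."""
    model = LinearRegression().fit(train_data[features], train_data["price"])
    return dict(zip(features, model.coef_))


original = fit_and_map(["sqft_living", "bathrooms", "bedrooms"])
shuffled = fit_and_map(["bedrooms", "sqft_living", "bathrooms"])

# Each feature should keep its coefficient regardless of column order
for name in original:
    print(name, original[name], shuffled[name])
```

Because coef_ follows the column order passed to fit, zipping it with that same list gives the same name-to-coefficient mapping in both runs.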

Benenson answered 18/1, 2016 at 9:47 Comment(5)
I think it should be print(list(zip(model_1.coef_, model_1_features))), i.e. coef_ instead of coef_[0]. Otherwise zip does not have anything to iterate over. - Poniard
@AlexFedulov Ah yes, thanks. I think the DataFrame I used in the example to test the code might have caused sklearn to think that I provided multiple targets. And because coef_ will return a 2d array if multiple targets are given, I had to use coef_[0]. But generally coef_ should give the right result. - Benenson
When I do print(list(zip(model_1.coef_, model_1_features))) in Jupyter the result I get isn't easy to read. (Co-efficient array displayed first, with features list stacked below it). When I reshape both to make it the other way round, my printout includes some fluff which also makes it hard to read. e.g.: dtype='<U11')), (array([-0.47048405]), array([' feature1], - Shoot
does not work for me. NameError: name 'classifier_features' is not defined - Drown
@robin Spiess This isn't really a good solution (although that's hardly your fault). If I ran 200 models over the course of a project, saving the names of the inputs in a separate dictionary would require me to maintain 400 'things': one object and one input list for each model. In contrast, if the relevant inputs were bundled in the predictor, I would only have to maintain 200 things. In other systems, such as SAS, you only need to provide a file with the same names and types as the original training set. With sklearn, the position has to be correct as well. - Nombles
A
13
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)

coef_table = pd.DataFrame(list(X_train.columns)).copy()
coef_table.insert(len(coef_table.columns),"Coefs",regressor.coef_.transpose())
Alcestis answered 3/6, 2020 at 8:13 Comment(1)
You can create a data frame including the feature names in one column and the coefficients of these features in another column. - Alcestis
G
10

@Robin posted a great answer, but I had to make one tweak for it to work the way I wanted. My coef_ was a 2D NumPy array, so I had to select the row I wanted with model_1.coef_[0,:], as below:

coef_dict = {}
for coef, feat in zip(model_1.coef_[0,:],model_1_features):
    coef_dict[feat] = coef

Then the dict was created as I pictured it, with {'feature_name' : coefficient_value} pairs.
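
Whether coef_[0,:] is needed depends on the shape of the target passed to fit. The sketch below, on synthetic data (the column names and values are assumptions for illustration), fits the same features against a 1D target and against a single-column 2D target to show the two shapes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption: illustrative only)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((20, 2)), columns=["x0", "x1"])
y_1d = 2.0 * X["x0"] - X["x1"]       # Series: 1D target
y_2d = y_1d.to_frame(name="price")   # single-column DataFrame: 2D target

coef_1d = LinearRegression().fit(X, y_1d).coef_
coef_2d = LinearRegression().fit(X, y_2d).coef_
print(coef_1d.shape)  # (2,)
print(coef_2d.shape)  # (1, 2)

# np.ravel flattens either shape, so one zip covers both single-target cases
coef_dict = dict(zip(X.columns, np.ravel(coef_2d)))
print(coef_dict)
```

Note that np.ravel is only appropriate with a single target; with several targets each row of coef_ belongs to a different target and should be labeled separately.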

Grosmark answered 30/8, 2018 at 18:45 Comment(0)
B
4

As of scikit-learn version 1.0, the LinearRegression estimator has a feature_names_in_ attribute. From the docs:

feature_names_in_ : ndarray of shape (n_features_in_,)

Names of features seen during fit. Defined only when X has feature names that are all strings.

New in version 1.0.

Assuming you're fitting on a pandas.DataFrame (train_data), your estimators (model_1, model_2, and model_3) will have the attribute. You can line up your coefficients using any of the methods listed in previous answers, but I'm in favor of this one:

coef_series = pd.Series(
    data=model_1.coef_,
    index=model_1.feature_names_in_
)

A minimal reproducible example

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


# for repeatability
np.random.seed(0)

# random data
Xy = pd.DataFrame(
  data=np.random.random((10, 3)),
  columns=["x0", "x1", "y"]
)

# separate X and y
X = Xy.drop(columns="y")
y = Xy.y

#  initialize estimator
lr = LinearRegression()

# fit to pandas.DataFrame
lr.fit(X, y)

# get coefficients and their respective feature names
coef_series = pd.Series(
  data=lr.coef_,
  index=lr.feature_names_in_
)

print(coef_series)
x0    0.230524
x1   -0.275611
dtype: float64
Becalm answered 6/2, 2023 at 17:5 Comment(0)
S
0

Here is what I use for pretty printing of coefficients in Jupyter. I'm not sure I follow why order is an issue - as far as I know the order of the coefficients should match the order of the input data that you gave it.

Note that the first line assumes you have a Pandas data frame called df in which you originally stored the data prior to turning it into a numpy array for regression:

fieldList = np.array(list(df)).reshape(-1,1)

coeffs = np.reshape(np.round(clf.coef_,5),(-1,1))
coeffs=np.concatenate((fieldList,coeffs),axis=1)
print(pd.DataFrame(coeffs,columns=['Field','Coeff']))
Shoot answered 29/5, 2018 at 7:53 Comment(0)
P
0

Borrowing from Robin, but simplifying the syntax:

coef_dict = dict(zip(model_1_features, model_1.coef_))

Important note about zip: zip assumes its inputs are of equal length, making it especially important to confirm that the lengths of the features and coefficients match (which in more complicated models might not be the case). If one input is longer than the other, the longer input will have the values in its extra index positions cut off. Notice the missing 7 in the following example:

In [1]: [i for i in zip([1, 2, 3], [4, 5, 6, 7])]
Out[1]: [(1, 4), (2, 5), (3, 6)]
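
To guard against that silent truncation, the lengths can be checked before zipping. The following is a minimal sketch; the helper name safe_coef_dict and the sample values are assumptions for illustration. On Python 3.10 and later, zip(features, coefs, strict=True) raises a ValueError on mismatched lengths and could replace the explicit check.

```python
def safe_coef_dict(features, coefs):
    """Hypothetical helper: pair names with coefficients, refusing mismatched lengths."""
    if len(features) != len(coefs):
        raise ValueError(
            f"{len(features)} feature names but {len(coefs)} coefficients"
        )
    return dict(zip(features, coefs))


features = ["sqft_living", "bathrooms", "bedrooms"]

# Matching lengths: every feature gets its coefficient
ok = safe_coef_dict(features, [0.5, 1.2, -0.3])
print(ok)

# One coefficient short: raise instead of silently dropping 'bedrooms'
try:
    safe_coef_dict(features, [0.5, 1.2])
    mismatch_caught = False
except ValueError as err:
    mismatch_caught = True
    print("length mismatch:", err)
```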
Possie answered 5/10, 2018 at 15:38 Comment(0)
T
0
pd.DataFrame(data=regression.coef_, index=X_train.columns)
Tumbledown answered 23/7, 2022 at 23:33 Comment(1)
This answer was reviewed in the Low Quality Queue. Here are some guidelines for How do I write a good answer?. Code only answers are not considered good answers, and are likely to be downvoted and/or deleted because they are less useful to a community of learners. It's only obvious to you. Explain what it does, and how it's different / better than existing answers. - Procne
C
0

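All of these answers were great, but what personally worked for me was this, as the feature names I needed were the columns of my train_data dataframe (this assumes the model was fitted on every column of train_data; the reshape makes coef_ a single row so that each coefficient lines up with one column):

pd.DataFrame(data=model_1.coef_.reshape(1, -1), columns=train_data.columns)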
Cordillera answered 9/9, 2022 at 20:1 Comment(0)
I
0

Right after training the model, the coefficient values are stored in model.coef_ (in model.coef_[0] when coef_ is 2D, as assumed in the code below). We can iterate over the column names and store each column name with its coefficient value in a dictionary.

model.fit(X_train, y)
# assuming all the columns except the last one are used in training
columns = data.iloc[:, :-1].columns
coef_dict = {}
for i in range(len(columns)):
    # coef_[0] assumes a 2D coef_; use model.coef_[i] if coef_ is 1D
    coef_dict[columns[i]] = model.coef_[0][i]

Hope this helps!

Imputation answered 31/12, 2022 at 9:12 Comment(2)
Careful, this overwrites built-in dict. - Hoosegow
I may have overlooked the fact that I accidentally named the variable dict. We can definitely use any name for the dictionary variable. I will change it by editing it. - Imputation
