sklearn GridSearchCV with Pipeline
E

5

33

I am trying to build a pipeline which first does RandomizedPCA on my training data and then fits a ridge regression model. Here is my code:

pca = RandomizedPCA(1000, whiten=True)
rgn = Ridge()

pca_ridge = Pipeline([('pca', pca),
                      ('ridge', rgn)])

parameters = {'ridge__alpha': 10 ** np.linspace(-5, 2, 3)}

grid_search = GridSearchCV(pca_ridge, parameters, cv=2, n_jobs=1, scoring='mean_squared_error')
grid_search.fit(train_x, train_y[:, 1:])

I know about RidgeCV, but I want to try out Pipeline and GridSearchCV.

I want the grid search to report RMSE, but this doesn't seem to be supported in sklearn, so I'm making do with MSE. However, the scores it reports are negative:

In [41]: grid_search.grid_scores_
Out[41]: 
[mean: -0.02665, std: 0.00007, params: {'ridge__alpha': 1.0000000000000001e-05},
 mean: -0.02658, std: 0.00009, params: {'ridge__alpha': 0.031622776601683791},
 mean: -0.02626, std: 0.00008, params: {'ridge__alpha': 100.0}]

Obviously this isn't possible for mean squared error - what am I doing wrong here?

Entourage answered 10/1, 2014 at 16:59 Comment(0)
A
48

Those scores are negated MSE values: flip the sign and you get the MSE. GridSearchCV, by convention, always tries to maximize its score, so loss functions like MSE have to be negated.
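
As a rough illustration of the sign convention, here is a minimal sketch on synthetic data. It assumes a recent scikit-learn release, where the scorer is spelled 'neg_mean_squared_error' and results are read from cv_results_ rather than grid_scores_. PCA stands in for RandomizedPCA, and the data, n_components, and the names neg_mse, mse, and rmse are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data standing in for train_x / train_y (an assumption).
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

pca_ridge = Pipeline([("pca", PCA(n_components=10)),
                      ("ridge", Ridge())])
parameters = {"ridge__alpha": 10.0 ** np.linspace(-5, 2, 3)}

grid_search = GridSearchCV(pca_ridge, parameters, cv=2,
                           scoring="neg_mean_squared_error")
grid_search.fit(X, y)

# The search maximizes the score, so each entry is -MSE (never positive).
neg_mse = grid_search.cv_results_["mean_test_score"]
mse = -neg_mse        # flip the sign to recover the ordinary MSE
rmse = np.sqrt(mse)   # square root of the mean cross-validated MSE
```

The candidate with the highest (least negative) score is the one with the lowest MSE, which is why the negation lets a maximizer handle a loss.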

Andryc answered 11/1, 2014 at 11:7 Comment(3)
Can you point to any documentation for this, or is it based on your own testing? - Marjorymarjy
github.com/scikit-learn/scikit-learn/issues/2439 (I personally think it should be called negative and not "negated") - Domiciliary
I'm a bit confused now. Do I have to use 'neg_mean_squared_error' in model.compile() for "loss" and "metric", or 'mean_squared_error'? - Arthrospore
K
7

An alternative way to set up the GridSearchCV is to build the scorer yourself with make_scorer and set its greater_is_better flag to False.

So, if rgn is your regression model and parameters is your hyperparameter grid, you can use make_scorer like this:

from sklearn.metrics import make_scorer, mean_squared_error
# define your own MSE scorer; greater_is_better=False makes it return -MSE
mse = make_scorer(mean_squared_error, greater_is_better=False)

Now you can create the GridSearchCV as shown below and pass it the mse scorer you defined:

grid_obj = GridSearchCV(rgn, parameters, cv=5, scoring=mse, n_jobs=-1, verbose=True)
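
To see what the flag does, here is a small self-contained sketch. The synthetic data, the alpha grid, and the names fitted, scorer_value, and plain_mse are assumptions for illustration. It suggests that a scorer built with greater_is_better=False returns the metric with its sign flipped, so the grid search scores still come out negative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV

# Synthetic data and grid are placeholders for your own.
X, y = make_regression(n_samples=150, n_features=10, noise=5.0, random_state=0)

mse = make_scorer(mean_squared_error, greater_is_better=False)

rgn = Ridge()
parameters = {"alpha": [0.01, 1.0, 100.0]}
grid_obj = GridSearchCV(rgn, parameters, cv=5, scoring=mse)
grid_obj.fit(X, y)

# Calling the scorer directly: it takes (estimator, X, y) and returns -MSE.
fitted = Ridge(alpha=1.0).fit(X, y)
scorer_value = mse(fitted, X, y)
plain_mse = mean_squared_error(y, fitted.predict(X))
```

Here scorer_value should equal -plain_mse, and grid_obj.best_score_ is the negated MSE of the best candidate.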
Klug answered 12/4, 2020 at 6:22 Comment(0)
T
0

If you want RMSE as the metric, you can write your own function that takes Y_org and Y_pred (the true and predicted values) and calculates the RMSE, then wrap it with make_scorer and pass it as the scoring argument.
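
A minimal sketch of such a callable, assuming it is wrapped with make_scorer. The function name rmse, the synthetic data, and the alpha grid are made up for the example. Because RMSE is a loss, greater_is_better=False is used here, so the reported scores are negated RMSE values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import GridSearchCV

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    return np.sqrt(mean_squared_error(y_true, y_pred))

# Lower RMSE is better, so the scorer hands -RMSE to the grid search.
rmse_scorer = make_scorer(rmse, greater_is_better=False)

# Placeholder data and grid.
X, y = make_regression(n_samples=150, n_features=10, noise=5.0, random_state=0)

grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=3,
                    scoring=rmse_scorer)
grid.fit(X, y)

# Negate to read the mean per-fold RMSE for each candidate.
mean_rmse = -grid.cv_results_["mean_test_score"]
```

Newer scikit-learn releases also accept the string 'neg_root_mean_squared_error' as the scoring argument, which avoids the custom function.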


Tavish answered 11/7, 2018 at 5:11 Comment(0)
P
0

Suppose I have stored the negative MSE and negative MAE results obtained from GridSearchCV in lists named model_nmse and model_nmae, respectively.

I would then simply multiply them by -1 to get the desired MSE and MAE scores.

model_mse = list(np.multiply(model_nmse, -1))

model_mae = list(np.multiply(model_nmae, -1))
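
One way such lists might be produced is a multi-metric grid search. The sketch below is an assumption about where model_nmse and model_nmae come from, not something stated above. The data and alpha grid are placeholders, and refit=False is set because no single metric is chosen for refitting.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Placeholder data and grid.
X, y = make_regression(n_samples=150, n_features=10, noise=5.0, random_state=0)

scoring = ["neg_mean_squared_error", "neg_mean_absolute_error"]
grid = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=3,
                    scoring=scoring, refit=False)
grid.fit(X, y)

# With several metrics, cv_results_ has one mean_test_<name> entry per scorer.
model_nmse = list(grid.cv_results_["mean_test_neg_mean_squared_error"])
model_nmae = list(grid.cv_results_["mean_test_neg_mean_absolute_error"])

# Multiply by -1 to turn the negated scores back into errors.
model_mse = list(np.multiply(model_nmse, -1))
model_mae = list(np.multiply(model_nmae, -1))
```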
Pogge answered 27/2, 2020 at 19:48 Comment(0)
Q
0

You can see the scoring in the documentation


Quagga answered 12/6, 2020 at 8:49 Comment(3)
The question asks why the RMSE values turn out negative; this doesn't seem to answer it. - Manchu
@Gust there is a 'neg_root_mean_squared_error' scorer; I thought that would make it easy to get the RMSE, right? - Quagga
@JeremyCaney Thanks for your advice. Here is the link to the scikit-learn documentation on scoring: scikit-learn.org/stable/modules/… - Quagga
