Significant mismatch between `r2_score` of `scikit-learn` and the R^2 calculation

Question

Why is there a significant difference between the r2_score function in scikit-learn and the formula for the Coefficient of Determination as described in Wikipedia? Which is the correct one?


Context

I'm using scikit-learn with Python 3.5 to fit linear and quadratic models, and one of the measures of goodness of fit that I'm trying out is the coefficient of determination (R^2). However, while testing, there is a marked difference between the r2_score metric in scikit-learn and the calculation given on Wikipedia.


Code

I'm providing my code here for reference; it computes the example from the Wikipedia page linked above.

from sklearn.metrics import r2_score
import numpy

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

# Convert to numpy array and ensure double precision to avoid single precision errors
observed = numpy.array(y, dtype=numpy.float64)
predicted = numpy.array(f, dtype=numpy.float64)

scipy_value = r2_score(observed, predicted)

>>> scipy_value
-3.8699999999999992

As is evident, the value calculated by scikit-learn is -3.8699999999999992, while the reference value given in Wikipedia is 0.998.

Thank you!

UPDATE: This is different from this question about how R^2 is calculated in scikit-learn, since what I'm trying to understand (and have now clarified) is the discrepancy between the two results. That question states that the formula used in scikit-learn is the same as Wikipedia's, which should not result in different values.

UPDATE #2: It turns out I made a mistake reading the Wikipedia article's example. As the answers and comments below point out, the example I quoted is for the linear least-squares fit of the (x, y) values given there, and for that fit the R^2 value of 0.998 reported in Wikipedia is correct. For the R^2 between the two vectors themselves, scikit-learn's answer is also correct. Thanks a lot for your help!
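For completeness, a minimal sketch that reproduces both numbers (using numpy's polyfit for the least-squares line):

import numpy as np
from sklearn.metrics import r2_score

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

# R^2 between the two raw vectors (what the code above computes)
print(r2_score(x, y))            # approximately -3.87

# R^2 of the linear least-squares fit of y on x (what Wikipedia reports)
m, c = np.polyfit(x, y, 1)       # slope ~1.97, intercept ~-0.11
print(r2_score(y, m * x + c))    # approximately 0.998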

Cassaundracassava answered 30/10, 2015 at 2:44 Comment(2)
Possible duplicate of How is the R2 value in Scikit learn calculated?Angulation
I think the question you referred to does not really answer mine. There is no mention of the discrepancy between the results from the two sources, and that is the main point my question tries to address. In fact, whether the calculation in scikit-learn is valid or not (and why) is an important point and in my opinion should be settled for future reference.Cassaundracassava

The referenced question is correct -- if you work through the calculation for the residual sum of squares and the total sum of squares, you get the same value as sklearn:

In [85]: import numpy as np

In [86]: y = [1,2,3,4,5]

In [87]: f = [1.9, 3.7, 5.8, 8.0, 9.6]

In [88]: SSres = sum(map(lambda x: (x[0]-x[1])**2, zip(y, f)))

In [89]: SStot = sum([(x-np.mean(y))**2 for x in y])

In [90]: SSres, SStot
Out[90]: (48.699999999999996, 10.0)

In [91]: 1-(SSres/SStot)
Out[91]: -3.8699999999999992

The idea behind a negative value is that you'd have been closer to the actual values had you just predicted the mean each time (which would correspond to an r2 = 0).
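As a quick sanity check of that idea, continuing the same session: predicting the mean of y for every point scores exactly zero, and the given predictions do worse than that.

In [92]: from sklearn.metrics import r2_score

In [93]: r2_score(y, [np.mean(y)] * len(y))
Out[93]: 0.0

In [94]: r2_score(y, f)
Out[94]: -3.8699999999999992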

Nonfulfillment answered 30/10, 2015 at 3:23 Comment(2)
So, basically, the result given in the Wikipedia answer is incorrect?Cassaundracassava
@JuanCarlosCoto no, wikipedia is correct. The wikipedia article states the R^2 of a linear least-squares fit to the given x-y data is 0.998. The data given there is not y and f. See my answer for more.Globulin

I think you have misinterpreted wikipedia. The example on wikipedia does not state:

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]
R^2 = 0.998

Instead, it says that the R^2 for a linear least-squares fit to the data:

x = [1, 2, 3, 4, 5]
y = [1.9, 3.7, 5.8, 8.0, 9.6]

is equal to 0.998

Consider this script, which first uses np.linalg.lstsq to find the least-squares fit and then uses both methods to compute the same R^2 of 0.998:

import numpy as np
from sklearn.metrics import r2_score

x = np.arange(1, 6, 1)
y = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

A = np.vstack([x, np.ones(len(x))]).T

# Use numpy's least squares function
m, c = np.linalg.lstsq(A, y)[0]

print(m, c)
# 1.97 -0.11

# Define the values of our least squares fit
f = m * x + c

print(f)
# [ 1.86  3.83  5.8   7.77  9.74]

# Calculate R^2 explicitly
yminusf2 = (y - f)**2
sserr = sum(yminusf2)
mean = float(sum(y)) / float(len(y))
yminusmean2 = (y - mean)**2
sstot = sum(yminusmean2)
R2 = 1. -(sserr / sstot)

print(R2)
# 0.99766066838

# Use scikit
print(r2_score(y,f))
# 0.99766066838

r2_score(y,f) == R2
# True
Globulin answered 30/10, 2015 at 11:20 Comment(0)

Both methods use the same formula to calculate the R-squared. Check out the code below:

    # Imports
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    # Data: the predictions as the feature and the observed values as the target
    X = np.array([1.9, 3.7, 5.8, 8.0, 9.6]).reshape(-1, 1)
    y = [1, 2, 3, 4, 5]

    # Fit a linear regression
    reg = LinearRegression().fit(X, y)

    # Predict the target variable
    y_pred = reg.predict(X)

    # R-Square via the metrics function
    print('R-Square(metrics):', r2_score(y, y_pred))

    # R-Square via the estimator's score method
    print('R-Square(Score):', reg.score(X, y))

Output:

    R-Square(metrics): 0.9976606683804627
    R-Square(Score): 0.9976606683804627

Wellbeing answered 29/11, 2019 at 11:13 Comment(0)

The coefficient of determination effectively compares the variance in the data to the variance of the residuals. A residual is the difference between a predicted and an observed value, and the residual variance is, up to a constant factor, the sum of squares of these differences.

If the prediction is perfect, the variance of the residual is zero. Hence, the coefficient of determination is one. If the prediction is not perfect some of the residuals are non-zero and the variance of the residuals is positive. Hence, the coefficient of determination is lower than one.

The toy problem obviously has a low coefficient of determination, since most of the predicted values are way off. A coefficient of determination of about -3.87 means that the variance of the residuals is 4.87 times as large as the variance in the observed values.
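A small numeric check of that ratio, using the values from the question (numpy is the only dependency):

import numpy as np

y = np.array([1, 2, 3, 4, 5], dtype=float)
f = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

ss_res = np.sum((y - f) ** 2)          # 48.7
ss_tot = np.sum((y - y.mean()) ** 2)   # 10.0

print(ss_res / ss_tot)        # ~4.87: residual variation relative to observed variation
print(1 - ss_res / ss_tot)    # ~-3.87, matching r2_score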

The 0.998 value is the coefficient of determination of the linear least-squares fit of the same data. That fit relates the observed values to the predicted values by a linear relation (plus a constant) chosen to minimize the residual variance. The observed and predicted values in the toy problem are almost perfectly linearly related, which is why the coefficient of determination of the least-squares fit is so close to one.

Christiniachristis answered 30/10, 2015 at 12:6 Comment(0)

Both are correct. The difference is that scikit-learn applies the R^2 formula directly to the data:

y = [1, 2, 3, 4, 5]

f = [1.9, 3.7, 5.8, 8.0, 9.6]

scikit-learn computes SSR and SST treating y as the true values and f as a prediction of y.

Wikipedia uses y as the feature array (x) and f as the target to predict (y). So there is a regression, which works out to f_pred = 1.97y - 0.11. Now you have the true values of f and the predicted values f_pred, and R^2 is calculated between them.

y = [1, 2, 3, 4, 5]

f = [1.9, 3.7, 5.8, 8.0, 9.6]

f_pred = [1.86, 3.83, 5.8, 7.77, 9.74]

if you use the equation (1- SSR/SST) using f and f_pred data:

SSR = SUM[(f - f_pred)^2] = SUM[0.0016, 0.0169, 0, 0.0529, 0.0196] = 0.091

SST = SUM[(f - AVE(f))^2] = SUM[15.21, 4.41, 0, 4.84, 14.44] = 38.9

R2 = (1-0.091/38.9) = 0.998
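A short snippet to verify those hand calculations (the f_pred values are the rounded least-squares predictions shown above):

import numpy as np
from sklearn.metrics import r2_score

f = np.array([1.9, 3.7, 5.8, 8.0, 9.6])
f_pred = np.array([1.86, 3.83, 5.8, 7.77, 9.74])  # rounded least-squares predictions

ssr = np.sum((f - f_pred) ** 2)      # ~0.091
sst = np.sum((f - f.mean()) ** 2)    # 38.9

print(1 - ssr / sst)          # ~0.998
print(r2_score(f, f_pred))    # same value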

A negative R^2 in scikit-learn means that your model does worse than simply predicting the mean of the observed data. Negative values show up especially on test data, because those points did not participate in fitting the model. When scikit-learn gives you a negative R^2, the R^2 of a linear regression between the true and predicted values will often still be close to zero.
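To illustrate that last point with a small constructed example (not data from the question): a model that fits its training data perfectly can still get a negative R^2 on held-out points.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Training data that lies exactly on the line y = x
X_train = np.array([[1], [2], [3], [4], [5]], dtype=float)
y_train = np.array([1, 2, 3, 4, 5], dtype=float)

reg = LinearRegression().fit(X_train, y_train)
print(reg.score(X_train, y_train))            # 1.0 on the training data

# Held-out points that do not follow that relation
X_test = np.array([[1], [2], [3]], dtype=float)
y_test = np.array([3, 1, 2], dtype=float)

# Predicting y_test's own mean would score 0; the fitted line does worse
print(r2_score(y_test, reg.predict(X_test)))  # approximately -2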

Pistil answered 29/12, 2020 at 15:8 Comment(0)
