Question
Why is there a significant difference between the r2_score function in scikit-learn and the formula for the Coefficient of Determination as described in Wikipedia? Which is the correct one?
Context
I'm using Python 3.5 to fit linear and quadratic models, and one of the measures of goodness of fit that I'm trying out is the coefficient of determination (R^2). However, while testing, I found a marked difference between the r2_score metric in scikit-learn and the calculation provided in Wikipedia.
Code
My code is below for reference; it computes the example from the Wikipedia page linked above.
from sklearn.metrics import r2_score
import numpy

y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

# Convert to numpy array and ensure double precision to avoid single precision errors
observed = numpy.array(y, dtype=numpy.float64)
predicted = numpy.array(f, dtype=numpy.float64)

scipy_value = r2_score(observed, predicted)
print(scipy_value)
As is evident, the value calculated by scikit-learn is -3.8699999999999992, while the reference value in Wikipedia is 0.998.
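For comparison, here is the Wikipedia definition worked by hand on the same two vectors, treating y as the observed values and f as the predictions (the same roles the code gives them in the call r2_score(observed, predicted)):

$$
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad
SS_{\mathrm{res}} = \sum_i (y_i - f_i)^2, \qquad
SS_{\mathrm{tot}} = \sum_i (y_i - \bar{y})^2
$$

$$
\bar{y} = 3, \qquad
SS_{\mathrm{res}} = 0.9^2 + 1.7^2 + 2.8^2 + 4.0^2 + 4.6^2 = 48.7, \qquad
SS_{\mathrm{tot}} = 4 + 1 + 0 + 1 + 4 = 10
$$

$$
R^2 = 1 - \frac{48.7}{10} = -3.87
$$

So applied to these two vectors directly, the formula gives the same number scikit-learn reports.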
Thank you!
UPDATE: This is different from this question about how R^2 is calculated in scikit-learn, because what I'm trying to understand and have clarified is the discrepancy between the two results. That question states that the formula used in scikit-learn is the same as Wikipedia's, which should not produce different values.
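One way to check that claim is to implement the Wikipedia definition directly and compare it with r2_score on the same inputs. This is a minimal sketch; the helper name r2_from_definition is hypothetical and only used for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score


def r2_from_definition(observed, predicted):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot, as defined on Wikipedia."""
    observed = np.asarray(observed, dtype=np.float64)
    predicted = np.asarray(predicted, dtype=np.float64)
    ss_res = np.sum((observed - predicted) ** 2)         # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


y = [1, 2, 3, 4, 5]
f = [1.9, 3.7, 5.8, 8.0, 9.6]

manual = r2_from_definition(y, f)
library = r2_score(y, f)
print(manual, library)  # both are approximately -3.87
```

If the two numbers agree, the formulas are the same and the discrepancy must come from what is being compared, not from how R^2 is computed.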
UPDATE #2: It turns out I misread the Wikipedia article's example. As the answers and comments below point out, that example is for the linear least-squares fit of the (x, y) values given there, and for that fit the R^2 value of 0.998 in Wikipedia's article is correct. For the R^2 between the two vectors themselves, scikit-learn's answer is also correct. Thanks a lot for your help!
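A minimal sketch of the Wikipedia reading: the numbers 1 to 5 are treated as x values, 1.9 to 9.6 as observed y values, a straight line is fitted by ordinary least squares with numpy.polyfit, and R^2 is then computed between the observations and the fitted line. Variable names here are illustrative, not taken from the article.

```python
import numpy as np

# Wikipedia example data: x values and observed y values
# (the same numbers as the y and f lists in the question)
x = np.array([1, 2, 3, 4, 5], dtype=np.float64)
y_obs = np.array([1.9, 3.7, 5.8, 8.0, 9.6], dtype=np.float64)

# Ordinary least-squares straight line: y = slope * x + intercept
slope, intercept = np.polyfit(x, y_obs, deg=1)
y_fit = slope * x + intercept

# R^2 = 1 - SS_res / SS_tot, comparing the observations with the fitted line
ss_res = np.sum((y_obs - y_fit) ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(round(r2, 3))  # 0.998
```

Passing y_obs and y_fit to r2_score should give the same value, since it uses the same definition; the difference from the question's -3.87 is entirely in which pair of vectors is compared.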