scipy.stats.linregress - get p-value of intercept
scipy.stats.linregress returns a p-value corresponding to the slope, but no p-value for the intercept. Consider the following example from the docs:

>>> from scipy import stats
>>> import numpy as np
>>> x = np.random.random(10)
>>> y = np.random.random(10)
>>> slope, intercept, r_value, p_value, std_err = stats.linregress(x,y)
>>> p_value
0.40795314163864016

According to the docs, p-value is the "two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero." I would like to get the same statistics, but for the intercept instead of the slope.

statsmodels.regression.linear_model.OLS returns p-values for both coefficients out of the box:

>>> import numpy as np
>>> import statsmodels.api as sm
>>> X = sm.add_constant(x)
>>> model = sm.OLS(y,X)
>>> results = model.fit()
>>> results.pvalues
array([ 0.00297559,  0.40795314])

Using only scipy, how can I get the p-value (0.00297559) for the intercept?

Yuk answered 26/2, 2015 at 21:54 Comment(0)

To compute the p-value for the intercept:

  • compute the t-value as the intercept estimate divided by its standard error (see function tvalue below)
  • convert it to a two-sided p-value using the survival function of the t distribution with n - 2 degrees of freedom, where n is the number of observations (see function pvalue below)
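The two steps above are the usual t-test for a single regression coefficient. As a sketch, in notation that is not part of the original answer (\hat{\beta}_0 is the intercept estimate, \mathrm{SE} its standard error, n the number of observations, and T_{n-2} a Student t variable with n - 2 degrees of freedom):

```latex
t = \frac{\hat{\beta}_0}{\mathrm{SE}(\hat{\beta}_0)}, \qquad
p = 2\,P\!\left(T_{n-2} > |t|\right)
```

The factor of 2 makes the test two-sided, matching the definition the docs give for the slope's p-value.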

Python code for the scipy case:

from scipy import stats
import numpy as np

def tvalue(estimate, stderr):
    # t statistic for the null hypothesis that the coefficient is zero
    return estimate / stderr

def pvalue(tvalue, dof):
    # two-sided p-value from the survival function of the t distribution
    return 2*stats.t.sf(abs(tvalue), dof)

np.random.seed(42)  # fixed seed so the numbers below are reproducible
x = np.random.random(10)
y = np.random.random(10)
scipy_results = stats.linregress(x,y)
print(scipy_results)
dof = 1.0*len(x) - 2  # n observations minus 2 fitted parameters
print("degrees of freedom = ", dof)
tvalue_intercept = tvalue(scipy_results.intercept, scipy_results.intercept_stderr)
tvalue_slope = tvalue(scipy_results.slope, scipy_results.stderr)
pvalue_intercept = pvalue(tvalue_intercept, dof)
pvalue_slope = pvalue(tvalue_slope, dof)
print(f"""tvalues(intercept, slope) = {tvalue_intercept, tvalue_slope}
pvalues(intercept, slope) = {pvalue_intercept, pvalue_slope}
""")

output:

LinregressResult(slope=0.6741948478345656, intercept=0.044594333294114996, rvalue=0.7042846127289285, pvalue=0.02298486740535295, stderr=0.24027039310814322, intercept_stderr=0.14422953722007206)
degrees of freedom =  8.0
tvalues(intercept, slope) = (0.30919001858870915, 2.8059838713924172)
pvalues(intercept, slope) = (0.7650763497698203, 0.02298486740535295)

compare with the result you obtain with statsmodels:

import statsmodels.api as sm
import math

X = sm.add_constant(x)
model = sm.OLS(y,X)
statsmodels_results = model.fit()
print(f"""intercept, slope = {statsmodels_results.params}
rvalue = {math.sqrt(statsmodels_results.rsquared)}
tvalues(intercept, slope) = {statsmodels_results.tvalues}
pvalues(intercept, slope) = {statsmodels_results.pvalues}""")

output:

intercept, slope = [0.04459433 0.67419485]
rvalue = 0.7042846127289285
tvalues(intercept, slope) = [0.30919002 2.80598387]
pvalues(intercept, slope) = [0.76507635 0.02298487]

notes

  • the random seed is fixed so that the results are reproducible
  • the result is used as a LinregressResult object rather than unpacked into five values, because only the object exposes intercept_stderr
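If the installed SciPy does not expose intercept_stderr, the same quantity can be recovered from the slope's standard error. The following is a minimal sketch, not part of the original answer: it relies on the textbook identity for simple linear regression, SE(intercept) = SE(slope) * sqrt(mean(x**2)), and the helper name intercept_pvalue is made up for illustration.

```python
import numpy as np
from scipy import stats

def intercept_pvalue(x, y):
    """Return (stderr, t-value, two-sided p-value) for the intercept.

    Hypothetical helper: derives the intercept's standard error from the
    slope's standard error instead of reading intercept_stderr.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    res = stats.linregress(x, y)
    # Simple linear regression identity:
    # SE(intercept) = SE(slope) * sqrt(sum(x**2) / n)
    intercept_stderr = res.stderr * np.sqrt(np.mean(x**2))
    t = res.intercept / intercept_stderr
    dof = x.size - 2  # two fitted parameters: slope and intercept
    p = 2 * stats.t.sf(abs(t), dof)
    return intercept_stderr, t, p

# Same data as in the example above
np.random.seed(42)
x = np.random.random(10)
y = np.random.random(10)
se, t, p = intercept_pvalue(x, y)
print(se, t, p)
```

With the seeded data this should agree with the intercept_stderr, t-value and p-value printed above, up to floating-point rounding.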

Intelligent answered 19/6, 2022 at 10:36 Comment(0)

From the SciPy.org documentation: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html

print("r-squared:", r_value**2)

output

r-squared: 0.15286643777

For other parameters, try:

print('Intercept is:', intercept)
print('Slope is:', slope)
print('R-Value is:', r_value)
print('Std Error is:', std_err)
print('p-value is:', p_value)
Soileau answered 11/11, 2017 at 21:22 Comment(1)
The question is for the p-value of the intercept. This is not an answer. - Shelbashelbi
