I'm new to Python, coming from R. I am getting very different results from a simple regression model when I fit it in R versus when I fit the same model in an IPython notebook.
The R-squared, the p-value, and the significance of the coefficients: nothing matches. Am I reading the output wrong or making some other fundamental error?
Below are my code and results for both:
R Code
str(df_nv)
Classes 'tbl_df', 'tbl' and 'data.frame': 81 obs. of 2 variables:
$ Dependent Variable  : num 733 627 405 353 434 556 381 558 612 901 ...
$ Independent Variable: num 0.193 0.167 0.169 0.14 0.145 ...
summary(lm(`Dependent Variable` ~ `Independent Variable`, data = df_nv))
Call:
lm(formula = `Dependent Variable` ~ `Independent Variable`, data = df_nv)
Residuals:
    Min      1Q  Median      3Q     Max
-501.18 -139.20  -82.61  -15.82 2136.74

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)               478.2      148.2   3.226  0.00183 **
`Independent Variable`   -196.1     1076.9  -0.182  0.85601
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 381.5 on 79 degrees of freedom
Multiple R-squared: 0.0004194, Adjusted R-squared: -0.01223
F-statistic: 0.03314 on 1 and 79 DF, p-value: 0.856
IPython Notebook Code
df_nv.dtypes
Dependent Variable float64
Independent Variable float64
dtype: object
import statsmodels.api as sm
model = sm.OLS(df_nv['Dependent Variable'], df_nv['Independent Variable'])
results = model.fit()
results.summary()
OLS Regression Results
Dep. Variable:    Dependent Variable   R-squared:           0.537
Model:            OLS                  Adj. R-squared:      0.531
Method:           Least Squares        F-statistic:         92.63
Date:             Fri, 20 Jan 2017     Prob (F-statistic):  5.23e-15
Time:             09:08:54             Log-Likelihood:      -600.40
No. Observations: 81                   AIC:                 1203.
Df Residuals:     80                   BIC:                 1205.
Df Model:         1
Covariance Type:  nonrobust

                      coef       std err  t      P>|t|  [95.0% Conf. Int.]
Independent Variable  3133.1825  325.537  9.625  0.000  2485.342  3781.023

Omnibus:        89.595   Durbin-Watson:     1.940
Prob(Omnibus):  0.000    Jarque-Bera (JB):  980.289
Skew:           3.489    Prob(JB):          1.36e-213
Kurtosis:       18.549   Cond. No.          1.00
For reference, here is the head of the data frame in both R and Python:
R:
head(df_nv)
  Dependent Variable Independent Variable
               <dbl>                <dbl>
1                733            0.1932367
2                627            0.1666667
3                405            0.1686183
4                353            0.1398601
5                434            0.1449275
6                556            0.1475410
Python:
df_nv.head()
      Dependent Variable  Independent Variable
5292               733.0              0.193237
5320               627.0              0.166667
5348               405.0              0.168618
5404               353.0              0.139860
5460               434.0              0.144928
sm.add_constant(df_nv['Independent Variable'])
– Primateship
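
The comment above likely points at the cause. `sm.OLS` does not add an intercept on its own, whereas R's `lm` does. That is consistent with the Python summary showing a single coefficient and 80 residual degrees of freedom against R's 79. When the model has no constant, statsmodels also reports an uncentered R-squared, so that number is not comparable to the one from R. Below is a minimal pure-Python sketch of the two fits. It uses only the six rows shown in the head output, so its numbers will not match either summary above, and the function names are made up for illustration.

```python
# First six rows from the question's head(df_nv) output (not the full 81 rows).
y = [733.0, 627.0, 405.0, 353.0, 434.0, 556.0]
x = [0.1932367, 0.1666667, 0.1686183, 0.1398601, 0.1449275, 0.1475410]


def fit_through_origin(x, y):
    """Least squares for y = b*x (no intercept), with uncentered R-squared."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    ss_res = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Uncentered total sum of squares: y is NOT demeaned.
    ss_tot_uncentered = sum(yi * yi for yi in y)
    return slope, 1.0 - ss_res / ss_tot_uncentered


def fit_with_intercept(x, y):
    """Least squares for y = a + b*x, with the usual centered R-squared."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Centered total sum of squares: y is measured around its mean.
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


slope0, r2_uncentered = fit_through_origin(x, y)
intercept, slope1, r2_centered = fit_with_intercept(x, y)

print("no intercept:   slope=%.1f  R2(uncentered)=%.3f" % (slope0, r2_uncentered))
print("with intercept: a=%.1f  slope=%.1f  R2(centered)=%.3f"
      % (intercept, slope1, r2_centered))
```

On these six rows the through-origin fit reports a noticeably higher R-squared than the intercept fit, which is the same kind of gap seen between the two summaries. With statsmodels itself, passing `sm.add_constant(df_nv['Independent Variable'])` as the second argument to `sm.OLS` should fit the same intercept model that `lm` does.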