Why does t-test in Python (scipy, statsmodels) give results different from R, Stata, or Excel?

Asked 20/12, 2013 at 18:52 Answered 20/12, 2013 at 23:18

(Problem resolved: s1 and s2 were the same as x and y except for one extra value at the end, so the samples differed in size.)

In R:

x <- c(373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
y <- c(411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)
t.test(x,y)
t = -1.6229, df = 29.727, p-value = 0.1152

The same numbers are obtained in Stata and Excel.

t.test(x,y,alternative="less")
t = -1.6229, df = 29.727, p-value = 0.05758

I cannot replicate these results with either statsmodels.stats.weightstats.ttest_ind or scipy.stats.ttest_ind, no matter which options I try:

statsmodels.stats.weightstats.ttest_ind(s1,s2,alternative="two-sided",usevar="unequal")
(-1.8912081781378358, 0.066740317997990656, 35.666557473974343)

scipy.stats.ttest_ind(s1,s2,equal_var=False)
(array(-1.8912081781378338), 0.066740317997990892)

scipy.stats.ttest_ind(s1,s2,equal_var=True)
(array(-1.8912081781378338), 0.066664507499812745)

There must be thousands of people who use Python to run t-tests. Are we all getting incorrect results? (I typically rely on Python, but this time I checked my results in Stata.)

Veteran asked 20/12, 2013 at 18:52

I just ran stats.ttest_ind(x,y,equal_var=True) and got array(-1.6229, 0.1152). In your example, check that s1 and s2 are the same as x and y. – Klaipeda 20/12, 2013 at 19:43

What version of scipy are you using? – Newsprint 20/12, 2013 at 19:43

Looking further into the problem, I see that R gives df = 29.727 while Python gives df = 35.666, so I suspect the difference comes from the df calculation. Warren, I still get (array(-1.8912081781378338), 0.066664507499812745) from stats.ttest_ind(s1,s2,equal_var=True). I am using the most recent Enthought Canopy Python installation. – Veteran 20/12, 2013 at 19:45
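The df hint can be checked by hand. For the unequal-variance (Welch) test, the degrees of freedom come from the Welch-Satterthwaite approximation, which depends on both sample sizes and both sample variances, so different input data shows up directly in df. A minimal NumPy sketch of that formula (`welch_df` is a hypothetical helper name, not a SciPy or statsmodels function):

```python
import numpy as np

def welch_df(a, b):
    """Hypothetical helper: Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Per-sample squared standard error (ddof=1 gives the sample variance).
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    return (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))

x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303]

print(round(welch_df(x, y), 3))  # 29.727, the df reported by R
```

Running the same function on the 17-element s1 and s2 would give a different df, which is a quick hint that the inputs, not the formula, differ.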

Like @tnknepp, I get (array(-1.62292672368488), 0.11506840827144681) from ttest_ind(x, y, equal_var=True). – Newsprint 20/12, 2013 at 19:50

Show the complete Python code. Perhaps s1 and s2 are not the same as x and y in the R example. – Newsprint 20/12, 2013 at 19:51

You are right: s1 and s2 were identical to x and y but had one more value at the end of each vector. I am sorry, and I appreciate everyone's help figuring out the problem. – Veteran 20/12, 2013 at 20:13
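Since the root cause was an extra trailing value, a cheap safeguard is to check the sample sizes before running the test. This is a sketch that assumes you know how many observations each sample should have; `checked_welch_ttest` and `expected_n` are hypothetical names, not part of SciPy, and the extra value 999 below is made up for illustration:

```python
import numpy as np
from scipy import stats

def checked_welch_ttest(a, b, expected_n):
    """Hypothetical helper: refuse to run Welch's t-test if either sample
    does not have the expected number of observations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.size != expected_n or b.size != expected_n:
        raise ValueError(
            f"expected {expected_n} observations per sample, got {a.size} and {b.size}"
        )
    return stats.ttest_ind(a, b, equal_var=False)

x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303]

print(checked_welch_ttest(x, y, expected_n=16).pvalue)  # about 0.1152

# A made-up extra trailing value is caught instead of silently changing the result.
try:
    checked_welch_ttest(x + [999], y + [999], expected_n=16)
except ValueError as err:
    print(err)
```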

These are the results I get with the default (equal variance):

>>> x_ = (373,398,245,272,238,241,134,410,158,125,198,252,577,272,208,260)
>>> y_ = (411,471,320,364,311,390,163,424,228,144,246,371,680,384,279,303)

>>> from scipy import stats
>>> stats.ttest_ind(x_, y_)
(array(-1.62292672368488), 0.11506840827144681)

>>> import statsmodels.api as sm
>>> sm.stats.ttest_ind(x_, y_)
(-1.6229267236848799, 0.11506840827144681, 30.0)

and with unequal variance:

>>> import statsmodels.stats.weightstats
>>> statsmodels.stats.weightstats.ttest_ind(x_, y_, alternative="two-sided", usevar="unequal")
(-1.6229267236848799, 0.11516398707890187, 29.727196553288369)
>>> stats.ttest_ind(x_, y_, equal_var=False)
(array(-1.62292672368488), 0.11516398707890187)

Disembodied answered 20/12, 2013 at 19:49

The short answer is that the t-tests in Python give the same results as R and Stata; you just had an additional element in your Python arrays.
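To reproduce both R calls from the question directly: R's t.test() defaults to var.equal = FALSE (Welch's test), so the SciPy counterpart needs equal_var=False, and the one-sided version needs the `alternative` keyword, which requires SciPy 1.6.0 or later. A minimal sketch:

```python
from scipy import stats

x = [373, 398, 245, 272, 238, 241, 134, 410, 158, 125, 198, 252, 577, 272, 208, 260]
y = [411, 471, 320, 364, 311, 390, 163, 424, 228, 144, 246, 371, 680, 384, 279, 303]

# R: t.test(x, y)  (Welch's test is R's default, var.equal = FALSE)
two_sided = stats.ttest_ind(x, y, equal_var=False)

# R: t.test(x, y, alternative = "less"), i.e. H1: mean(x) < mean(y)
# The `alternative` keyword requires SciPy >= 1.6.0.
less = stats.ttest_ind(x, y, equal_var=False, alternative="less")

print(f"two-sided: t = {two_sided.statistic:.4f}, p-value = {two_sided.pvalue:.4f}")
print(f"less:      t = {less.statistic:.4f}, p-value = {less.pvalue:.5f}")
```

This should print t = -1.6229 with p-values 0.1152 and 0.05758, matching the R output in the question. In statsmodels.stats.weightstats.ttest_ind the corresponding one-sided option is alternative="smaller" together with usevar="unequal".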

I wouldn't bank on Excel's robustness, however.

Pursuivant answered 20/12, 2013 at 23:18
