Python statistics package: difference between statsmodels and scipy.stats [closed]
C

3

29

I need some advice on selecting a statistics package for Python. I've done quite a bit of searching, but I'm not sure I've understood everything correctly, specifically the differences between statsmodels and scipy.stats.

One thing I do know is that packages in the scikits namespace are specific "branches" of scipy, and that what used to be scikits.statsmodels is now called statsmodels. On the other hand, there is also scipy.stats. What are the differences between the two, and which one is the statistics package for Python?

Thanks.

--EDIT--

I changed the title because some answers are not really related to the question, and I suppose that's because the title is not clear enough.

Consummate answered 29/1, 2013 at 0:28 Comment(0)
F
41

Statsmodels has scipy.stats as a dependency. scipy.stats has all of the probability distributions and some statistical tests. It's more like library code in the vein of numpy and scipy. Statsmodels, on the other hand, provides statistical models with a formula framework similar to R's, and it works with pandas DataFrames. There are also statistical tests, plotting, and plenty of helper functions in statsmodels. Really, it depends on what you need, but you definitely don't have to choose one. They have different aims and strengths.
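To make the split concrete, here is a minimal sketch of how the two packages tend to be used side by side. The data are simulated and the variable names (`x`, `y`, `df`) are made up for illustration; it assumes numpy, pandas, scipy, and statsmodels are all installed.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

# Simulated data (an assumption for this sketch): y depends linearly on x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=200)

# scipy.stats: probability distributions and standalone tests on arrays.
q975 = stats.norm.ppf(0.975)                         # normal quantile
t_stat, p_value = stats.ttest_1samp(y, popmean=0.0)  # one-sample t-test

# statsmodels: estimate a model from an R-style formula on a DataFrame.
df = pd.DataFrame({"x": x, "y": y})
result = smf.ols("y ~ x", data=df).fit()
slope = result.params["x"]                           # estimated coefficient on x

print(q975, p_value, slope)
print(result.summary())
```

The scipy.stats calls take plain arrays and return numbers, while the statsmodels call returns a fitted results object with parameters, standard errors, and a summary table.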

Firer answered 29/1, 2013 at 3:34 Comment(2)
"The" statistics package in Python is both together. scipy.stats has a large number of distributions, most of the common parametric and nonparametric statistical tests, and descriptive statistics. statsmodels is much more focused on estimating statistical models. Speaking as one of the maintainers of scipy.stats and of statsmodels, we try to keep code duplication to a minimum. – Wentz
This is exactly the answer I was looking for. I'm well aware of R and what you can do with it, including interfacing with Python, and I'm not looking for a comparison of different statistical software. The question was specifically about the relation (i.e. difference) between statsmodels and scipy.stats in Python. I know stackoverflow is full of competent cool guys like you, thanks! (except those who closed the question, you guys are not cool... I'm just joking, of course) – Consummate
H
6

I try to use pandas/statsmodels/scipy for my work on a day-to-day basis, but sometimes those packages come up a bit short (LOESS, anybody?). The problem with the RPy module is (last I checked, at least) that it wants a specific version of R that isn't current---my R installation is 2.16 (I think) and RPy wanted 2.14. So either you have to have two parallel installations of R, or you have to downgrade. (If you don't have R installed, then you can just install the correct version of R and use RPy.)

So when I need something that isn't in pandas/statsmodels/scipy I write R scripts, and run them with the subprocess module. This lets me interact with R as little as possible (which I really don't like programming in), but I can still leverage all the stuff that R has that the Python packages don't.
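A minimal sketch of that workflow, assuming R is installed and its `Rscript` launcher is on the PATH. The helper names and the script name `loess_fit.R` are hypothetical, not something from my actual setup.

```python
import shutil
import subprocess


def build_rscript_command(script_path, *args):
    """Return the argument list for running an R script non-interactively."""
    # --vanilla skips saved workspaces and user profile files.
    return ["Rscript", "--vanilla", str(script_path), *map(str, args)]


def run_r_script(script_path, *args):
    """Run an R script and return whatever it printed to standard output."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    completed = subprocess.run(
        build_rscript_command(script_path, *args),
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


# Hypothetical usage: the R script reads a CSV, fits the model, prints results.
# output = run_r_script("loess_fit.R", "data.csv", 0.5)
```

The R side only has to read its command-line arguments and print (or write a file), so the Python code never needs to know which R version is installed.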

The lesson is that there isn't ever one solution to any problem---you have to assemble a whole bunch of parts that are all useful to you (and maybe write some of your own), in a way that you understand, to solve problems. (R aficionados will disagree, of course!)

Hysteroid answered 29/1, 2013 at 4:29 Comment(4)
statsmodels.sourceforge.net/devel/generated/… – Wentz
(I didn't manage to edit my comment.) I agree, there are still many methods missing in Python. – Wentz
It's just a reflection of the size and specializations of the respective communities. I think the Python community is growing much faster than the R community, though, for the simple fact that you can get a job as a Python coder much more easily than as an R coder. At least, that's what I would tell MY graduate students :) – Hysteroid
In the four years since I wrote this, there seem to be better solutions: blog.rstudio.org/2016/03/29/feather – Hysteroid
C
-3

I think THE statistics package is numpy/scipy. It also works great if you want to plot your data using matplotlib. However, as far as I know, matplotlib doesn't work with Python 3.x yet.

Congeries answered 29/1, 2013 at 1:13 Comment(2)
numpy, scipy, matplotlib and statsmodels all work on Python 3. matplotlib has since its last release, but I was using an unreleased version of matplotlib on Python 3 for almost a year. – Wentz
Nice! Good to know that it is already officially available. Thanks for the hint! – Congeries
