Regression using PYMC3

About

Asked 14/2, 2014 at 20:16 Answered 15/2, 2014 at 22:3

Solved python pandas bayesian statsmodels pymc3

I posted a IPython notebook here http://nbviewer.ipython.org/gist/dartdog/9008026

And I worked through both standard Statsmodels OLS and then similar with PYMC3 with the data provided via Pandas, that part works great by the way.

I can't see how to get the more standard parameters out of PYMC3? The examples seem to just use OLS to plot the base regression line. It seems that the PYMC3 model data should be able to give the parameters for the regression line? in addition to the probable traces,, ie what is the highest probability line?

Any further explanation of interpretation of Alpha, beta and sigma welcomed!

Also how to use PYMC3 model to estimate a future value of y given a new x ie prediction with some probability?

And lastly PYMC3 has a newish GLM wrapper which I tried and it seemed to get messed up? (it could well be me though)

Teaser answered 14/2, 2014 at 20:16 Comment(1)

Well I was hopeful, but I get that the question is a bit unfocused for SO.. still not sure how/where to get help? – Teaser 15/2, 2014 at 16:20

The glm submodule sets some default priors which might very well not be appropriate for every case of which yours is one. You can change them by using the family argument, e.g.:

pm.glm.glm('y ~ x', data,
           family=pm.glm.families.Normal(priors={'sd': ('sigma', pm.Uniform.dist(0, 12000))}))

Unfortunately this isn't very well documented yet and requires some good examples.

Anglo answered 15/2, 2014 at 22:3 Comment(7)

Thanks so much! Great work,, I hope that working through some of this may help the doc process! any suggestions on deriving the "most probable" regression using the probabilistic model rather than just overlaying a frequentist regression? Ie how to pull some useful parameters from the pymc3 model? – Teaser 15/2, 2014 at 22:7

I think the Posterior Predictive in cell #11 is the Bayesian way to do this. To get a measure you could compute the residuals for every sampled regression line and average them (that way you can do model comparison). – Anglo 17/2, 2014 at 12:43

Can't figure out how to apply the solution to my code,, when I try to just run the code as provided I get module not callable and when I try to place the code in the "with" clause it blows up..(so I have not translated that correctly.) I tried to just use the Family=clause inside the "with" gives KeyError: 0 – Teaser 17/2, 2014 at 15:10

vars = pm.glm.glm('y ~ x', data, family=pm.glm.glm.families.Normal(priors={'sd': {'sigma': pm.Uniform.dist(0, 12000)}})) Gives me AttributeError: 'function' object has no attribute 'families' – Teaser 17/2, 2014 at 15:21

Sorry @dartdog, I'll have a look later! – Anglo 18/2, 2014 at 15:59

Until then, you can probably just standardize your data which should bring the sd to a more reasonable range. – Anglo 19/2, 2014 at 21:21

Sorry this took a while but I updated the above code which should work now (tried it on your example and get the same results as manually). You'll see that instead of passing a dict, the prior is defined a tuple pair (name, distribution). – Anglo 30/3, 2014 at 9:35

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags