I am trying to fit a Poisson distribution to my data using statsmodels but I am confused by the results that I am getting and how to use the library.
My real data will be a series of numbers that I think that I should be able to describe as having a poisson distribution plus some outliers so eventually I would like to do a robust fit to the data.
However for testing purposes, I just create a dataset using scipy.stats.poisson
samp = scipy.stats.poisson.rvs(4,size=200)
So to fit this using statsmodels I think that I just need to have a constant 'endog'
res = sm.Poisson(samp,np.ones_like(samp)).fit()
print res.summary()
Poisson Regression Results
==============================================================================
Dep. Variable: y No. Observations: 200
Model: Poisson Df Residuals: 199
Method: MLE Df Model: 0
Date: Fri, 27 Jun 2014 Pseudo R-squ.: 0.000
Time: 14:28:29 Log-Likelihood: -404.37
converged: True LL-Null: -404.37
LLR p-value: nan
==============================================================================
coef std err z P>|z| [95.0% Conf. Int.]
------------------------------------------------------------------------------
const 1.3938 0.035 39.569 0.000 1.325 1.463
==============================================================================
Ok, that doesn't look right, But if I do
res.predict()
I get an array of 4.03 (which was the mean for this test sample). So basically, firstly I very confused how to interpret this result from statsmodel and secondly I should probably being doing something completely different if I'm interested in robust parameter estimation of a distribution rather than fitting trends but how should I go about doing that?
Edit I should really have given more detail in order to answer the second part of my question.
I have an event that occurs a random time after a starting time. When I plot a histogram of the delay times for many events, I see that the distribution looks like a scaled Poisson distribution plus several outlier points which are normally caused by issues in my underlying system. So I simply wanted to find the expected time delay for the dataset, excluding the outliers. If not for the outliers, I could simply find the mean time. I suppose that I could exclude them manually but I thought that I could find something more exacting.
Edit On further reflection, I will be considering other distributions instead of sticking with a Poissonion and the details of my issue are probably a distraction from the original question but I've left them here anyway.