I have some observational data for which I would like to estimate parameters, and I thought it would be a good opportunity to try out PyMC3.
My data is structured as a series of records. Each record contains a pair of observations relating to a fixed one-hour period. One observation is the total number of events that occurred during that hour; the other is the number of successes within the same period. So, for example, a data point might specify that in a given one-hour period there were 1000 events in total, and that of those 1000, 100 were successes. In another period, there might be 1000000 events in total, of which 120000 are successes. The variance of the observations is not constant and depends on the total number of events, and it is partly this effect that I would like to control for and model.
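To make that concrete, the records look roughly like the sketch below (the variable name `records` and the tuple layout are purely illustrative, not part of my actual pipeline):

records = [
    # (total events in the hour, successes in the hour)
    (1000, 100),
    (1000000, 120000),
]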
My first step in doing this is to estimate the underlying success rate. I've prepared the code below, which is intended to mimic the situation by generating two sets of 'observed' data using scipy. However, it does not work properly.
What I expect it to find is:
- loss_lambda_factor is roughly 0.1
- total_lambda (and total_lambda_mu) is roughly 120.
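These targets come straight from how the fake data is generated; a quick standalone check (duplicating the data-generation lines from the script below, purely for illustration) shows where they come from:

import scipy.stats

# regenerate the same fake data as in the script below, just to sanity-check the targets
totalRates = scipy.stats.norm(loc=120, scale=20).rvs(size=10000)
totalCounts = scipy.stats.poisson.rvs(mu=totalRates)
successCounts = scipy.stats.poisson.rvs(mu=0.1 * totalRates)

print(totalCounts.mean())                         # should land near 120
print(successCounts.mean() / totalCounts.mean())  # should land near 0.1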
Instead, the model converges very quickly, but to an unexpected answer.
- total_lambda and total_lambda_mu both show sharp peaks around 5e5.
- loss_lambda_factor is roughly 0.
The traceplot (which I cannot post due to reputation below 10) is fairly uninteresting - quick convergence, and sharp peaks at numbers that do not correspond to the input data. I am curious as to whether there is something fundamentally wrong with the approach I am taking. How should the following code be modified to give the correct / expected result?
from pymc import Model, Uniform, Normal, Poisson, Metropolis, sample, traceplot
import scipy.stats
import matplotlib.pyplot as plt

# Simulate the 'observed' data: hourly total rates centred on 120, with
# successes occurring at 10% of the total rate.
totalRates = scipy.stats.norm(loc=120, scale=20).rvs(size=10000)
totalCounts = scipy.stats.poisson.rvs(mu=totalRates)
successRate = 0.1 * totalRates
successCounts = scipy.stats.poisson.rvs(mu=successRate)

with Model() as success_model:
    # Hyperpriors and prior on the hourly total rate
    total_lambda_tau = Uniform('total_lambda_tau', lower=0, upper=100000)
    total_lambda_mu = Uniform('total_lambda_mu', lower=0, upper=1000000)
    total_lambda = Normal('total_lambda', mu=total_lambda_mu, tau=total_lambda_tau)

    # Likelihood for the total number of events per hour
    total = Poisson('total', mu=total_lambda, observed=totalCounts)

    # Fraction of events that are successes, and the likelihood for the successes
    loss_lambda_factor = Uniform('loss_lambda_factor', lower=0, upper=1)
    success_rate = Poisson('success_rate', mu=total_lambda*loss_lambda_factor, observed=successCounts)

with success_model:
    step = Metropolis()
    success_samples = sample(20000, step)  #, start)

plt.figure(figsize=(10, 10))
_ = traceplot(success_samples)