I'm new to using pymc3, I've read Bayesian Methods for Hackers and done my best to work through existing survival analysis tutorials in pymc3. However, I don't understand how to write/interpret the "survival function".
For this problem, I've generated some dummy data from a Weibull Distribution defined by NIST here:
n = 1000
alpha = 1
gam = 0.5
mu = 0
noise = np.random.normal(0, 0.025, [n, 1])
x = np.random.rand(n, 1)*10
f_x = (gam/alpha)*(((gam-mu)/alpha)**(gam-1))*np.exp(-((x-mu)/alpha)**gam)
y = f_x + noise
Since I want to create a model with censored data and uncensored data like the pymc3 tutorial for Bayesian Parametrization, I implemented a cutoff and censored those data points:
cens = np.array([1 if k < 7.5 else 0 for k in x])
Then I began to establish my model with priors:
with pm.Model() as survival_model:
alpha0 = pm.Normal('alpha0', mu=1, sigma = 1)
gam0 = pm.Normal('gam0', mu=0.5, sigma = 1)
mu0 = pm.Normal('mu0', mu=0.0, sigma = 1)
noise0 = pm.Normal('noise0', mu=0.0, sigma = 0.05)
Now the trouble begins, I know I need to define a likelihood function that accounts for the censored values and take all the parameters as inputs to output the likelihood. I think for the censored values I have to find the equation to describe P(Y > y). Typically, I could use the CDF but in this case I've found using Matlab and Mathematica the indefinite integral does not exist. What should I do?
lifelines
package github.com/CamDavidsonPilon/lifelines, maybe it has what you want – Denominate