How to do simple survival analysis with pymc3 (Weibull distribution regression)?

I'm new to using pymc3, I've read Bayesian Methods for Hackers and done my best to work through existing survival analysis tutorials in pymc3. However, I don't understand how to write/interpret the "survival function".

For this problem, I've generated some dummy data from a Weibull Distribution defined by NIST here:

n = 1000
alpha = 1 
gam = 0.5
mu = 0
noise = np.random.normal(0, 0.025, [n, 1])

x = np.random.rand(n, 1)*10
f_x = (gam/alpha)*(((gam-mu)/alpha)**(gam-1))*np.exp(-((x-mu)/alpha)**gam)
y = f_x + noise

Since I want to create a model with censored data and uncensored data like the pymc3 tutorial for Bayesian Parametrization, I implemented a cutoff and censored those data points:

cens = np.array([1 if k < 7.5 else 0 for k in x])

Then I began to establish my model with priors:

with pm.Model() as survival_model:

     alpha0 = pm.Normal('alpha0', mu=1, sigma = 1)
     gam0 = pm.Normal('gam0', mu=0.5, sigma = 1)
     mu0 = pm.Normal('mu0', mu=0.0, sigma = 1)
     noise0 = pm.Normal('noise0', mu=0.0, sigma = 0.05)

Now the trouble begins, I know I need to define a likelihood function that accounts for the censored values and take all the parameters as inputs to output the likelihood. I think for the censored values I have to find the equation to describe P(Y > y). Typically, I could use the CDF but in this case I've found using Matlab and Mathematica the indefinite integral does not exist. What should I do?

import pymc3 as pm from surpyval import Weibull # Create 100 random variables with alpha=50 and beta=2 rvs = Weibull.random(100, 50, 2) # Set all values above 60 to be 60.. # i.e all above 60 are censored rvs[rvs >= 60] = 60 # separate censored and observed censored = rvs[rvs == 60] failures = rvs[rvs < 60] with pm.Model() as survival_model: alpha = pm.Normal('alpha', mu=100, sigma = 10) beta = pm.Gamma('beta', alpha=5, beta = 2) # Where the magic happens def logp(failures, censored): # likelihood for observed is the log of the density function failed_log_like = Weibull.log_df(failures, alpha, beta) # The log-likelihood for censored observations is the negative # of the cumulative hazard function. censored_log_like = -Weibull.Hf(censored, alpha, beta) # Return the sum of all of it return censored_log_like.sum() + failed_log_like.sum() weibull_neg_ll = pm.DensityDist('weibull_neg_ll', logp, observed={'failures' : failures, 'censored' : censored}) start = pm.find_MAP() step = pm.NUTS() trace = pm.sample(10000, step, start, random_seed=123, progressbar=True) pm.traceplot(trace)

Recommended topics

Hot tags